What do you mean by Distributed Systems, Shared Memory and File Systems

Distributed Shared Memory

Distributed shared memory (DSM) implements the shared memory model in distributed systems, which have no physical shared memory. The shared memory model provides a virtual address space that is shared between all nodes.

Data moves between main memory and secondary memory (within a node) and between the main memories of different nodes. Each data object is owned by a node:

The initial owner is the node that created the object
Ownership can change as the object moves from node to node

When a process accesses data in the shared address space, the mapping manager maps the shared memory address to physical memory.
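The ownership and mapping ideas above can be sketched in a few lines. This is only a toy model: the `Node` and `MappingManager` classes are hypothetical names, and migrating an object on every remote access is an assumed policy, not the only possible one.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.memory = {}  # local "physical" memory: shared address -> value


class MappingManager:
    def __init__(self):
        self.owner = {}  # shared address -> Node that currently owns the object

    def create(self, node, address, value):
        node.memory[address] = value
        self.owner[address] = node  # the creator is the initial owner

    def read(self, node, address):
        owner = self.owner[address]
        if owner is not node:
            # Assumed policy: the object and its ownership move to the accessing node.
            node.memory[address] = owner.memory.pop(address)
            self.owner[address] = node
        return node.memory[address]


node_a, node_b = Node("a"), Node("b")
manager = MappingManager()
manager.create(node_a, 0x10, "payload")
value = manager.read(node_b, 0x10)  # node_b now owns the object
```

The process on `node_b` only names a shared address; the data movement between nodes stays hidden inside `read`.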

Issues for the DSM design?

Data sharing is implicit: the system hides data movement from the programmer.
Passing data structures that contain pointers is easier, because every node sees the same address space.

Credits: https://programmerprodigy.code.blog/wp-content/uploads/2021/07/88ca7-distributed2bshared2bmemory2b2.png

Coherence Protocols?

DSM systems make use of data replication, where copies of data are maintained at all the nodes accessing the data. A fundamental problem with data replication is the difficulty of ensuring that all copies hold the same information and that nodes do not access stale data.

A protocol is needed to keep the replicas coherent. The two basic coherence protocols are the write-invalidate protocol and the write-update protocol.

1. Write-invalidate protocol

A write to shared data invalidates all copies except one before the write executes
Invalidated copies are no longer accessible

Advantage:

Good performance when there are many updates between reads
Good performance when there is per-node locality of reference

Disadvantage:

Invalidations are sent to all nodes that have copies
Inefficient if many nodes access the same object

2. Write-update protocol

A write to shared data causes all copies to be updated (the new value is sent instead of an invalidation)
More difficult to implement
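
A minimal sketch of the two protocols side by side, assuming each node holds one `Replica` (a hypothetical class) of a single shared value. Real DSM systems exchange messages between nodes; here the "messages" are plain assignments.

```python
class Replica:
    def __init__(self, value):
        self.value = value
        self.valid = True


def write_invalidate(replicas, writer, value):
    # Invalidate every copy except the writer's before the write executes.
    for node, replica in replicas.items():
        if node != writer:
            replica.valid = False
    replicas[writer].value = value


def write_update(replicas, writer, value):
    replicas[writer].value = value
    # Send the new value to every other copy instead of an invalidation.
    for node, replica in replicas.items():
        if node != writer:
            replica.value = value


invalidated = {node: Replica(0) for node in ("a", "b", "c")}
write_invalidate(invalidated, "a", 7)

updated = {node: Replica(0) for node in ("a", "b", "c")}
write_update(updated, "a", 7)
```

After `write_invalidate`, only node `a` holds a usable copy, so nodes `b` and `c` must fetch the value again before reading. After `write_update`, every copy is valid and current, at the cost of sending the value to every node on each write.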

Granularity is the size of the shared memory unit. If the DSM page size is a multiple of the local virtual memory (VM) page size, DSM can be integrated with VM, i.e. it can reuse the VM page handling.
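
The multiple-of-page-size condition is simple arithmetic. The sketch below assumes a 4096-byte VM page and an 8192-byte DSM page purely for illustration; both function names are hypothetical.

```python
def can_integrate_with_vm(dsm_page_size, vm_page_size):
    # DSM can reuse VM page handling only if one DSM page is a whole number of VM pages.
    return dsm_page_size % vm_page_size == 0


def vm_pages_for(dsm_page_number, dsm_page_size, vm_page_size):
    # List the VM page numbers that together back one DSM page.
    per_dsm_page = dsm_page_size // vm_page_size
    start = dsm_page_number * per_dsm_page
    return list(range(start, start + per_dsm_page))


ok = can_integrate_with_vm(8192, 4096)
not_ok = can_integrate_with_vm(6000, 4096)
backing_pages = vm_pages_for(3, 8192, 4096)
```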

Distributed File Systems

A distributed file system (DFS) implements a common file system that can be shared by all autonomous computers in a distributed system. The goals of a distributed file system are:

Network transparency: Users need not be aware of the physical location of a file to access it.
High availability: Users should have easy access to files irrespective of their physical location.

Architecture of a DFS?

Files can be stored at any machine and computation can be performed at any machine.
When a machine needs to access a file stored on a remote machine, the remote machine performs the necessary file access operations and returns the data if a read operation is performed.
File servers: machines dedicated to storing files and performing storage and retrieval operations.
Clients: the remaining machines in the system, which are used for computation and access the files stored on servers.
Some client machines may also be equipped with local disk storage that can be used for caching remote files, as a swap area, or as a storage area.

Credits: https://docs.microsoft.com/pt-br/windows-server/storage/dfs-namespaces/media/dfs-overview.png

Services provided by the distributed file system?

Name Server: Maps (resolves) the names supplied by clients to objects (files and directories)
1. Name resolution takes place when a process attempts to access a file or directory for the first time
Cache Manager: Improves performance through file caching
1. Caching at the client: When a client references a file at the server
  1. A copy of the data is brought from the server to the client machine
  2. Subsequent accesses are done locally at the client
2. Caching at the server:
  1. The file is kept in memory to reduce subsequent access time

Different cached copies can become inconsistent. Cache managers have to provide coordination.
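
Client caching, and the inconsistency it can cause, can be sketched as follows. `FileServer` and `ClientCache` are hypothetical classes, and the cache here has no coordination at all, which is exactly the problem a cache manager has to solve.

```python
class FileServer:
    def __init__(self):
        self.files = {}
        self.reads = 0  # number of requests that reached the server

    def read(self, name):
        self.reads += 1
        return self.files[name]


class ClientCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:
            # First reference: bring a copy from the server to the client.
            self.cache[name] = self.server.read(name)
        return self.cache[name]  # later accesses are served locally


server = FileServer()
server.files["notes.txt"] = "v1"
client = ClientCache(server)
first = client.read("notes.txt")
second = client.read("notes.txt")   # no server request this time
server.files["notes.txt"] = "v2"    # the file changes at the server
stale = client.read("notes.txt")    # the client still sees the old copy
```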

Mechanisms used in distributed file systems?

1. Mounting:

The mount mechanism binds together several file name spaces into a single hierarchically structured name space. The kernel maintains the mount table, which maps mount points to storage devices. Mount information can be kept in two places:

Mount information maintained at clients:
1. Each client mounts every file system
2. Different clients may not see the same file name space
3. If files move to another server, every client needs to update its mount table
Mount information maintained at the server:
1. Every client sees the same file name space
2. If files move to another server, only the mount information at the server needs to change
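
A mount table lookup can be sketched as a longest-prefix match. The table contents and server names below are made up for illustration, and a real kernel mount table holds far more than a name-to-server mapping.

```python
def resolve(mount_table, path):
    """Return (server, path below the mount point) for the longest matching mount point."""
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise FileNotFoundError(path)
    relative = path[len(best.rstrip("/")):] or "/"
    return mount_table[best], relative


# Hypothetical mount table: mount point -> where the files live
mount_table = {"/": "local_disk", "/home": "server_a", "/home/projects": "server_b"}

deep = resolve(mount_table, "/home/projects/dfs/notes.txt")
local = resolve(mount_table, "/etc/hosts")

# If the files move, only this one table entry has to change.
mount_table["/home/projects"] = "server_c"
moved = resolve(mount_table, "/home/projects/dfs/notes.txt")
```

When the table lives at each client, the last update has to be repeated on every client; when it lives at the server, it happens once.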

2. Caching:

Improves file system performance by exploiting locality of reference
When a client references a remote file, the file is cached in the main memory of the server (server cache) and at the client (client cache)
When multiple clients modify shared (cached) data, cache consistency becomes a problem
It is very difficult to implement a solution that guarantees consistency

3. Hints:

Treat the cached data as hints, i.e. cached data may not be completely accurate
Can be used by applications that can discover that the cached data is invalid and can recover

Example of hints:

After the name of a file is mapped to an address, that address is stored as a hint in the cache
If the address later fails, it is purged from the cache
The name server is consulted to provide the actual location of the file and the cache is updated
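
The hint example above can be sketched like this. `NameServer`, `open_file`, and the `servers` dictionary are hypothetical; a "failed address" is modelled simply as the file not being present at the hinted server.

```python
class NameServer:
    def __init__(self, locations):
        self.locations = locations  # file name -> address of the server holding it
        self.lookups = 0

    def lookup(self, name):
        self.lookups += 1
        return self.locations[name]


def open_file(name, hints, name_server, servers):
    """servers maps an address to the set of file names actually stored there."""
    address = hints.get(name)
    if address is not None and name in servers.get(address, set()):
        return address               # the hint was still valid
    hints.pop(name, None)            # purge the failed (or missing) hint
    address = name_server.lookup(name)
    hints[name] = address            # update the cache with the actual location
    return address


servers = {"s1": set(), "s2": {"report.txt"}}
name_server = NameServer({"report.txt": "s2"})
hints = {"report.txt": "s1"}         # stale hint: the file has moved to s2

found = open_file("report.txt", hints, name_server, servers)
again = open_file("report.txt", hints, name_server, servers)
```

The second call is answered from the refreshed hint, so the name server is consulted only once.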

4. Bulk data transfer:

Observations:
1. Overhead introduced by protocols does not depend on the amount of data transferred in one transaction
2. Most files are accessed in their entirety
Common practice: when a client requests one block of data, multiple consecutive blocks are transferred
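
A small sketch of this practice, assuming a file is a list of blocks and one call to `fetch_blocks` stands for one round trip to the server. The class name and the read-ahead size of 4 are arbitrary choices for illustration.

```python
def fetch_blocks(file_blocks, first, count):
    """One server round trip that returns up to `count` consecutive blocks."""
    last = min(first + count, len(file_blocks))
    return {index: file_blocks[index] for index in range(first, last)}


class ReadAheadClient:
    def __init__(self, file_blocks, read_ahead=4):
        self.file_blocks = file_blocks
        self.read_ahead = read_ahead
        self.cache = {}
        self.round_trips = 0

    def read_block(self, index):
        if index not in self.cache:
            # Ask for the requested block plus the blocks that follow it.
            self.round_trips += 1
            self.cache.update(fetch_blocks(self.file_blocks, index, self.read_ahead))
        return self.cache[index]


blocks = [f"block{i}" for i in range(8)]
reader = ReadAheadClient(blocks, read_ahead=4)
data = [reader.read_block(i) for i in range(8)]
```

Reading the whole eight-block file takes two round trips instead of eight, so the fixed per-request protocol overhead is paid far less often.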

5. Encryption:

Encryption is needed to provide security in distributed systems
Entities that need to communicate send a request to an authentication server
The authentication server provides a key for the conversation

Design Issues in DFS?

1. Naming and name resolution:

Solve the problem of system-wide unique names by partitioning the name space into contexts (geographical, organizational, etc.)
Name resolution is done within that context
Interpretation may lead to another context

File Name = Context + Name local to context

2. Name server:

A process that maps file names to objects (files, directories). The implementation options are:

Single name server
- Simple implementation, but with reliability and performance issues
Several name servers (on different hosts)
- Each server is responsible for a domain

For example, a client requests access to the file ‘A/B/C’. The local name server looks up a table (in the kernel) and points to a remote server for the ‘/B/C’ mapping.
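
The ‘A/B/C’ example can be sketched with name servers that hand the rest of a path to another server. The `NameServer` class, its table layout, and the resolved object string are all hypothetical.

```python
class NameServer:
    def __init__(self, table):
        # name -> ("object", value) or ("server", another NameServer)
        self.table = table

    def resolve(self, path):
        head, _, rest = path.strip("/").partition("/")
        kind, target = self.table[head]
        if kind == "server":
            # Interpretation leads to another context: pass on the rest of the name.
            return target.resolve(rest)
        if rest:
            raise FileNotFoundError(path)
        return target


context_b = NameServer({"C": ("object", "file C on a remote host")})
remote = NameServer({"B": ("server", context_b)})
local = NameServer({"A": ("server", remote)})

result = local.resolve("A/B/C")
```

Each server only knows the names local to its own context, which matches File Name = Context + Name local to context.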

3. Caching:

Caching at the client: Main memory vs. Disk
Cache consistency
- Server initiated
  - Server informs cache managers when data in client caches is stale
  - Client cache managers invalidate stale data or retrieve new data
  - Disadvantage: extensive communication
- Client initiated
  - Cache managers at the clients validate data with server before returning it to clients
  - Disadvantage: extensive communication
- Prohibit file caching under concurrent-write sharing
  - Several clients open a file, at least one of them for writing
  - Server informs all clients to purge that cached file
- Lock files when concurrent-write sharing (at least one client opens for write)
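
The server-initiated and client-initiated approaches can be compared in one sketch. `Server`, `Client`, and the version numbers used to detect stale data are assumptions made for illustration; `messages` counts every exchange so the communication cost is visible.

```python
class Server:
    def __init__(self):
        self.data = {}       # name -> (version, contents)
        self.callbacks = {}  # name -> clients known to cache the file
        self.messages = 0

    def read(self, name, client=None):
        self.messages += 1
        if client is not None:
            self.callbacks.setdefault(name, set()).add(client)  # remember who caches it
        return self.data[name]

    def version(self, name):
        self.messages += 1
        return self.data[name][0]

    def write(self, name, contents):
        version = self.data.get(name, (0, None))[0] + 1
        self.data[name] = (version, contents)
        for client in self.callbacks.pop(name, set()):
            self.messages += 1
            client.cache.pop(name, None)  # server-initiated: tell the client its copy is stale


class Client:
    def __init__(self, server, validate=False):
        self.server, self.validate, self.cache = server, validate, {}

    def read(self, name):
        cached = self.cache.get(name)
        if cached and self.validate and self.server.version(name) != cached[0]:
            cached = None  # client-initiated: validation found stale data
        if cached is None:
            cached = self.server.read(name, None if self.validate else self)
            self.cache[name] = cached
        return cached[1]


server = Server()
server.write("f", "v1")
notified = Client(server)                # relies on server-initiated invalidation
checking = Client(server, validate=True)  # validates with the server on every read
notified.read("f")
checking.read("f")
server.write("f", "v2")
after_notify = notified.read("f")
after_check = checking.read("f")
```

Both clients end up with the new contents. The difference is where the cost lands: the server pays per cached copy on each write, while the validating client pays on each read.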

4. Writing policy:

Once a client writes into a file (and the local cache), when should the modified cache be sent to the server?

Options:
- Write-through: all writes at the clients are immediately transferred to the servers
  - Advantage: reliability
  - Disadvantage: performance, it does not take advantage of the cache
- Delayed writing: delay transfer to servers
  - Advantages:
    - Many writes take place (including intermediate results) before a transfer
    - Some data may be deleted
  - Disadvantage: reliability
- Delayed writing until file is closed at client
  - For short open intervals, same as delayed writing
  - For long intervals, reliability problems
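
The options above differ only in when the transfer happens, which a short sketch makes concrete. `Server`, `CachedFile`, and the policy strings are hypothetical; the periodic timer that a delayed-write policy would use is left out, so data is flushed only on close.

```python
class Server:
    def __init__(self):
        self.files = {}
        self.transfers = 0

    def store(self, name, contents):
        self.transfers += 1
        self.files[name] = contents


class CachedFile:
    def __init__(self, server, name, policy):
        self.server, self.name, self.policy = server, name, policy
        self.contents = None
        self.dirty = False

    def write(self, contents):
        self.contents = contents
        if self.policy == "write-through":
            self.server.store(self.name, contents)  # sent to the server immediately
        else:
            self.dirty = True                        # the transfer is delayed

    def close(self):
        if self.dirty:
            # Modified data leaves the client only now; a crash before this loses it.
            self.server.store(self.name, self.contents)
            self.dirty = False


through_server, delayed_server = Server(), Server()
through = CachedFile(through_server, "log", "write-through")
delayed = CachedFile(delayed_server, "log", "delayed")
for contents in ("a", "ab", "abc"):
    through.write(contents)
    delayed.write(contents)
through.close()
delayed.close()
```

Three writes cost three transfers under write-through and one under delayed writing, because the intermediate results never leave the client.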

5. Availability:

We use replication to increase availability, i.e. replicas of files are maintained at different sites/servers

Replication issues:
- How to keep replicas consistent
- How to detect inconsistency among replicas
Unit of replication:
- File
- Group of files

6. Scalability:

Server-initiated cache invalidation complexity and load grow with size of system.
- Do not provide cache invalidation service for read-only files
- Provide design to allow users to share cached data
Design file servers for scalability: threads, SMPs, clusters

7. Semantics:

A read will return data stored by the latest write

Possible options:
- All reads and writes go through the server
  - Disadvantage: communication overhead
- Use of lock mechanism
  - Disadvantage: file not always available

Stateful Protocols?

Stateful means that there is memory of the past. Previous transactions are remembered and may affect the current transaction.

In a stateful protocol, if a client sends a request to the server, it then expects a response of some kind. Should it fail to receive any response, it will then resend the request.

Examples of stateful protocols are FTP and Telnet.

# The state is kept inside the object between calls
class Counter:
    def __init__(self):
        self._number = 0  # initially zero

    def add_one(self):
        self._number += 1
        return self._number

Stateful protocol features:

Stateful protocols often give the client better performance, because the server keeps track of connection information.
Stateful applications typically require backing storage.
Stateful requests are heavily dependent on the server-side state.
TCP sessions are stateful, because both endpoints maintain information about the session throughout its life.

Some environments require stateful service:

A server employing server-initiated cache validation cannot provide stateless service, since it maintains a record of which files are cached by which clients.
The UNIX use of file descriptors and implicit offsets is inherently stateful. The server must maintain tables that map file descriptors to inodes and must store the current offset within each file.
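
A stateful file service in the UNIX style can be sketched as below. `StatefulFileServer` is a hypothetical class; the table it keeps maps descriptors to a file name and offset, standing in for the inode tables of a real system.

```python
class StatefulFileServer:
    def __init__(self, files):
        self.files = files    # name -> bytes
        self.open_files = {}  # descriptor -> [name, current offset]
        self.next_fd = 3

    def open(self, name):
        fd = self.next_fd
        self.next_fd += 1
        self.open_files[fd] = [name, 0]
        return fd

    def read(self, fd, count):
        name, offset = self.open_files[fd]
        data = self.files[name][offset:offset + count]
        self.open_files[fd][1] = offset + len(data)  # the server remembers the position
        return data

    def close(self, fd):
        del self.open_files[fd]


file_server = StatefulFileServer({"a.txt": b"hello world"})
fd = file_server.open("a.txt")
first = file_server.read(fd, 5)
second = file_server.read(fd, 6)  # continues where the previous read stopped
file_server.close(fd)
```

The requests are short, but if the server loses `open_files` in a crash, the client's descriptor becomes meaningless.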

Credits: https://crmtrilogix.com/images/uploaded/stateful.png

Stateless Protocols?

Stateless means there is no memory of the past. Every transaction is performed as if it were being done for the very first time.

In a stateless protocol, the client sends a request and the server responds according to its current state. The server is not required to preserve session information or status about each communicating partner across multiple requests.

Examples of stateless protocols are HTTP, UDP, and DNS.

# The result depends only on what is passed into the function
def add_one(number: int) -> int:
    return number + 1

Stateless protocol features:

Stateless protocols simplify the overall design of the server.
A stateless protocol requires fewer resources, because the system does not need to track multiple connections or the details of each session.
In a stateless protocol, each packet of information travels on its own without a reference to any other packet.
Each communication within a stateless protocol is discrete and has no relation to those that precede or follow it.

Disadvantages of using the more robust stateless service:

Longer request messages
Slower request processing
Difficulty in providing UNIX file semantics
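
For contrast with a descriptor-based server, a stateless read can be sketched as a function that receives everything it needs in each request. `stateless_read` and the file contents are hypothetical.

```python
def stateless_read(files, name, offset, count):
    """Each request names the file and the position; the server keeps nothing between calls."""
    return files[name][offset:offset + count]


files = {"a.txt": b"hello world"}

# The client, not the server, tracks the current offset.
offset = 0
first = stateless_read(files, "a.txt", offset, 5)
offset += len(first)
second = stateless_read(files, "a.txt", offset, 6)

# Resending the same request after a lost reply gives the same answer.
retry = stateless_read(files, "a.txt", 0, 5)
```

Every request carries the file name and offset, which is why request messages are longer and why UNIX semantics such as a shared implicit offset are hard to provide.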

For more information on the following topics,

How to synchronize distributed systems: https://programmerprodigy.code.blog/2021/07/07/how-to-synchronize-distributed-systems/
How to detect a Deadlock and resolve it in Distributed Systems: https://programmerprodigy.code.blog/2021/07/07/how-to-detect-a-deadlock-and-resolve-it-in-distributed-systems/
Fault Tolerance and Recovery in Distributed systems: https://programmerprodigy.code.blog/2021/07/07/fault-tolerance-and-recovery-in-distributed-systems/

Published by Hridyesh singh bisht