Distributed Shared Memory
Distributed shared memory (DSM) implements the shared memory model in distributed systems, which have no physical shared memory. The shared memory model provides a virtual address space shared between all nodes.
Data moves between main memory and secondary memory (within a node) and between the main memories of different nodes. Each data object is owned by a node:
- The initial owner is the node that created the object
- Ownership can change as the object moves from node to node
When a process accesses data in the shared address space, the mapping manager maps the shared memory address to physical memory.
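To make the role of the mapping manager concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not how any particular DSM system works: the `MappingManager` class, the `owners` table, the 4096-byte page size and the `fetches` log are all hypothetical names introduced for illustration.

```python
class MappingManager:
    """Toy mapping manager for one node (names and layout are hypothetical)."""

    def __init__(self, node_id, owners, page_size=4096):
        self.node_id = node_id
        self.owners = owners      # shared page number -> id of the owning node
        self.page_size = page_size
        self.frames = {}          # shared page number -> local physical frame
        self.fetches = []         # (page, owner) pairs fetched from other nodes

    def translate(self, shared_addr):
        """Map a shared address to a local physical address."""
        page, offset = divmod(shared_addr, self.page_size)
        if page not in self.frames:
            owner = self.owners.get(page, self.node_id)
            if owner != self.node_id:
                # In a real system the data would move between nodes here.
                self.fetches.append((page, owner))
            self.frames[page] = len(self.frames)  # use the next free frame
        return self.frames[page] * self.page_size + offset


manager = MappingManager(node_id=0, owners={0: 0, 1: 2})
remote_addr = manager.translate(4100)  # page 1 is owned by node 2
local_addr = manager.translate(10)     # page 0 is owned by this node
```

The process only ever sees shared addresses; whether the data had to be fetched from another node is hidden inside `translate`.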
Issues for the DSM design?
- Data sharing is implicit, hiding data movement.
- Passing data structures containing pointers is easier.
Coherence Protocols?
DSM systems make use of data replication, where copies of data are maintained at all the nodes accessing the data. A fundamental problem with data replication is the difficulty in ensuring that all copies have the same information and that nodes do not access stale data.
A protocol to keep replicas coherent is needed. Two basic protocols to maintain coherence are the write-invalidate protocol and the write-update protocol.
1. Write-invalidate protocol
- A write to shared data invalidates all copies except one before the write executes
- Invalidated copies are no longer accessible
Advantage:
- Good performance when there are many updates between reads
- Good performance when there is per-node locality of reference
Disadvantage:
- Invalidations are sent to all nodes that have copies
- Inefficient if many nodes access the same object
2. Write-update protocol
- A write to shared data causes all copies to be updated (the new value is sent instead of an invalidation)
- More difficult to implement
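The difference between the two protocols can be sketched in a few lines of Python. This is a toy model rather than code from a real DSM system: the `Node` class and the `write_invalidate` and `write_update` functions are hypothetical, and a dictionary stands in for each node's local copies.

```python
class Node:
    """A node holding local copies of shared objects (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.copies = {}  # object id -> locally cached value


def write_invalidate(writer, nodes, obj, value):
    """Invalidate every other copy first, then perform the write locally."""
    for node in nodes:
        if node is not writer:
            node.copies.pop(obj, None)  # this copy is no longer accessible
    writer.copies[obj] = value


def write_update(writer, nodes, obj, value):
    """Send the new value to every node that currently holds a copy."""
    writer.copies[obj] = value
    for node in nodes:
        if node is not writer and obj in node.copies:
            node.copies[obj] = value


a, b, c = Node("a"), Node("b"), Node("c")
for n in (a, b, c):
    n.copies["x"] = 0

write_invalidate(a, [a, b, c], "x", 1)
after_invalidate = [n.copies.get("x") for n in (a, b, c)]  # only a keeps a copy

b.copies["x"] = 1  # b fetches a fresh copy again
write_update(a, [a, b, c], "x", 2)
after_update = [n.copies.get("x") for n in (a, b, c)]  # a and b see the new value
```

With write-invalidate, repeated writes by `a` cost nothing extra once the other copies are gone, which is why it suits many updates between reads. With write-update, every write sends a message to each node that holds a copy.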
Granularity is the size of the shared memory unit. If the DSM page size is a multiple of the local virtual memory (VM) page size, then DSM can be integrated with VM, i.e. it can reuse the VM page handling.
Distributed File Systems
A distributed file system (DFS) implements a common file system that can be shared by all autonomous computers in a distributed system. The goals of a distributed file system are:
- Network transparency: Users need not be aware of the physical location of files to access them.
- High availability: Users should have easy access to files irrespective of their physical location.
Architecture of a DFS?
- Files can be stored at any machine and computation can be performed at any machine.
- When a machine needs to access a file stored on a remote machine, the remote machine performs the necessary file access operations and returns the data if a read operation is performed.
- File servers: machines dedicated to storing files and performing storage and retrieval operations.
- Clients: the remaining machines in the system, which can be used for computation and can access the files stored on servers.
- Some client machines may also be equipped with local disk storage that can be used for caching remote files, as a swap area, or as a storage area.

Services provided by the distributed file system?
- Name Server: Maps the names supplied by clients to objects such as files and directories (name resolution)
- Takes place when a process attempts to access a file or directory for the first time
- Cache Manager: Improves performance through file caching
- Caching at the client: when a client references a file at the server
- A copy of the data is brought from the server to the client machine
- Subsequent accesses are done locally at the client
- Caching at the server:
- The file is kept in memory to reduce subsequent access time
Different cached copies can become inconsistent. Cache managers have to provide coordination.
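A small Python sketch shows both the benefit of client caching and how copies drift apart without coordination. The `FileServer` and `ClientCache` classes and the file name are hypothetical, chosen only for this illustration.

```python
class FileServer:
    def __init__(self):
        self.files = {}
        self.reads = 0  # number of read requests that reached the server

    def read(self, name):
        self.reads += 1
        return self.files[name]


class ClientCache:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:   # first reference: fetch from the server
            self.cache[name] = self.server.read(name)
        return self.cache[name]      # later accesses are served locally


server = FileServer()
server.files["notes.txt"] = "v1"
c1, c2 = ClientCache(server), ClientCache(server)

c1.read("notes.txt")
c1.read("notes.txt")               # served from c1's cache, no server request
server.files["notes.txt"] = "v2"   # the file changes at the server
stale = c1.read("notes.txt")       # c1 still returns its cached copy
fresh = c2.read("notes.txt")       # c2 fetches the new contents
```

Three reads by `c1` cost one server request, but `c1` keeps returning old data after the file changes. That is the inconsistency the cache managers have to coordinate away.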
Mechanisms used in distributed file systems?
1. Mounting:
The mount mechanism binds together several filename spaces into a single hierarchically structured name space. The kernel maintains the mount table, which maps mount points to storage devices. Mount information can be kept in one of two places:
- Mount information maintained at clients:
- Each client mounts every file system
- Different clients may not see the same file name space
- If files move to another server, every client needs to update its mount table
- Mount information maintained at server:
- Every client sees the same filename space
- If files move to another server, only the mount information at the server needs to change
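As a rough illustration of how a mount table is consulted, the sketch below resolves a path by picking the longest matching mount point. The `resolve` function, the table contents and the `serverA`/`serverB` names are made up for this example, and real kernels store far more per entry.

```python
def resolve(mount_table, path):
    """Return (file system, path relative to its mount point) for `path`."""
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the most specific (longest) mount point.
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise FileNotFoundError(path)
    relative = path[len(best.rstrip("/")):] or "/"
    return mount_table[best], relative


# Hypothetical mount table kept at one client.
mount_table = {
    "/": "local-disk",
    "/home": "serverA:/export/home",
    "/home/projects": "serverB:/export/projects",
}

fs, rel = resolve(mount_table, "/home/projects/dsm/report.txt")
```

If the files under `/home/projects` move to another server, only the right-hand side of that one entry changes. When the table lives at each client, that change has to be repeated on every client.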
2. Caching:
- Improves file system performance by exploiting the locality of reference
- When a client references a remote file, the file is cached in the main memory of the server (server cache) and at the client (client cache)
- When multiple clients modify shared (cached) data, cache consistency becomes a problem
- It is very difficult to implement a solution that guarantees consistency
3. Hints:
- Treat the cached data as hints, i.e. cached data may not be completely accurate
- Can be used by applications that can discover that the cached data is invalid and can recover
Example of hints:
- After the name of a file is mapped to an address, that address is stored as a hint in the cache
- If the address later fails, it is purged from the cache
- The name server is consulted to provide the actual location of the file and the cache is updated
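The hint example above can be sketched as follows. All names here (`NameServer`, `open_file`, the `servers` dictionary that says which host currently holds which file) are assumptions made for the sake of a runnable example.

```python
class NameServer:
    def __init__(self, locations):
        self.locations = locations  # file name -> current address

    def lookup(self, name):
        return self.locations[name]


def open_file(name, hints, name_server, servers):
    """Try the hinted address first; fall back to the name server if it fails."""
    addr = hints.get(name)
    if addr is not None and name in servers.get(addr, set()):
        return addr                  # the hint was still valid
    hints.pop(name, None)            # purge the bad hint
    addr = name_server.lookup(name)  # ask for the actual location
    hints[name] = addr               # remember it as the new hint
    return addr


servers = {"hostA": set(), "hostB": {"report.txt"}}  # the file has moved to hostB
ns = NameServer({"report.txt": "hostB"})
hints = {"report.txt": "hostA"}                      # stale hint

first = open_file("report.txt", hints, ns, servers)
second = open_file("report.txt", hints, ns, servers)
```

The stale hint costs one failed attempt and is then replaced, so the application recovers without the cache ever having been kept strictly consistent.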
4. Bulk data transfer:
- Observations:
- The overhead introduced by protocols does not depend on the amount of data transferred in one transaction
- Most files are accessed in their entirety
- Common practice: when a client requests one block of data, multiple consecutive blocks are transferred
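A minimal sketch of this practice, assuming a fixed read-ahead of three extra blocks per request. `BLOCK_SIZE`, `READ_AHEAD`, `BlockServer` and `BulkClient` are hypothetical names, and the block size is kept tiny so the example is easy to follow.

```python
BLOCK_SIZE = 4  # bytes per block
READ_AHEAD = 3  # extra consecutive blocks sent with each request


class BlockServer:
    def __init__(self, data):
        self.data = data
        self.requests = 0  # number of transactions with the server

    def fetch(self, first_block, count):
        self.requests += 1
        blocks = {}
        for i in range(first_block, first_block + count):
            chunk = self.data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            if chunk:  # skip blocks past the end of the file
                blocks[i] = chunk
        return blocks


class BulkClient:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read_block(self, i):
        if i not in self.cache:
            # One request brings block i plus the following blocks.
            self.cache.update(self.server.fetch(i, 1 + READ_AHEAD))
        return self.cache.get(i, b"")


server = BlockServer(b"abcdefghijklmnopqrstuvwxyz")
client = BulkClient(server)
whole_file = b"".join(client.read_block(i) for i in range(7))
```

Reading the seven blocks of this file sequentially takes two transactions instead of seven, so the fixed per-transaction protocol overhead is paid far less often.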
5. Encryption:
- Encryption is needed to provide security in distributed systems
- Entities that need to communicate send a request to an authentication server
- The authentication server provides a key for the conversation
Design Issues in DFS?
1. Naming and name resolution:
- Solve the problem of system-wide unique names, by partitioning a name space into contexts (geographical, organizational, etc.)
- Name resolution is done within that context
- Interpretation may lead to another context
File Name = Context + Name local to context
2. Name server:
A process that maps file names to objects (files, directories). Implementation options are:
- Single name server
- Simple implementation, but reliability and performance issues
- Several name servers (on different hosts)
- Each server is responsible for a domain
For example, a client requests access to the file ‘A/B/C’. The local name server looks up a table (in the kernel) and points to a remote server for the ‘/B/C’ mapping.
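The ‘A/B/C’ example can be sketched with name servers that each resolve one path component and hand the rest to the next context. The `NameServer` class, its table layout and the `inode-42` value are hypothetical.

```python
class NameServer:
    def __init__(self, table):
        # component -> ("file", object) or ("server", another NameServer)
        self.table = table

    def resolve(self, path):
        head, _, rest = path.strip("/").partition("/")
        kind, target = self.table[head]
        if kind == "server":
            # Interpretation continues in another context.
            return target.resolve(rest)
        return target


server_for_c = NameServer({"C": ("file", "inode-42")})
remote = NameServer({"B": ("server", server_for_c)})
local = NameServer({"A": ("server", remote)})

obj = local.resolve("A/B/C")
```

Each server only knows its own domain, so no single server needs the full table. The price is that one lookup may involve several servers.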
3. Caching:
- Caching at the client: Main memory vs. Disk
- Cache consistency
- Server initiated
- Server informs cache managers when data in client caches is stale
- Client cache managers invalidate stale data or retrieve new data
- Disadvantage: extensive communication
- Client initiated
- Cache managers at the clients validate data with server before returning it to clients
- Disadvantage: extensive communication
- Prohibit file caching under concurrent-write sharing
- Several clients open a file, at least one of them for writing
- The server informs all clients to purge that cached file
- Lock files under concurrent-write sharing (at least one client opens the file for writing)
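The server-initiated and client-initiated approaches could look roughly like the Python sketch below. It uses a per-file version number to detect stale copies, which is an assumption of this example, and all class and method names are hypothetical.

```python
class Server:
    def __init__(self):
        self.files = {}    # name -> (version, data)
        self.cachers = {}  # name -> clients caching it (state kept at the server)

    def read(self, name, client=None):
        if client is not None:
            self.cachers.setdefault(name, set()).add(client)
        return self.files[name]

    def version(self, name):
        return self.files[name][0]

    def write(self, name, data):
        version = self.files.get(name, (0, None))[0] + 1
        self.files[name] = (version, data)
        # Server-initiated: tell every caching client that its copy is stale.
        for client in self.cachers.pop(name, set()):
            client.invalidate(name)


class ServerInitiatedClient:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def invalidate(self, name):
        self.cache.pop(name, None)

    def read(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.read(name, client=self)
        return self.cache[name][1]


class ClientInitiatedClient:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def read(self, name):
        cached = self.cache.get(name)
        # Client-initiated: validate with the server before every use.
        if cached is None or cached[0] != self.server.version(name):
            self.cache[name] = self.server.read(name)
        return self.cache[name][1]


server = Server()
server.write("f", "old")
s_client = ServerInitiatedClient(server)
c_client = ClientInitiatedClient(server)
before = (s_client.read("f"), c_client.read("f"))
server.write("f", "new")
after = (s_client.read("f"), c_client.read("f"))
```

Both clients end up with the new data, but the communication falls in different places: the server-initiated variant sends a message on every write to a cached file, while the client-initiated variant contacts the server on every read.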
4. Writing policy:
Once a client writes into a file (and the local cache), when should the modified cache be sent to the server?
- Options:
- Write-through: all writes at the clients are immediately transferred to the servers
- Advantage: reliability
- Disadvantage: performance, since it does not take advantage of the cache
- Delayed writing: delay the transfer to the servers
- Advantages:
- Many writes take place (including intermediate results) before a transfer
- Some data may be deleted before it is ever transferred
- Disadvantage: reliability
- Delayed writing until the file is closed at the client
- For short open intervals, same as delayed writing
- For long intervals, reliability problems
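The trade-off between the policies shows up clearly when counting transfers to the server. The sketch below is illustrative only: `Server`, `WriteThroughClient` and `DelayedWriteClient` are hypothetical, and `flush` stands in for whatever triggers the delayed transfer (a timer, or closing the file).

```python
class Server:
    def __init__(self):
        self.files = {}
        self.transfers = 0  # number of client-to-server transfers

    def store(self, name, data):
        self.transfers += 1
        self.files[name] = data


class WriteThroughClient:
    def __init__(self, server):
        self.server, self.cache = server, {}

    def write(self, name, data):
        self.cache[name] = data
        self.server.store(name, data)  # sent immediately


class DelayedWriteClient:
    def __init__(self, server):
        self.server, self.cache, self.dirty = server, {}, set()

    def write(self, name, data):
        self.cache[name] = data
        self.dirty.add(name)           # remembered, not yet sent

    def delete(self, name):
        # Simplification: assumes the file was never sent to the server.
        self.cache.pop(name, None)
        self.dirty.discard(name)

    def flush(self):
        for name in sorted(self.dirty):
            self.server.store(name, self.cache[name])
        self.dirty.clear()


wt_server, dw_server = Server(), Server()
wt, dw = WriteThroughClient(wt_server), DelayedWriteClient(dw_server)

for i in range(3):
    wt.write("log", f"v{i}")
    dw.write("log", f"v{i}")
dw.write("tmp", "scratch")
dw.delete("tmp")  # a temporary file that never reaches the server
dw.flush()
```

The delayed writer sends one transfer where write-through sends three. If the delayed client crashed before `flush`, however, all three writes would be lost, which is the reliability disadvantage.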
5. Availability:
We use replication to increase availability, i.e. replicas of files are maintained at different sites/servers.
- Replication issues:
- How to keep replicas consistent
- How to detect inconsistency among replicas
- Unit of replication
- File
- Group of files
6. Scalability:
- The complexity and load of server-initiated cache invalidation grow with the size of the system.
- Do not provide cache invalidation service for read-only files
- Provide design to allow users to share cached data
- Design file servers for scalability: threads, SMPs, clusters
7. Semantics:
The expected semantics are that a read returns the data stored by the latest write.
- Possible options:
- All reads and writes go through the server
- Disadvantage: communication overhead
- Use of a lock mechanism
- Disadvantage: the file is not always available
Stateful Protocols?
Stateful means that there is memory of the past. Previous transactions are remembered and may affect the current transaction.
In a stateful protocol, if a client sends a request to the server, it then expects a response of some kind. Should it fail to receive any response, it will then resend the request.
Examples of stateful protocols are FTP and Telnet.
# The state is maintained between calls
_number = 0  # initially zero

def add_one():
    global _number
    _number += 1
    return _number
Stateful protocol features:
- Stateful protocols often give the client better performance, because the server keeps track of connection information.
- Stateful applications typically require backing storage.
- Stateful requests depend heavily on server-side state.
- TCP sessions are stateful, because both endpoints maintain information about the session throughout its life.
Some environments require stateful service:
- A server employing server-initiated cache validation cannot provide stateless service, since it maintains a record of which files are cached by which clients.
- UNIX's use of file descriptors and implicit offsets is inherently stateful. The server must maintain tables that map file descriptors to inodes and store the current offset within each file.
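The bookkeeping that UNIX-style file access forces on a server can be sketched as follows. `StatefulFileServer` is a hypothetical class, and file names stand in for the inodes mentioned above.

```python
class StatefulFileServer:
    """Keeps a table of open files and their offsets on behalf of clients."""

    def __init__(self, files):
        self.files = files    # name -> bytes
        self.open_files = {}  # descriptor -> [name, current offset]
        self.next_fd = 3

    def open(self, name):
        fd = self.next_fd
        self.next_fd += 1
        self.open_files[fd] = [name, 0]
        return fd

    def read(self, fd, count):
        name, offset = self.open_files[fd]
        data = self.files[name][offset:offset + count]
        self.open_files[fd][1] = offset + len(data)  # implicit offset advances
        return data

    def close(self, fd):
        del self.open_files[fd]


srv = StatefulFileServer({"a.txt": b"hello world"})
fd = srv.open("a.txt")
first = srv.read(fd, 5)
second = srv.read(fd, 6)
```

The two `read` calls carry no position, so their result depends entirely on the table held at the server. If the server lost `open_files` in a crash, the client's descriptor would become meaningless.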

Stateless Protocols?
Stateless means there is no memory of the past. Every transaction is performed as if it were being done for the very first time.
In a stateless protocol, the client sends a request and the server responds using only the information in that request and the server's current data. The server is not required to preserve session information or status about each communicating partner across multiple requests.
Examples of stateless protocols are HTTP, UDP, and DNS.
# The result is derived only from what is passed into the function
def add_one(number):
    return number + 1
Stateless protocol features:
- Stateless protocols simplify the overall design of the server.
- A stateless protocol requires fewer resources, because the system does not need to track multiple connections and the details of each session.
- In a stateless protocol, each packet of information travels on its own without a reference to any other packet.
- Each communication within a stateless protocol is discrete and has no relation to those that precede or follow it.
Disadvantages of using a robust stateless service:
- Longer request messages
- Slower request processing
- Difficulty in providing UNIX file semantics
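A stateless counterpart to a UNIX-style file server might look like the sketch below, where `StatelessFileServer` and `Client` are hypothetical names. Every request has to name the file and the position explicitly, which is where the longer request messages come from.

```python
class StatelessFileServer:
    def __init__(self, files):
        self.files = files  # name -> bytes; no per-client tables

    def read(self, name, offset, count):
        # Every request identifies the file and the position explicitly.
        return self.files[name][offset:offset + count]


class Client:
    """The client, not the server, remembers the current offset."""

    def __init__(self, server, name):
        self.server, self.name, self.offset = server, name, 0

    def read(self, count):
        data = self.server.read(self.name, self.offset, count)
        self.offset += len(data)
        return data


srv = StatelessFileServer({"a.txt": b"hello world"})
client = Client(srv, "a.txt")
first = client.read(5)
second = client.read(6)
repeat = srv.read("a.txt", 0, 5)  # resending a request gives the same answer
```

Because the server holds nothing between requests, it can restart without clients noticing, and a resent request is harmless. UNIX behaviour such as an implicit shared offset has to be rebuilt on the client side.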
For more information:
- http://cse.csusb.edu/tongyu/courses/cs461/notes/index.php
- https://courses.cs.washington.edu/courses/cse452/
- https://www.cl.cam.ac.uk/teaching/2021/ConcDisSys/dist-sys-notes.pdf
For more information on the following topics,
- How to synchronize distributed systems: https://programmerprodigy.code.blog/2021/07/07/how-to-synchronize-distributed-systems/
- How to detect a Deadlock and resolve it in Distributed Systems: https://programmerprodigy.code.blog/2021/07/07/how-to-detect-a-deadlock-and-resolve-it-in-distributed-systems/
- Fault Tolerance and Recovery in Distributed systems: https://programmerprodigy.code.blog/2021/07/07/fault-tolerance-and-recovery-in-distributed-systems/
