Thursday, 11 December 2014

What is a Cluster File System?

A clustered file system is a file system whose data is distributed across multiple nodes (machines) that appear to clients as a single storage system (a cluster). There are several approaches to clustering, most of which do not employ a clustered file system and rely only on direct-attached storage for each node. Clustered file systems can provide features such as location-independent addressing and redundancy, which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spreads data across multiple storage nodes, usually for redundancy or performance.

Distributed vs Clustered File System

Both kinds of file system provide a unified view (a global namespace, or whatever you want to call it). The difference lies in the model used for the underlying block storage. In a cluster file system, all of the nodes connect to the same block storage, with access mediated by locks or other synchronization primitives. In a distributed file system, each server has its own private block storage, and the separate stores are only unified at a higher level.
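To make the two storage models concrete, here is a minimal toy sketch in Python. It is not how any real file system is implemented: the class names ClusterFS and DistributedFS are made up for illustration, a plain in-process lock stands in for a cluster-wide lock manager, and hash-based placement is just one assumed way of unifying private stores at a higher level.

```python
import hashlib
import threading


class ClusterFS:
    """Toy cluster file system: every node reads and writes the same block store."""

    def __init__(self):
        self.shared_blocks = {}        # stands in for one shared (e.g. SAN) volume
        self.lock = threading.Lock()   # stands in for a cluster-wide lock manager

    def write(self, node, path, data):
        # Any node may write, but only while holding the shared lock.
        with self.lock:
            self.shared_blocks[path] = data

    def read(self, node, path):
        with self.lock:
            return self.shared_blocks[path]


class DistributedFS:
    """Toy distributed file system: each server owns private storage."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.private_blocks = {name: {} for name in self.servers}

    def _owner(self, path):
        # Assumption: placement by hashing the path. Real systems use
        # many different schemes; this is only the simplest one to show.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]

    def write(self, path, data):
        # The higher layer routes the request to the one server that owns the path.
        self.private_blocks[self._owner(path)][path] = data

    def read(self, path):
        return self.private_blocks[self._owner(path)][path]
```

In the first class, data written through one node is visible through any other node because there is only one copy of the storage. In the second, the client sees a single namespace, but each path physically lives in exactly one server's private store.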

Cluster filesystems have mostly fallen out of fashion, primarily because their storage model requires a relatively expensive external (e.g. FC/iSCSI) disk subsystem plus switches, adapters, etc. The upside is that this allows disk failures to be handled on the external subsystem, and because every server sees the same underlying storage, server failures are easier to handle as well.

Distributed filesystems, on the other hand, can be and usually are built using cheaper SATA/SAS disks through on-board controllers. (Note that they can also be built on top of SANs, except in environments such as AWS where such things don't exist.) While such filesystems can easily beat their cluster cousins in terms of throughput per dollar, they often do so at the cost of worse latency and greater complexity to provide data availability across separate pools of storage.
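The availability problem can be illustrated with another small Python sketch. Since no server can see another server's disks, the filesystem itself has to keep extra copies. Everything here is an assumption made for illustration: the ReplicatedStore class, the replicas parameter, and the "next servers in the list" placement rule are hypothetical and not taken from any particular product.

```python
import hashlib


class ReplicatedStore:
    """Toy replication across separate storage pools (one pool per server)."""

    def __init__(self, servers, replicas=2):
        if replicas > len(servers):
            raise ValueError("cannot keep more replicas than there are servers")
        self.servers = list(servers)
        self.replicas = replicas
        self.pools = {name: {} for name in self.servers}  # private storage per server
        self.down = set()                                 # servers currently failed

    def _placement(self, path):
        # Assumption: pick a starting server by hash, then use its successors.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.servers)
        return [self.servers[(start + i) % len(self.servers)]
                for i in range(self.replicas)]

    def fail(self, name):
        self.down.add(name)

    def write(self, path, data):
        # Store a copy in the pool of every live replica holder.
        for name in self._placement(path):
            if name not in self.down:
                self.pools[name][path] = data

    def read(self, path):
        # Try each replica in turn; any live copy will do.
        for name in self._placement(path):
            if name not in self.down and path in self.pools[name]:
                return self.pools[name][path]
        raise OSError("no live replica for " + path)
```

With two replicas, a read still succeeds after one of the two holders fails, but the sketch ignores the hard parts (re-replication, consistency between copies, servers coming back), which is where the extra complexity mentioned above comes from.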

Since the latency issues can be addressed with smarter caching/replication, which (along with the other kinds of complexity) is just a one-time development issue, I believe that distributed filesystems will eventually displace cluster filesystems entirely. Right now, though, there are use cases such as virtual-machine image storage or databases that are probably better served by cluster filesystems.

