- For a distributed file system to be operational [see Storing data: Hadoop Distributed File System (DFS)], the CAP theorem applies.
- “C”, “A” and “P” stand for:
- consistency (C): all the data nodes hold the same copy of the data, so a read returns the same result whichever node answers it
- availability (A): if a node fails, the remaining nodes still function and continue to answer requests
- partition tolerance (P): the DFS continues to work when there is a network partition, that is, a disruption of the communication channels between datanodes
- The CAP theorem states:
“In a DFS, at most two of the three criteria can be met at any given moment”
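- In essence this means that, when a network partition occurs, the system must give up either consistency or availability until the partition heals.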
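The trade-off can be illustrated with a toy model. The sketch below is not how Hadoop or any real DFS is implemented; `TinyStore`, `Replica` and the mode labels `"CP"` and `"AP"` are hypothetical names introduced only for this illustration. It assumes two replicas and a flag that simulates a network partition.

```python
class Replica:
    """One copy of a single value (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class TinyStore:
    """Two replicas; mode "CP" favours consistency, "AP" favours availability."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True simulates a broken link between a and b

    def write(self, replica, value):
        """Return True if the write was accepted, False if it was refused."""
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write, so the store is unavailable.
                return False
            # Availability first: accept locally, the replicas now diverge.
            replica.value = value
            return True
        # No partition: the write reaches both replicas.
        replica.value = value
        other.value = value
        return True

    def consistent(self):
        return self.a.value == self.b.value


cp = TinyStore("CP")
cp.write(cp.a, 1)
cp.partitioned = True
print(cp.write(cp.a, 2), cp.consistent())

ap = TinyStore("AP")
ap.write(ap.a, 1)
ap.partitioned = True
print(ap.write(ap.a, 2), ap.consistent())
```

During the simulated partition the "CP" store stays consistent but refuses the write, while the "AP" store accepts the write and its replicas disagree.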
- Notes:
- In the NoSQL storage model the data is also partitioned across a DFS, so the CAP theorem applies there as well: availability can be sacrificed for the benefit of consistency, and vice versa.
- [A relational database management system (RDBMS) is not partitioned, so availability and consistency are both guaranteed]
- We are looking for a compromise: a storage system that is both scalable and fast.
- To use Big Data storage more efficiently while keeping the efficiency of an RDBMS (relational database management system), we are looking for “BASE”:
- Basically Available
- Soft state [the stored state may keep changing for a while as the copies on different nodes catch up with each other; this gives flexibility in the consistency requirement]
- Eventually Consistent
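A minimal sketch of what "eventually consistent" can look like in practice is given below. It assumes a last-write-wins rule based on a version number and a manual `sync` step between two nodes; `Node`, `put`, `get` and `sync` are hypothetical names for this illustration and do not correspond to the API of any particular NoSQL system.

```python
class Node:
    """A node holding key -> (version, value) pairs (hypothetical sketch)."""

    def __init__(self):
        self.data = {}

    def put(self, key, value, version):
        # Last write wins: keep the entry with the highest version number.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(x, y):
    """Exchange entries in both directions so that x and y converge."""
    for src, dst in ((x, y), (y, x)):
        for key, (version, value) in list(src.data.items()):
            dst.put(key, value, version)


n1, n2 = Node(), Node()
n1.put("k", "old", 1)
sync(n1, n2)               # both nodes now hold "old"
n2.put("k", "new", 2)      # the update reaches n2 only
stale = n1.get("k")        # n1 still answers (basically available), but with "old"
sync(n1, n2)               # after synchronisation both nodes hold "new"
print(stale, n1.get("k"), n2.get("k"))
```

Between the update and the next `sync`, a read from `n1` returns a stale value; once the nodes have exchanged their entries, both agree again.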
Reference: Dawn E. Holmes, Big Data: A Very Short Introduction. Oxford University Press, 2017.