The “CAP” Theorem

  • For a distributed file system (DFS) to be operational [see Storing data: Hadoop Distributed File System (DFS)], the CAP theorem applies.
  • “C”, “A” and “P” stand for:
    • consistency (C): all data nodes hold the same copy of the data
    • availability (A): if a node fails, the remaining nodes continue to serve requests
    • partition tolerance (P): the DFS continues to work when there is a network partition, that is, a disruption of the communication channels between data nodes
  • The rule of the CAP theorem is as follows:

“In a DFS, at most two of the three criteria can be met at any given moment”

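  • In essence, this means that only one of the following combinations can be guaranteed at a time:
    • C + A (partition tolerance is given up)
    • C + P (availability is given up)
    • A + P (consistency is given up)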
  • Notes:
    • In the NoSQL storage model the data is likewise partitioned across a DFS, so the CAP theorem also applies: availability can be sacrificed in favor of consistency, and vice versa.
    • [A traditional relational database management system (RDBMS) is not partitioned, so both availability and consistency can be guaranteed]
  • We are looking for a compromise: a storage system that is both scalable and fast.
  • To use Big Data storage more efficiently while retaining the efficiency of an RDBMS (relational database management system), we aim for “BASE”:
    • Basically Available
    • Soft state [a compromise made for the sake of partition tolerance (P): the consistency requirement is relaxed to gain flexibility]
    • Eventually Consistent
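
The trade-off described above can be illustrated with a minimal Python sketch. It is not taken from the reference and is not how Hadoop or any NoSQL store is implemented: the names `Replica`, `Cluster`, the `"CP"`/`"AP"` mode labels, the shared logical clock and the last-write-wins reconciliation are all assumptions chosen for illustration. During a network partition, the "CP" cluster refuses writes (consistent but unavailable), while the "AP" cluster accepts them (available but temporarily inconsistent) and becomes consistent again once the partition heals, which is the "eventually consistent" part of BASE.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One data node holding a versioned copy of each key (hypothetical)."""
    name: str
    store: dict = field(default_factory=dict)  # key -> (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


class Cluster:
    """Toy two-replica cluster; `mode` is "CP" or "AP" (illustrative labels)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica("node-1"), Replica("node-2")]
        self.partitioned = False
        # Simplification: one shared logical clock supplies write versions.
        self.clock = 0

    def write(self, replica_index, key, value):
        """Write through one replica; return True if the write was accepted."""
        if self.partitioned and self.mode == "CP":
            # The other replica is unreachable: refuse, giving up availability.
            return False
        self.clock += 1
        entry = (self.clock, value)
        self.replicas[replica_index].store[key] = entry
        if not self.partitioned:
            # Healthy network: copy the write to every replica immediately.
            for replica in self.replicas:
                replica.store[key] = entry
        return True

    def heal(self):
        """End the partition and reconcile replicas with last-write-wins."""
        self.partitioned = False
        keys = set()
        for replica in self.replicas:
            keys.update(replica.store)
        for key in keys:
            copies = [r.store[key] for r in self.replicas if key in r.store]
            newest = max(copies, key=lambda entry: entry[0])
            for replica in self.replicas:
                replica.store[key] = newest


# C + P: the write during the partition is rejected, the data stays consistent.
cp = Cluster("CP")
cp.write(0, "x", "old")
cp.partitioned = True
cp_accepted = cp.write(0, "x", "new")            # False

# A + P: the write is accepted, so the replicas diverge until the partition heals.
ap = Cluster("AP")
ap.write(0, "x", "old")
ap.partitioned = True
ap_accepted = ap.write(0, "x", "new")            # True
diverged = [r.read("x") for r in ap.replicas]    # ["new", "old"]
ap.heal()
converged = [r.read("x") for r in ap.replicas]   # ["new", "new"]
```

Last-write-wins is only one possible reconciliation rule; it is used here because it keeps the sketch short, not because the reference prescribes it.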

Reference: Holmes, Dawn E. Big Data: A Very Short Introduction. Oxford University Press, 2017.

