What Is the CAP Theorem?
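The CAP theorem says that a distributed system can guarantee at most two of these three properties at the same time:
- Consistency (C)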
- Availability (A)
- Partition Tolerance (P)
flowchart TD
A[CAP Theorem]
B[Consistency]
C[Availability]
D[Partition Tolerance]
A --> B
A --> C
A --> D
Consistency, Availability, Partition Tolerance
Consistency
- Definition: Every read sees the most recent write.
- In practice: You update your profile picture and, immediately afterward, any call to getProfilePicture() returns that new image every time, everywhere.
Availability
- Definition: Every request (read or write) receives a non-error response. It is not necessarily the latest data, but the caller gets something back.
- In practice: Even if a database node is busy or slow, your query still gets an answer, possibly a slightly old value, rather than an error.
Partition Tolerance
- Definition: The system continues to operate despite arbitrary network “partitions” (i.e., failures, drops, or delays between nodes).
- In practice: If your servers in London can’t talk to your servers in Tokyo for a few seconds, each side still serves requests rather than crashing completely.
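The three definitions can be seen side by side in a toy model. The sketch below is only an illustration under simple assumptions: the `Replica` and `TinyCluster` classes, the `link_up` flag, and the file names are all hypothetical, and real replication is far more involved.

```python
class Replica:
    """One copy of the data, tagged with a version counter."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.version = 0


class TinyCluster:
    """Two replicas joined by a link that can be cut (hypothetical toy model)."""

    def __init__(self):
        self.london = Replica("london")
        self.tokyo = Replica("tokyo")
        self.link_up = True

    def write(self, replica, value):
        # Apply the write locally first.
        replica.version += 1
        replica.value = value
        # Replicate to the peer only while the link is up.
        if self.link_up:
            peer = self.tokyo if replica is self.london else self.london
            peer.value, peer.version = replica.value, replica.version

    def read(self, replica):
        # Always answers (availability), even if the local copy is stale.
        return replica.value


cluster = TinyCluster()
cluster.write(cluster.london, "avatar_v1.png")
connected_read = cluster.read(cluster.tokyo)    # link up: Tokyo sees the latest write

cluster.link_up = False                          # partition between London and Tokyo
cluster.write(cluster.london, "avatar_v2.png")
partitioned_read = cluster.read(cluster.tokyo)   # Tokyo still answers, with the old value
```

While the link is up, a read in Tokyo returns London's latest write (consistency). Once the link is cut, both sides keep answering (availability and partition tolerance), but Tokyo's answer is stale.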
Why “Partition Tolerance” Is Non-Negotiable
Networks fail. Routers crash, cables get cut, and clouds (literally) get in the way. Any truly distributed system must tolerate those failures, or it will simply stop working. So in practice P is a given; what you decide is whether to sacrifice C or A when a partition happens.
CA, CP, and AP – Trade-Offs Demystified
- CA: Consistency + Availability
  - What it looks like: Every node always agrees, and every request returns a response.
  - The catch: If the network splits, you can’t guarantee both. To stay consistent, some requests must fail, so you lose availability.
  - Example: A single-server database (not truly distributed).
  - Use when: You control a single data center or can tolerate downtime under partition.
- CP: Consistency + Partition Tolerance
  - What it looks like: During a network split, you’ll refuse requests on some nodes rather than risk returning stale data.
  - The catch: Availability takes a hit when partitions happen.
  - Example: Banking systems, where it is better to show an error than the wrong balance.
  - Use when: Strong data correctness is critical, and you can accept occasional errors.
- AP: Availability + Partition Tolerance
  - What it looks like: The system always replies, even under partition, but you might get slightly out-of-date data.
  - The catch: You sacrifice strong consistency.
  - Example: Social feeds, where showing a “like” count that’s a few seconds old is OK.
  - Use when: User experience matters more than absolute currency of data.
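One common way for a CP system to decide which requests to refuse is a majority quorum: a write is accepted only if more than half of the nodes can acknowledge it. The sketch below assumes that rule; the text above does not prescribe it, and `majority` and `handle_write` are hypothetical names used only for illustration.

```python
def majority(total_nodes):
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def handle_write(mode, reachable_nodes, total_nodes):
    """Decide what a node does with a write during a possible partition.

    mode is "CP" or "AP" (the labels used in this article).
    reachable_nodes counts this node plus the peers it can still contact.
    """
    if mode == "CP":
        # CP: accept only if a majority can acknowledge, otherwise refuse.
        if reachable_nodes >= majority(total_nodes):
            return "accepted"
        return "rejected: no quorum"
    if mode == "AP":
        # AP: always accept locally and reconcile with peers later.
        return "accepted locally"
    raise ValueError(f"unknown mode: {mode}")


# A 5-node cluster split into groups of 3 and 2.
cp_majority_side = handle_write("CP", reachable_nodes=3, total_nodes=5)
cp_minority_side = handle_write("CP", reachable_nodes=2, total_nodes=5)
ap_minority_side = handle_write("AP", reachable_nodes=2, total_nodes=5)
```

In this sketch the CP cluster stays writable only on the side of the split that holds three of the five nodes, while the AP cluster accepts writes on both sides and has to reconcile them afterwards.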
Real-World Analogies
The Bank Branch Analogy
sequenceDiagram
participant A as ATM NY
participant B as ATM SF
participant DB as Central Database
A-->>DB: Withdraw $100
DB-->>A: Confirm balance
Note right of B: Network split: SF can’t reach NY
B-->>B: Allow local withdrawal
Explanation:
CA (Consistency + Availability, no Partition tolerance)
- Both ATMs always talk to the central database to get the up-to-date balance (consistency), and they’re always up as long as the network is fine (availability).
- But the moment the network link between an ATM and the central DB goes down (a “partition”), the ATMs can’t offer service that is both consistent and available, so they simply go offline (no withdrawals).
CP (Consistency + Partition tolerance)
- Here you insist on never returning stale or divergent balances, even when the network is split.
- That means on a partition you refuse to process withdrawals (you’d rather deny service than risk an inconsistent balance).
AP (Availability + Partition tolerance)
- You keep the ATMs up and let people withdraw (availability), even if the central DB can’t be reached (partition).
- You accept that, until the machines sync back up, different ATMs might show slightly different balances (eventual consistency).
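The ATM scenario can be played out in a few lines. This is a rough sketch, not how real banking systems work: the `CentralDb` and `Atm` classes, the `partitioned` flag, the `sync` method, and the starting balance of 150 are all assumptions made for the example.

```python
class CentralDb:
    """Holds the authoritative account balance."""

    def __init__(self, balance):
        self.balance = balance


class Atm:
    def __init__(self, name, mode, db):
        self.name = name
        self.mode = mode                  # "CP" or "AP"
        self.db = db
        self.cached_balance = db.balance  # last balance seen from the DB
        self.pending = []                 # withdrawals made while cut off
        self.partitioned = False

    def withdraw(self, amount):
        if not self.partitioned:
            # Normal path: check and update the central balance directly.
            if amount > self.db.balance:
                return "refused: insufficient funds"
            self.db.balance -= amount
            self.cached_balance = self.db.balance
            return "dispensed"
        if self.mode == "CP":
            # CP: deny service rather than risk an inconsistent balance.
            return "refused: central DB unreachable"
        # AP: serve from the cached balance and sync later.
        if amount > self.cached_balance:
            return "refused: insufficient funds"
        self.cached_balance -= amount
        self.pending.append(amount)
        return "dispensed"

    def sync(self):
        """Partition healed: push pending withdrawals, refresh the cache."""
        self.partitioned = False
        self.db.balance -= sum(self.pending)
        self.pending.clear()
        self.cached_balance = self.db.balance


db = CentralDb(balance=150)
ny = Atm("NY", "AP", db)
sf = Atm("SF", "AP", db)

sf.partitioned = True            # SF loses its link to the central DB
ny_result = ny.withdraw(100)     # NY is connected: central balance drops to 50
sf_result = sf.withdraw(100)     # SF still believes the balance is 150
sf.sync()                        # link restored: the account ends up overdrawn

cp_atm = Atm("SF-CP", "CP", CentralDb(balance=150))
cp_atm.partitioned = True
cp_result = cp_atm.withdraw(100)  # CP ATM refuses instead of guessing
```

With the AP ATMs both withdrawals succeed and the account only shows the overdraft after the sync, which is the price of staying available. The CP ATM avoids that outcome by turning the customer away.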
The Voting Booth Analogy
Imagine an election app used by multiple districts:
- Consistency: Every district must see the same tally before publishing (slower, and a district may have to stay offline).
- Availability: Each district publishes local results immediately (may differ until final merge).
- Partition Tolerance: If districts go offline, each still logs votes.
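The “final merge” in the availability case can be sketched with a grow-only counter per district, a technique borrowed from CRDTs rather than anything the analogy itself specifies. The function names and the district and candidate names below are hypothetical.

```python
def record_vote(tally, district, candidate):
    """Log one vote locally; works even while the district is offline."""
    per_district = tally.setdefault(district, {})
    per_district[candidate] = per_district.get(candidate, 0) + 1


def merge(tally_a, tally_b):
    """Merge two tallies by taking the larger count per (district, candidate).

    Counts only grow and each district only increments its own entries,
    so the larger number is always the more recent one.
    """
    merged = {}
    for tally in (tally_a, tally_b):
        for district, counts in tally.items():
            target = merged.setdefault(district, {})
            for candidate, count in counts.items():
                target[candidate] = max(target.get(candidate, 0), count)
    return merged


def totals(tally):
    """Sum every district's counts into one overall result."""
    result = {}
    for counts in tally.values():
        for candidate, count in counts.items():
            result[candidate] = result.get(candidate, 0) + count
    return result


north, south = {}, {}
record_vote(north, "north", "alice")
record_vote(north, "north", "bob")
record_vote(south, "south", "alice")

merged = merge(north, south)
final = totals(merged)
```

Because the merge takes a maximum rather than adding, applying it twice or in a different order gives the same tally, so districts can exchange results whenever the network allows.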
When Each Pair Makes Sense
CA: Small clusters, low tolerance for stale reads, and high tolerance for downtime.
CP: When correctness is paramount (financial ledgers, inventory). You’d rather fail than show wrong stock counts.
AP: When responsiveness is paramount (social feeds, IoT sensors). Stale is okay; downtime is not.
Real-World Database Examples
Guarantee | Example Systems | Typical Use Case |
---|---|---|
CA | MySQL (single-node), PostgreSQL (single-node) | Small deployments, monoliths |
CP | HBase, MongoDB (with majority write concern) | Financial transactions, seat booking |
AP | DynamoDB, CouchDB | Social media, IoT telemetry |
Choosing Your Two
flowchart LR
A[Start: Build a Distributed System]
A --> B{Is network failure possible?}
B -- No --> C[Pick CA: Consistency + Availability]
B -- Yes --> D{What matters more under partition?}
D -- Consistency --> E[Pick CP]
D -- Availability --> F[Pick AP]
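The same two questions can be written as a small function. This is just the flowchart restated in code; the function name, parameter names, and string values are made up for the example.

```python
def choose_guarantees(network_failure_possible, priority_under_partition=None):
    """Return "CA", "CP", or "AP" by following the flowchart's two questions.

    priority_under_partition is "consistency" or "availability"; it is only
    consulted when network failure is possible.
    """
    if not network_failure_possible:
        return "CA"
    if priority_under_partition == "consistency":
        return "CP"
    if priority_under_partition == "availability":
        return "AP"
    raise ValueError(
        "priority_under_partition must be 'consistency' or 'availability'"
    )


single_server = choose_guarantees(network_failure_possible=False)
ledger = choose_guarantees(True, priority_under_partition="consistency")
social_feed = choose_guarantees(True, priority_under_partition="availability")
```

For any system that spans more than one machine, the first question is almost always answered “yes”, so the real decision is the second one.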