Most candidates over-prepare for system design. They buy Alex Xu's book, grind a 40-video YouTube playlist, read 8 blog posts from Meta engineers on the Saga pattern, and walk into their interview bloated with trivia they can't recall under pressure.
The truth: about 14 concepts cover roughly 80% of what you'll be asked in a mid-senior-level system design round at Google, Meta, Amazon, Stripe, Airbnb, or any equivalent. Everything else is either (a) a recombination of these, or (b) so company-specific that you'd only encounter it if interviewing for that exact team.
This guide is the minimum viable reference — built around patterns interviewers actually score on, with trade-offs they actually probe, and a one-page cheat sheet at the bottom. Already comfortable with system design and want to fill in the coding round? Skim our 15 LeetCode patterns guide next.
The Interview Structure You Need to Memorize
Every system-design round — regardless of company — follows roughly the same rhythm. Your interviewer is mentally checking boxes in this order:
- Requirements clarification (5 min) — functional, non-functional, out-of-scope
- Capacity estimation (3–5 min) — QPS, storage, bandwidth, cache size
- High-level API / data model (5 min)
- High-level architecture (5–10 min) — components and their boundaries
- Deep dive on 1–2 components (10–15 min) — where most of the signal lives
- Bottleneck / scale / trade-off discussion (5–10 min) — the grade-A signal
If you burn 20 minutes on requirements and rush the deep dive, you fail even if you "knew" the answer. Pacing is a skill. Practice it with a stopwatch.
The 14 Core Concepts
In rough order of how often they come up:
1. Load Balancing (always)
Every non-trivial system needs it. You should be able to instantly describe:
- L4 (TCP) vs L7 (HTTP) — L7 gives you request-aware routing (e.g., by URL path or cookie); L4 is faster and simpler.
- Round-robin vs least-connections vs consistent-hash — consistent hash is your answer whenever sticky routing matters (cache affinity, session continuity).
- Client-side load balancers (e.g., gRPC) skip an extra hop; good for internal microservices.
2. Caching (always)
Interviewers will push you here. Know:
- Cache placement: client → CDN → reverse proxy → application memory → distributed (Redis/Memcached) → database
- Eviction policies: LRU (default), LFU (when access patterns are skewed), TTL-based (when freshness matters)
- Invalidation strategies: write-through, write-back, write-around, TTL. If asked "what's the hardest problem in CS?" — cache invalidation is one of the two correct answers.
- Cache stampede — when the cached key expires and 10,000 requests hit the DB simultaneously. Mitigate with request coalescing or staggered TTLs.
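Request coalescing is easier to remember once you've seen it in code. This is a minimal single-process sketch (the class name `SingleFlightCache` is made up for illustration): concurrent misses for the same key serialize on a per-key lock, so only the first caller runs the expensive loader and everyone else reads the cached result.

```python
import threading

class SingleFlightCache:
    """Toy request-coalescing cache: on a miss, one caller computes the
    value while concurrent callers for the same key wait for the result."""

    def __init__(self):
        self._values = {}                  # key -> cached value
        self._locks = {}                   # key -> per-key lock
        self._meta_lock = threading.Lock() # guards the locks dict

    def _lock_for(self, key):
        with self._meta_lock:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key, loader):
        if key in self._values:            # fast path: cache hit
            return self._values[key]
        with self._lock_for(key):          # coalesce: one loader per key
            if key not in self._values:    # re-check after acquiring the lock
                self._values[key] = loader()  # only one caller hits the DB
            return self._values[key]
```

In a distributed cache the same idea uses a short-lived Redis lock or a "serve stale while one worker refreshes" policy, but the shape of the answer is identical.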
3. Database Choice: SQL vs NoSQL
The answer is always "it depends" but interviewers want to hear the right factors:
| Use SQL when… | Use NoSQL when… |
|---|---|
| You need multi-row transactions | You have massive write throughput |
| Your schema is stable and relational | Your data is denormalized or document-shaped |
| You need complex JOINs and ad-hoc queries | Access patterns are known and narrow |
| Strong consistency is required | Eventual consistency is acceptable |
| Data volume < 10TB (ballpark) | Need to scale horizontally by design |
Default answer for most systems: Postgres (or MySQL) unless you have a specific reason to deviate. "Always NoSQL because Google uses it" is a fail signal.
4. Sharding / Partitioning
Three strategies, in order of when to use:
- Range-based: simple, but hot spots are deadly (e.g., sharding by user signup date = all new users hit the same shard)
- Hash-based: good distribution, but range queries become expensive or impossible
- Consistent hashing: what you want when nodes join/leave. Always mention virtual nodes to smooth the distribution.
5. Replication & Consistency
- Leader-follower (primary-replica): writes go to the leader, reads can be spread across followers. Read-your-own-writes consistency requires routing to the leader (or a sticky session).
- Multi-leader: conflict resolution is the hard part (last-write-wins loses data; CRDTs preserve it).
- Synchronous vs async replication: sync = no data loss but higher write latency; async = fast writes but possible data loss on failover.
6. CAP Theorem (and Why Interviewers Get It Slightly Wrong)
CAP says: under a network partition, you pick Consistency OR Availability. Partition tolerance isn't optional (networks fail), so the real trade-off is C vs A during a partition; when the network is healthy, you can have both.
- CP systems: Postgres (with synchronous replication), HBase, MongoDB (with majority write concern). They reject writes on the minority side to avoid split-brain.
- AP systems: DynamoDB, Cassandra, Riak. Accept writes on both sides of a partition; reconcile later.
The mature answer: "it depends on the specific operation" — some APIs on a single system can be CP, others AP.
7. Message Queues & Event-Driven Architecture
When you need decoupling, async processing, or ordered events. Know:
- Kafka: log-structured, pull-based consumers, retention for replay. Choose for event-sourcing and high-throughput.
- RabbitMQ / SQS: traditional queue semantics (a message is consumed once and deleted), good for task queues. Note that RabbitMQ pushes to consumers, while SQS consumers poll.
- Pub/Sub: one event, many consumers (fan-out).
- At-least-once vs exactly-once delivery — know that true exactly-once is only achievable with idempotency + dedup on the consumer.
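The idempotency-plus-dedup point above can be sketched in a few lines. This is a toy in-memory version with hypothetical names (`IdempotentConsumer`, `consume`), not any real client library's API; a production system would keep the seen-ID set in Redis with a TTL or enforce it with a database unique constraint.

```python
class IdempotentConsumer:
    """Toy consumer for an at-least-once queue: redelivered messages
    are detected by ID and skipped, so the side effect runs once per ID."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # in production: Redis set with TTL, or a DB unique key

    def consume(self, message_id, payload):
        if message_id in self.seen:   # duplicate delivery -> drop it
            return False
        self.handler(payload)         # the side effect (charge card, send email)
        self.seen.add(message_id)     # mark only after success
        return True
```

One caveat worth mentioning in the interview: because the ID is marked only after the handler succeeds, a crash between the two steps still reprocesses the message, which is exactly why the handler itself should be idempotent.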
8. CDNs
Mention them for anything user-facing. Static assets, images, video — all hit a CDN before your origin. Know cache busting (query string vs content-hashed filename), and that dynamic content can also be cached via signed URLs or at the edge.
9. Rate Limiting
Four algorithms in order of complexity/accuracy:
- Token bucket (default — handles bursts gracefully)
- Leaky bucket (smooths traffic to a constant rate)
- Fixed window (simple but has a boundary problem — up to 2x the limit can pass around a window edge)
- Sliding window log (most accurate, highest memory)
Distributed rate limiting needs Redis + atomic operations (INCR + EXPIRE) or a specialized service.
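The default choice, token bucket, fits on a whiteboard. This is a minimal single-node sketch (class and method names are illustrative); the distributed version moves the same state into Redis behind atomic operations.

```python
import time

class TokenBucket:
    """Toy token-bucket limiter: refills `rate` tokens/sec up to `capacity`,
    so bursts up to `capacity` pass while sustained traffic is capped at `rate`."""

    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # max burst size
        self.tokens = capacity      # start full
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill proportionally to elapsed time, clamped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1        # spend one token for this request
            return True
        return False                # bucket empty: reject or queue
```

A `TokenBucket(rate=100, capacity=500)` allows a 500-request burst, then settles to 100 req/s, which is usually the behavior you want for user-facing APIs.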
10. Consistent Hashing
This is the single concept interviewers most love to quiz on. You should be able to draw the ring on a whiteboard in 60 seconds. Key properties:
- Adding/removing a node only remaps ~1/N of the keys (vs nearly all keys for modulo hashing, where changing N remaps roughly (N−1)/N of them)
- Virtual nodes smooth out load imbalance when N is small
- Used in: DynamoDB, Cassandra, Memcached clients, CDN edge routing
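Since this is the concept interviewers most love to probe, it's worth having the ring in code as well as on the whiteboard. A minimal sketch (the class name `HashRing` and the `vnodes` default are illustrative): each physical node is hashed to many points on the ring, and a key is owned by the first vnode clockwise from its hash.

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes: each physical node
    is placed at `vnodes` points so load stays even when N is small."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        # any stable, well-distributed hash works; md5 keeps the demo simple
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get(self, key):
        """Walk clockwise from the key's hash to the next vnode on the ring."""
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]
```

The property to call out: `HashRing(["a", "b", "c"]).get("user:42")` is deterministic, and dropping node "b" only moves keys that were on "b" — keys on "a" and "c" stay put.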
11. Quorum-Based Systems
For any eventual-consistency system (DynamoDB, Cassandra), know:
R + W > N     // strong read consistency
W = N         // maximum durability (slow writes)
R = 1, W = 1  // fast but possibly stale
where N = replica count, R = read quorum, W = write quorum.
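The rules above reduce to simple inequalities, which makes them easy to sanity-check. A sketch (the function name is made up for illustration):

```python
def quorum_properties(n, r, w):
    """Classify an (N, R, W) replica configuration per the quorum rules above."""
    return {
        # read and write sets must overlap in at least one replica
        "strongly_consistent_reads": r + w > n,
        # a write can succeed with a replica down only if W < N
        "tolerates_replica_down_on_write": w < n,
        # a read can succeed with a replica down only if R < N
        "tolerates_replica_down_on_read": r < n,
    }
```

For example, the common Cassandra setting N=3, R=2, W=2 gives strong reads while still tolerating one replica down for both reads and writes, which is why it's the textbook default.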
12. Long Polling / WebSockets / Server-Sent Events
For real-time features (chat, notifications, live dashboards). Know when to reach for each:
- Long polling: simple, works behind most proxies; high latency and connection overhead
- SSE: server-to-client only, auto-reconnect, text-based — great for notifications
- WebSocket: full duplex, binary, low latency — chat, games, collaborative editing
13. Search (Elasticsearch / OpenSearch)
When the question involves full-text search, autocomplete, or faceted search, the right answer is Elasticsearch. Know that it's an inverted-index system, not a general database — you still need a system-of-record alongside.
14. Authentication & Authorization
Often skipped but increasingly probed for "real-world" questions:
- Session-based (stateful, cookie-in-Redis) vs JWT (stateless, self-contained)
- OAuth 2.0 flows: authorization code (web apps), PKCE (mobile), client credentials (service-to-service)
- RBAC vs ABAC: role-based is simpler, attribute-based is more flexible for multi-tenant
The Magic Phrases (Memorize These)
Using these phrases correctly is a clear senior-level signal. Interviewers scribble a +1 next to your name every time they hear one:
- "This has a read-to-write ratio of roughly 100:1, so I'm optimizing for reads."
- "I'd denormalize here to avoid joins at query time, at the cost of an extra write path."
- "We can use a materialized view for this query pattern."
- "For this, eventual consistency is acceptable, which lets us scale horizontally."
- "I'd put a bloom filter in front of the DB to avoid hitting it for keys that don't exist."
- "This is a classic thundering-herd problem; I'd solve it with request coalescing."
- "This operation should be idempotent, so we can safely retry."
- "I'd use a write-ahead log here for durability before the commit."
- "Given the hot partition risk, I'd shard on a composite key."
The 3 Things Junior Candidates Miss
If I had to name the three most common unforced errors I see when reviewing candidate transcripts:
1. Jumping to architecture before clarifying requirements
You'll be asked "design Twitter." If you start drawing boxes before asking "are we doing the Twitter-at-Jack-Dorsey scale or are we doing Twitter-for-a-startup-community?", you fail. The first 5 minutes of the interview are about forcing the interviewer to commit to a scope.
2. Skipping the back-of-the-envelope math
"We'll have millions of users" is meaningless. "Assuming 100M DAU, each tweeting an average of 3 times per day, that's 300M writes/day = 3,500 writes/sec average, 35,000 peak" is how you build credibility in 60 seconds. Memorize the powers of two and latency numbers every programmer should know.
3. Saying "I'd use Redis" instead of "I'd use Redis because…"
The answer is never the component; it's the trade-off. "Redis for the cache" is a half-signal. "Redis for the cache because we need single-digit-millisecond reads and TTL-based invalidation, and Redis-Cluster gives us horizontal scale when we outgrow 300GB of RAM per shard" is the full signal.
The One-Page Cheat Sheet
Print this. Tape it to your monitor. Glance at it during the interview.
REQUIREMENTS (5 min — force scope)
• Functional: "What does it DO?"
• Non-functional: latency, QPS, consistency, availability
• Scale: DAU, data volume, read:write ratio
CAPACITY ESTIMATION (3 min — build credibility)
• DAU × requests/user = QPS
• QPS × avg payload = bandwidth
• QPS × seconds × retention = storage
• Cache ≈ 20% of daily data (80/20 rule)
HIGH-LEVEL DESIGN (10 min — boxes + arrows)
Client → LB → API GW → Services → DB + Cache + Queue
CDN → Static Assets
WebSocket / SSE for real-time
DATA MODEL (5 min)
• Entities + relationships
• SQL for relational + transactions
• NoSQL for scale + known access patterns
• Denormalize when read-heavy
DEEP DIVE (15 min — pick ONE component)
• Walk through a user action end-to-end
• Name specific tech (Kafka, Redis, Postgres)
• Justify every choice with a trade-off
SCALE & BOTTLENECKS (10 min — senior signal)
• "What breaks at 10x traffic?"
• Sharding strategy
• Replication + consistency
• Rate limiting + backpressure
CLOSING (2 min)
• Recap the key trade-offs
• Mention what you'd do with more time
• Acknowledge what you're NOT optimizing for
Practice this cheat sheet on a real system-design question
CoPilot Interview's system design mode generates realistic prompts and walks you through structured answers — with complexity analysis and follow-up questions — so you build muscle memory for the actual interview rhythm.
Try System Design Mode Free →
FAQ
How many system design questions should I practice before interviewing?
Quality beats quantity. Do 10 questions with full 45-minute timers, self-recording, and review — that's worth more than 50 surface-level reads. A reasonable set: Twitter, WhatsApp, Uber, Dropbox, TinyURL, Instagram, Yelp, Netflix, YouTube, Rate Limiter. These ten cover every major pattern.
Do I need to memorize actual database internals (B-trees, LSM trees)?
For L4/L5 interviews: know the conceptual difference and when to reach for each (LSM for write-heavy like Cassandra; B-tree for read-heavy/range queries like Postgres). You don't need to implement one. For L6+: yes, be prepared to go deeper on one.
Should I use a specific cloud vendor's services in my answer (AWS S3, GCP Spanner)?
It's fine — and actually expected — for FAANG-adjacent interviews. The interviewer wants to know you've built real systems. But back it with a generic explanation: "I'd use S3 (an object store) for…" so the principle is clear.
What if I don't know a concept the interviewer asks about?
Say so, briefly and confidently. "I haven't worked with distributed consensus directly, but I know Raft is the modern go-to — leader election by term, log replication with quorum writes. Could you help me reason through the specific part you're asking about?" That's a stronger signal than fabrication.
How do I know I'm ready?
You can give a complete answer to a new question you've never seen, inside 45 minutes, without looking at notes, covering all 6 phases (requirements → scale). If you can do that three times in a row on random prompts, you're ready.