System Design Interview Questions & How to Answer Them

5 min read·16 questions·Updated Apr 7, 2026

System design interviews are the most senior-weighted round in the tech interview process — and the one where preparation makes the biggest difference. Unlike coding questions with objectively correct answers, system design is open-ended: interviewers evaluate how you scope a problem, make trade-offs, handle scale, and communicate complex ideas clearly. Google, Meta, Amazon, and Stripe all run 45–60 minute design rounds where you're expected to architect a system from scratch on a whiteboard. This guide covers the most common system design questions, with a structured approach to each so you walk in with a repeatable framework, not a memorised answer.

Fundamentals & Building Blocks

Before tackling full system designs, you need fluency in the core building blocks. These questions test whether you understand the primitives — load balancing, caching, databases, queues — and when to reach for each one.

1. How would you design a caching layer for a high-traffic web application?

Why it's asked

Caching is the single most common performance optimisation in distributed systems. Tests whether you understand cache strategies, invalidation, and trade-offs.

How to answer

Start by clarifying the access patterns (read-heavy vs. write-heavy). Discuss cache placement (client, CDN, application, database), choose a strategy (cache-aside, write-through, write-behind), address invalidation (TTL, event-driven), and handle cache stampede.

Key points to hit

  • Cache-aside (lazy loading) for read-heavy workloads — simplest to implement
  • Write-through for consistency-critical data — slower writes, always fresh reads
  • Invalidation is the hard part: TTL for eventual consistency, event-based for real-time
  • Discuss hot key distribution and consistent hashing for cache sharding

Interviewers love when you discuss cache failure modes: what happens when the cache goes down? How does your system degrade gracefully?
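To make the cache-aside pattern concrete, here is a minimal in-memory sketch (the `CacheAside` class is illustrative; in production the store would be Redis or Memcached behind the same interface):

```python
import time

class CacheAside:
    """Minimal cache-aside (lazy loading) sketch with TTL-based invalidation."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expires_at)

    def get(self, key, load_from_db):
        entry = self.store.get(key)
        if entry is not None:
            value, expires_at = entry
            if time.time() < expires_at:
                return value          # cache hit
            del self.store[key]       # expired: treat as a miss
        value = load_from_db(key)     # cache miss: read through to the DB
        self.store[key] = (value, time.time() + self.ttl)
        return value

    def invalidate(self, key):
        # Event-driven invalidation: call this when the underlying row changes.
        self.store.pop(key, None)
```

Note that the application, not the cache, owns the read path: it checks the cache first and populates it on a miss, which is exactly why cache-aside is the simplest strategy to bolt onto an existing system.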

2. Explain how you would design a load balancer.

Why it's asked

Tests understanding of traffic distribution, fault tolerance, and horizontal scaling — foundational for any large-scale system.

How to answer

Define requirements (L4 vs. L7, global vs. local), compare algorithms (round-robin, least connections, consistent hashing), discuss health checks and failover, then address SSL termination and session affinity.

Key points to hit

  • L4 (transport) for raw throughput, L7 (application) for content-based routing
  • Least connections works better than round-robin when request latency varies
  • Health checks: active (pinging servers) vs. passive (monitoring responses)
  • Session affinity considerations: sticky sessions vs. externalised session store
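A least-connections picker fits in a few lines; this in-memory sketch is illustrative (the `acquire`/`release` interface is an assumption, not a real load-balancer API):

```python
class LeastConnectionsBalancer:
    """Sketch of least-connections server selection."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # server -> in-flight requests

    def acquire(self):
        # Pick the server with the fewest in-flight requests.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request completes so the count stays accurate.
        self.active[server] -= 1
```

The key difference from round-robin: the balancer must track completions, which is why least connections helps most when request latencies vary widely.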

3. How would you choose between a SQL and NoSQL database for a new system?

Why it's asked

Tests your ability to make principled technology decisions based on requirements, not hype. One of the most common trade-off discussions in design interviews.

How to answer

Start with data model (relational vs. document/key-value), query patterns (complex joins vs. simple lookups), consistency requirements (ACID vs. eventual), and scale characteristics (vertical vs. horizontal).

Key points to hit

  • SQL for complex relationships, transactions, and strong consistency (banking, inventory)
  • NoSQL for high write throughput, flexible schemas, and horizontal scaling (feeds, logs, sessions)
  • Many systems use both: SQL for core data, NoSQL for caching or analytics
  • Discuss CAP theorem trade-offs: consistency vs. availability during partitions

Distributed Systems

These questions probe your understanding of systems that span multiple machines, data centres, and regions. Concurrency, consistency, and failure handling are the core themes.

4. How would you design a distributed rate limiter?

Why it's asked

Rate limiting is essential for API security, fairness, and resource protection. The distributed aspect tests your understanding of consistency trade-offs in multi-node systems.

How to answer

Clarify requirements (per-user, per-API, global), discuss algorithms (token bucket, sliding window log, sliding window counter), then address synchronisation across nodes (Redis, local approximation) and edge cases.

Key points to hit

  • Token bucket: simple, allows bursts, easy to implement with Redis INCR + EXPIRE
  • Sliding window counter: more accurate than fixed window, moderate complexity
  • Centralised (Redis) vs. local approximation — trade-off between accuracy and latency
  • Handle race conditions: Redis Lua scripts for atomic check-and-increment

A strong answer acknowledges the trade-off between strict accuracy (centralised counter) and low latency (local counters with periodic sync). Show you understand both approaches.
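As an illustration, here is a single-node token bucket in Python. The distributed version would move this same logic into a Redis Lua script so the check-and-decrement is atomic across nodes; the class and parameter names here are hypothetical:

```python
import time

class TokenBucket:
    """Single-node token bucket sketch: allows bursts up to `capacity`,
    then refills at a steady rate."""

    def __init__(self, capacity, refill_per_second, now=None):
        self.capacity = capacity
        self.refill = refill_per_second
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request admitted
        return False      # rate limited: caller should return 429
```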

5. Design a distributed task queue (like Celery or SQS).

Why it's asked

Tests understanding of asynchronous processing, at-least-once delivery, and distributed coordination — patterns used in every large-scale system.

How to answer

Define the API (enqueue, dequeue, ack), discuss persistence (in-memory vs. disk-backed), delivery semantics (at-least-once, at-most-once, exactly-once), visibility timeout, dead letter queues, and scaling consumers.

Key points to hit

  • At-least-once delivery with idempotent consumers is the pragmatic default
  • Visibility timeout: requeue unacknowledged messages after N seconds
  • Dead letter queue for poison messages that fail repeatedly
  • Partitioning by topic/queue for horizontal scaling of consumers
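The visibility-timeout mechanic can be sketched as follows (an in-memory, single-process approximation of SQS-style semantics, not production code):

```python
import time
import uuid

class VisibilityQueue:
    """Sketch of dequeue with visibility timeout: unacked messages become
    visible again after the timeout, giving at-least-once delivery."""

    def __init__(self, visibility_timeout=30):
        self.timeout = visibility_timeout
        self.messages = {}  # id -> (body, visible_at)

    def enqueue(self, body):
        msg_id = str(uuid.uuid4())
        self.messages[msg_id] = (body, 0.0)  # visible immediately
        return msg_id

    def dequeue(self, now=None):
        now = time.time() if now is None else now
        for msg_id, (body, visible_at) in self.messages.items():
            if visible_at <= now:
                # Hide the message until the visibility timeout elapses.
                self.messages[msg_id] = (body, now + self.timeout)
                return msg_id, body
        return None

    def ack(self, msg_id):
        self.messages.pop(msg_id, None)  # delete on successful processing
```

A consumer that crashes mid-job never acks, so the message reappears after the timeout — which is exactly why consumers must be idempotent.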

6. How would you handle data consistency across microservices?

Why it's asked

The classic distributed systems challenge. Tests whether you understand saga patterns, eventual consistency, and when to use distributed transactions.

How to answer

Start by explaining why distributed transactions (2PC) are expensive and fragile. Introduce the saga pattern (choreography vs. orchestration), discuss compensating transactions for rollback, and address idempotency.

Key points to hit

  • Choreography: services emit events, others react — simple but hard to debug at scale
  • Orchestration: a central coordinator manages the workflow — easier to reason about
  • Compensating transactions: each step has a rollback action if a later step fails
  • Idempotency keys on every write operation to handle retries safely
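An orchestrated saga with compensating transactions can be sketched minimally (real orchestrators persist progress durably so they can resume after a crash; this is an in-memory illustration):

```python
def run_saga(steps):
    """Orchestration-style saga sketch: each step is an (action, compensation)
    pair. On failure, run compensations for completed steps in reverse order."""
    completed = []
    for action, compensation in steps:
        try:
            action()
            completed.append(compensation)
        except Exception:
            # Compensating transactions roll back already-committed steps.
            for comp in reversed(completed):
                comp()
            return False
    return True
```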

Data-Intensive Applications

Modern systems process enormous volumes of data — real-time analytics, search indices, notification pipelines. These questions test your ability to design for throughput, latency, and data freshness.

7. Design a real-time analytics dashboard (e.g. page views per second by country).

Why it's asked

Tests your ability to design stream processing pipelines and handle high write throughput with low-latency reads.

How to answer

Clarify latency requirements (seconds vs. minutes). Design the pipeline: ingestion (Kafka/Kinesis), stream processing (Flink/Spark Streaming), pre-aggregation, storage (time-series DB or pre-computed materialised views), and serving layer.

Key points to hit

  • Kafka for ingestion: partitioned by event type, retained for replay
  • Pre-aggregate in the stream processor — don't query raw events at read time
  • Time-series DB (InfluxDB, TimescaleDB) for efficient range queries
  • Lambda vs. Kappa architecture: discuss trade-offs between batch correction and streaming simplicity
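Pre-aggregation in the stream processor amounts to bucketing events into windows as they arrive. A minimal sketch, assuming tumbling 10-second windows keyed by country:

```python
from collections import defaultdict

class WindowedCounter:
    """Sketch of stream pre-aggregation: count page views per
    (country, tumbling window) instead of storing raw events."""

    def __init__(self, window_seconds=10):
        self.window = window_seconds
        self.counts = defaultdict(int)  # (window_start, country) -> count

    def ingest(self, timestamp, country):
        # Align the event to the start of its tumbling window.
        window_start = int(timestamp // self.window) * self.window
        self.counts[(window_start, country)] += 1

    def read(self, window_start, country):
        return self.counts.get((window_start, country), 0)
```

Reads are O(1) lookups of pre-computed counts, which is the whole point: the serving layer never scans raw events.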

8. How would you design a notification system (push, email, SMS)?

Why it's asked

Tests multi-channel delivery, prioritisation, deduplication, and handling unreliable external services (email providers, push notification services).

How to answer

Define the pipeline: event ingestion → preference check → template rendering → channel routing → delivery → tracking. Discuss each stage with reliability guarantees.

Key points to hit

  • Preference service: user-level and notification-type-level opt-in/opt-out
  • Deduplication: idempotency key per notification to prevent double-sends
  • Channel routing: priority queue for urgent notifications, batch queue for digests
  • Retry with exponential backoff per channel; circuit breaker for failing providers
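The retry schedule above is exponential backoff with a cap; adding "full jitter" spreads retries out so a recovering provider is not hammered by synchronised clients. A sketch (the function name and defaults are illustrative):

```python
import random

def backoff_delays(base=1.0, factor=2.0, cap=3600.0, attempts=5, jitter=True):
    """Sketch of capped exponential backoff for notification retries.
    Full jitter picks a random delay in [0, exp] for each attempt."""
    delays = []
    for attempt in range(attempts):
        exp = min(cap, base * (factor ** attempt))
        delays.append(random.uniform(0, exp) if jitter else exp)
    return delays
```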

9. Design a search autocomplete system.

Why it's asked

Tests understanding of trie data structures, caching, ranking algorithms, and latency optimisation for a highly interactive feature.

How to answer

Clarify scale (queries per second, corpus size). Design: trie or prefix index built from query logs → ranking by popularity/recency → multi-tier caching (CDN, application, prefix-level) → A/B testable ranking layer.

Key points to hit

  • Trie with top-K results stored at each node — O(1) lookup for common prefixes
  • Offline pipeline: rebuild trie from query logs hourly/daily, promote to serving tier
  • CDN caching for top 1000 prefixes covers 80%+ of requests
  • Personalisation layer: blend global popularity with user-specific history

State the latency requirement early (typically <100ms p99). This anchors every subsequent design choice and shows the interviewer you think in contracts, not just architecture.
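The top-K-per-node idea can be sketched with plain dictionaries (a toy Python trie; a real system would build this offline from query logs and serve it read-only):

```python
class AutocompleteTrie:
    """Sketch: each trie node caches its top-K completions, so a prefix
    lookup costs O(prefix length) rather than a subtree walk."""

    def __init__(self, k=3):
        self.k = k
        self.root = {"children": {}, "top": []}  # top: (count, query) pairs

    def add(self, query, count):
        node = self.root
        for ch in query:
            node = node["children"].setdefault(ch, {"children": {}, "top": []})
            node["top"].append((count, query))
            # Keep only the K most popular completions at this node.
            node["top"] = sorted(node["top"], reverse=True)[: self.k]

    def suggest(self, prefix):
        node = self.root
        for ch in prefix:
            node = node["children"].get(ch)
            if node is None:
                return []
        return [q for _, q in node["top"]]
```

Storing top-K at every node trades memory for latency, which is the right trade for an interactive feature with a tight p99 budget.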

API Design

API design questions test your ability to create clean, consistent, and extensible interfaces. These show up frequently at Stripe, Twilio, and other developer-platform companies.

10. Design a RESTful API for a social media feed.

Why it's asked

Tests your ability to design intuitive endpoints, handle pagination, versioning, and rate limiting — and think about the consumer's experience.

How to answer

Define resources (posts, users, feeds). Design endpoints (CRUD + feed), choose pagination strategy (cursor vs. offset), discuss versioning, authentication, and rate limiting.

Key points to hit

  • GET /feed?cursor=abc&limit=20 — cursor pagination for infinite scroll (offset pagination breaks with real-time content)
  • POST /posts with idempotency key header to prevent double-posts on retry
  • Versioning: URL path (/v1/) for breaking changes, headers for minor variations
  • Rate limiting: return 429 with Retry-After header; differentiate by tier
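Cursor pagination can be sketched as an opaque, base64-encoded position token (the feed shape, cursor format, and field names here are illustrative assumptions):

```python
import base64
import json

def encode_cursor(last_post_id):
    # Opaque cursor: clients can't depend on (or tamper with) its internals.
    return base64.urlsafe_b64encode(
        json.dumps({"after": last_post_id}).encode()
    ).decode()

def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor))["after"]

def get_feed(posts, cursor=None, limit=20):
    """Sketch of GET /feed: posts is a list of dicts sorted by id
    descending (newest first)."""
    after = decode_cursor(cursor) if cursor else None
    if after is not None:
        posts = [p for p in posts if p["id"] < after]
    page = posts[:limit]
    next_cursor = encode_cursor(page[-1]["id"]) if len(page) == limit else None
    return {"items": page, "next_cursor": next_cursor}
```

Because the cursor anchors on the last item seen rather than an offset, newly inserted posts can't cause skipped or duplicated items between pages.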

11. How would you design a webhook delivery system?

Why it's asked

Tests understanding of reliable delivery to external, unreliable endpoints — a common platform engineering challenge.

How to answer

Design the pipeline: event → serialise → queue → deliver → retry → alert. Address reliability (at-least-once delivery), security (HMAC signatures), and monitoring.

Key points to hit

  • Queue events before delivery — never call external endpoints synchronously
  • Retry with exponential backoff: 1s, 30s, 5m, 1h, max 24h
  • HMAC-SHA256 signature header so recipients can verify authenticity
  • Dead letter queue + alerting for endpoints that fail repeatedly
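Signing and verifying a webhook payload with HMAC-SHA256 uses Python's standard library directly (providers vary in which header carries the signature; a name like X-Webhook-Signature is an assumption):

```python
import hashlib
import hmac

def sign_payload(secret, payload):
    # Sender computes this and sends it in a signature header
    # (e.g. X-Webhook-Signature; the header name is provider-specific).
    return hmac.new(secret.encode(), payload.encode(), hashlib.sha256).hexdigest()

def verify_signature(secret, payload, signature):
    # Constant-time comparison prevents timing attacks on the signature.
    return hmac.compare_digest(sign_payload(secret, payload), signature)
```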

12. Design an API for a payment processing system.

Why it's asked

Tests your ability to handle financial data with correctness guarantees: idempotency, exactly-once semantics, and audit trails.

How to answer

Design endpoints (create charge, capture, refund). Emphasise idempotency keys (mandatory), state machine for payment lifecycle (pending → captured → refunded), audit logging, and PCI compliance considerations.

Key points to hit

  • Idempotency key on every mutating request — return previous result on duplicate
  • Payment lifecycle as finite state machine with valid transitions only
  • Double-entry bookkeeping for every money movement — audit trail is non-negotiable
  • Separate authorisation from capture for marketplace/escrow use cases

State upfront that financial systems require exactly-once semantics. This immediately sets you apart — most candidates don't mention idempotency until prompted.
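Both ideas — the idempotency-key store and the payment state machine — can be sketched together (in-memory dicts stand in for durable storage; the states and transitions shown are illustrative):

```python
VALID_TRANSITIONS = {
    "pending": {"captured", "failed"},
    "captured": {"refunded"},
}

class PaymentService:
    """Sketch: replay stored results on duplicate idempotency keys, and
    reject any state change the lifecycle doesn't allow."""

    def __init__(self):
        self.results = {}   # idempotency_key -> previous response
        self.payments = {}  # payment_id -> state

    def create_charge(self, idempotency_key, payment_id):
        if idempotency_key in self.results:
            return self.results[idempotency_key]  # replay, don't re-charge
        self.payments[payment_id] = "pending"
        response = {"payment_id": payment_id, "state": "pending"}
        self.results[idempotency_key] = response
        return response

    def transition(self, payment_id, new_state):
        current = self.payments[payment_id]
        if new_state not in VALID_TRANSITIONS.get(current, set()):
            raise ValueError(f"illegal transition {current} -> {new_state}")
        self.payments[payment_id] = new_state
```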

Real-World System Design

These classic "Design X" questions test your ability to scope a massive problem, make architectural decisions under ambiguity, and communicate your reasoning in real time.

13. Design a URL shortener (like Bitly).

Why it's asked

The classic warm-up system design question. Tests scoping, encoding schemes, database design, and caching — simple enough to go deep in 45 minutes.

How to answer

Clarify scale (URLs per day, reads vs. writes ratio). Design: hash/encode function → key-value store → redirect service → analytics. Address collision handling, custom aliases, and expiration.

Key points to hit

  • Base62 encoding of auto-incrementing ID or hash — discuss collision probability
  • Read-heavy (100:1): cache the most popular short URLs in Redis
  • Write path: generate ID → encode → store mapping → return short URL
  • Analytics: async write to analytics store on every redirect, not in the hot path
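Base62 encoding of a numeric ID is short enough to write out in full:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(n):
    """Encode an auto-incrementing numeric ID as a short base62 slug."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def base62_decode(slug):
    n = 0
    for ch in slug:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Seven base62 characters give 62^7 ≈ 3.5 trillion distinct slugs, and because the ID is unique by construction there are no collisions to handle — the usual argument for ID-based encoding over hashing.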

14. Design a chat system (like Slack or WhatsApp).

Why it's asked

Tests real-time communication, persistent connections, message ordering, and delivery guarantees — a rich problem with many design dimensions.

How to answer

Clarify requirements (1:1, group, channels; online status; message history). Design: connection layer (WebSocket), message routing, storage (per-conversation), presence service, push notifications for offline users.

Key points to hit

  • WebSocket for real-time bidirectional communication; HTTP fallback for reliability
  • Message ordering: logical clocks per conversation (not global timestamps)
  • Fan-out: for group messages, write to a per-user inbox (fan-out on write) or query group messages on read (fan-out on read) — trade-off depends on group size
  • Offline delivery: queue undelivered messages, send push notification, deliver on reconnect

15. Design a content delivery network (CDN).

Why it's asked

Tests understanding of global distribution, caching hierarchies, and latency optimisation — relevant for any company serving media or static assets worldwide.

How to answer

Define the architecture: DNS-based routing to nearest edge node → edge cache → mid-tier (shield) cache → origin. Discuss cache invalidation, consistent hashing for shard assignment, and TLS termination.

Key points to hit

  • GeoDNS or Anycast routing to direct users to the nearest Point of Presence (PoP)
  • Edge → Shield → Origin hierarchy: shield layer reduces origin load dramatically
  • Cache invalidation: purge API for urgent updates, TTL for regular rotation
  • Consistent hashing for assigning content to cache nodes — minimises redistribution on node changes
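Consistent hashing with virtual nodes can be sketched as a sorted ring (Python sketch; MD5 is used here only for uniform hashing, not for security):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Sketch of consistent hashing with virtual nodes: adding or removing
    a cache node only remaps the keys adjacent to it on the ring."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key):
        h = self._hash(key)
        # First virtual node clockwise from the key's hash position.
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

With naive modulo hashing, removing one of N nodes remaps nearly all keys; with a consistent hash ring, only roughly 1/N of keys move, which is the "minimises redistribution" property mentioned above.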

16. Design a ride-sharing matching system (like Uber/Lyft).

Why it's asked

Tests geospatial indexing, real-time matching algorithms, and eventual consistency in a highly dynamic system.

How to answer

Clarify matching criteria (proximity, ETA, driver rating, surge). Design: location tracking service → geospatial index (geohash/quadtree) → matching algorithm → dispatch → trip state machine.

Key points to hit

  • Geohash or S2 cells for efficient proximity queries on driver locations
  • Driver location updates at 3–5 second intervals; write-heavy path to in-memory store
  • Matching algorithm: scored ranking (ETA, rating, trip efficiency) within radius
  • Trip state machine: requested → matched → en-route → in-progress → completed → rated

Frequently Asked Questions

How should I structure my time in a system design interview?

Spend the first 5 minutes clarifying requirements and scope (functional requirements, non-functional requirements, scale estimates). The next 5 minutes on high-level design (draw the major components and data flow). Then spend 25 minutes on detailed design, diving deep into 2–3 components the interviewer cares most about. Use the final 10 minutes for scalability, trade-offs, and monitoring. Let the interviewer guide the deep-dives — don't monologue through your entire design without checking in.

Should I memorise solutions to common system design questions?

No — interviewers can tell immediately. Instead, memorise the building blocks (load balancers, caches, queues, databases, CDN) and practise combining them for different problems. The goal is fluency in architectural patterns, not recall of specific solutions. Practise 8–10 different designs until the building blocks feel instinctive, then you can tackle any novel problem.

How precise do my back-of-the-envelope estimates need to be?

Quick and reasonable, not precise. Round aggressively: "100M DAU × 10 requests/day = 1B requests/day ≈ 12K QPS" is the right level. The point is to anchor your design decisions (do we need caching? sharding? async processing?) — not to produce exact numbers. Interviewers value the reasoning process more than the arithmetic.

Do I need to know specific technologies like Kafka or Redis?

Having a working knowledge of common technologies strengthens your answers, but the design interview is about concepts, not brand names. It's fine to say "a message queue like Kafka or SQS" — what matters is that you explain why you need a message queue and what properties it provides (durability, ordering, at-least-once delivery). Never name-drop a technology you can't explain at a basic level.

How important are non-functional requirements?

They're the entire point. Functional requirements tell you what the system does; non-functional requirements (latency, throughput, availability, consistency, durability) determine how you design it. Two systems with identical features but different latency requirements (100ms vs. 10s) will have completely different architectures. Always clarify non-functional requirements before drawing a single box.

What's the most common mistake candidates make?

Jumping straight into detailed component design without clarifying requirements or sketching a high-level architecture. The second most common mistake is over-engineering: designing for Google-scale when the requirements say 10K users. Start simple, state your assumptions, and add complexity only when the numbers demand it. Interviewers promote candidates who show good judgement about when complexity is warranted.
