System Design Interview Framework

This document defines a suggested system design interview structure, which can be used as a guideline. Please feel free to open a PR with suggestions!

Interview Steps

1. Collect requirements

1.1 Functional requirements

Clarifying questions

What will the system do?
What exactly are we going to build: API, web app, mobile app?

Examples

TODO

1.2 Main stakeholders of the system

Clarifying questions

Who are the main users of the system?

Examples

TODO

1.3 Non-functional requirements

Clarifying questions

How can we tell that the system is working?
Is there a bottleneck in the design?
How do the components work together?
How do we provide great service to everyone?

Examples

TODO

2. Back-of-the-envelope calculation

Clarifying questions

What scale is expected from the system?
How much storage will we need?
What network bandwidth usage are we expecting?

Examples

TODO

Numeric estimates for properties like this:

Latency
Throughput
Storage

Cost of operations:

Read from memory
Read from disk
Local area network (LAN) round-trip
Cross-continental network

3. Define system interface

Clarifying questions

If it’s an API, what endpoints will we need?

Examples

TODO

4. Define data model

Clarifying questions

TODO

Examples

TODO

5. Create high-level design (draw a block diagram)

Examples

TODO

6. Add details (cache, replicas, detalization of individual components)

Clarifying questions

Since we will be storing a massive amount of data, how should we partition our data to distribute it to multiple databases? Should we try to store all the data of a user on the same database? What issue could it cause?
How will we handle X?
What data access models will we have?
How much and at which layer should we introduce cache to speed things up?
What components need better load balancing?

Examples

TODO

7. Identifying bottlenecks

Clarifying questions

Is there any single point of failure in our system? What are we doing to mitigate it?
Do we have enough replicas of the data so that if we lose a few servers we can still serve our users?
Similarly, do we have enough copies of different services running such that a few failures will not cause total system shutdown?
How are we monitoring the performance of our service? Do we get alerts whenever critical components fail or their performance degrades?
How does system respond to various failures

Examples

TODO

8. Add telemetry: monitoring, logging

Clarifying questions

TODO

Examples

TODO

Must know

Sharding data
Replication types
Write-ahead logging
Separating data and metadata storage
Load distribution

General guidelines

Layout trade-offs and compromises and explain reasoning.
Explain thought process
Keep going, don't stop, don't give up.

Resources

TODO

slavikdev/system-design-interview-framework

System Design Interview Framework

Interview Steps

1. Collect requirements

1.1 Functional requirements

Clarifying questions

Examples

1.2 Main stakeholders of the system

Clarifying questions

Examples

1.3 Non-functional requirements

Clarifying questions

Examples

2. Back-of-the-envelope calculation

Clarifying questions

Examples

Numeric estimates for properties like this:

Cost of operations:

3. Define system interface

Clarifying questions

Examples

4. Define data model

Clarifying questions

Examples

5. Create high-level design (draw a block diagram)

Examples

6. Add details (cache, replicas, detalization of individual components)

Clarifying questions

Examples

7. Identifying bottlenecks

Clarifying questions

Examples

8. Add telemetry: monitoring, logging

Clarifying questions

Examples

Must know

General guidelines

Resources