system-design-notebook: A repository from Delphinfo

A curated collection of resources and exercises to help you learn about system design

Topics
Topics Explained
Exercises
- AWS Cloud
- Misc
Questions
Resources
System Design
- Cloud
- Real Systems
System Design Process
Interview Tips
Q&A

Topics

Requirements
- Functional Requirements
- Non-Functional Requirements
Basic architecture
- Client
- Server
- Dispatcher
Scalability
- Vertical Scaling
- Horizontal Scaling
- Scalability Factor
Availability
Performance
Resiliency
Durability
Microservices Architecture
Monolith Architecture
Cache
- Distributed Cache
- Cache Policy (aka Replacement Policy)
  - LRU (least recently used)
Load Balancing
- Consistent Hashing
- Techniques
  - Round Robin
  - Weighted Round Robin
  - Least Connection
  - Weighted Least Connection
  - Resource Based
  - Fixed Weighting
  - Weighted Response Time
  - Source IP Hash
  - URL Hash
- Sticky Sessions
- Health Checks
Fault Tolerance
Distributed System
- Fallacies of distributed systems
- Clock Drift
Extensibility
Loose Coupling
Proxy
Storage
- RAID
CDN
DNS
- Records
  - TTL
- TLD and SLD
Networking
- Bandwidth
- IP
  - Private IP
  - Public IP
- Latency
- Throughput
Databases
- Sharding
- Read Replicas
Design Level
- Low level design
- High level design

Topics Explained

Requirements

A system design process usually starts with understanding the system's purpose, and one way to understand that purpose is to clearly define a list of requirements.
These requirements not only help us understand how the system will be used and how it works, but also set clear boundaries that keep the design focused on the right aspects of the system. We usually distinguish between functional and non-functional requirements.

Functional Requirements

Functional requirements specify an expected function or behaviour of the system. Simply put, something the system should be able to do.
For example, for a video streaming service a requirement might be to upload a video or comment on a video. For an instant messaging application, a functional requirement would be the ability to send and receive messages.

Non-Functional Requirements

Non-functional requirements focus on how the system performs in general, rather than on specific functions.
While such requirements might affect the user's experience, they shouldn't affect the specific functionality or features the system supports.

For example, if the system is a service, a non-functional requirement might be "zero downtime" or "no loss of data".

Basic Architecture

Client

A client is software or hardware that accesses a resource or a service provided by a server. While in some cases the server and the client might be on the same system/host, in most cases they will be on separate systems.

Examples of clients:

A Web browser that is used by a user to access a certain web page
A mobile phone that is used by the user to read emails

Server

A server, like a client, can be software or hardware, but as opposed to a client, its role is to serve the client. It can do so by providing a certain resource to the client or by letting it use a service that is running on the server. A few examples:

A system that stores files and allows the user to access or download them
A system that runs a service which allows users to listen to music

Scalability

Wikipedia: "Scalability is the property of a system to handle a growing amount of work by adding resources to the system"

In simpler words, scalability is about answering the question of whether a system or an architecture is able to scale in a way that meets new workloads and demand.
More practically, it answers questions like:

If a system runs a database, is it able to handle more queries?
If a system runs a service that streams videos to a million users, will it be able to stream them the same way if the number of users triples?

Also, scaling can be performed on different components. For example, in most cloud environments scaling is supported for:

Compute hosts
Virtual network functions
VMs/Instances
Containers

There are different ways to scale.

Vertical Scaling

Adding resources to the existing system/component/unit. For a server, vertical scaling might be done in one or more of the following ways:

Adding more RAM to the server
Adding more storage/disks
Adding CPUs

Horizontal Scaling

Adding more systems/units/components while making them work together, so that it seems to the client as if there is one system it interacts with.
A few examples:

Instead of one web server, having two web servers with one load balancer balancing the traffic between them
Instead of one database server, having two databases

Scalability Factor

When you double the resources of your system (or design) you might expect your system to be able to handle double the workload as well, right? But this is not necessarily what will happen. Scalability factor is the term used to describe how the workload your system is able to handle changes relative to the change in your resources.

Linear Scalability

Linear scalability happens when the workload your system is able to handle grows in proportion to the added resources. The scalability factor remains constant as you scale.
For example, you triple the resources of your system -> the system is able to handle triple the workload. In reality, this is not the case most of the time.

Sub-Linear Scalability

A more realistic outcome of scaling a system is that some resources or components do not scale as expected (or as well as other resources and components). So doubling the resources might lead to an improvement of only x1.5 in workload handling. In this case the scalability factor is lower than 1.0.

Supra-Linear Scalability

This is the optimal outcome. For example, you triple the workload handling by "only" doubling your resources. In other words, the ratio of the performance change is bigger than the ratio of the scaling change (e.g. adding more CPUs). The scalability factor in this case is bigger than 1.0.

Negative Scalability

It may sound crazy, but in some cases scaling your system might actually lead to worse results, and that's exactly what negative scalability is all about. The scalability factor is below 0.
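The four cases above can be told apart with a small calculation. This is a minimal sketch that assumes the scalability factor is defined as the relative change in handled workload divided by the relative change in resources; that definition is an assumption chosen because it matches the thresholds above (1.0 for linear, below 0 for negative), and the function names are hypothetical.

```python
import math


def scalability_factor(old_resources, new_resources, old_workload, new_workload):
    """Relative change in handled workload divided by relative change in resources."""
    resource_change = (new_resources - old_resources) / old_resources
    workload_change = (new_workload - old_workload) / old_workload
    if resource_change == 0:
        raise ValueError("resources did not change, so the factor is undefined")
    return workload_change / resource_change


def classify(factor):
    """Map a scalability factor to the categories described above."""
    if factor < 0:
        return "negative"
    if math.isclose(factor, 1.0):
        return "linear"
    return "sub-linear" if factor < 1.0 else "supra-linear"


# 2x resources, 1.5x workload -> factor 0.5
print(classify(scalability_factor(1, 2, 100, 150)))  # sub-linear
# 2x resources, 3x workload -> factor 2.0
print(classify(scalability_factor(1, 2, 100, 300)))  # supra-linear
```

With this definition, doubling resources while the handled workload drops from 100 to 80 gives a factor of -0.2, which is the negative case.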

Networking

Public IP

Wikipedia: "A public IP address, in common parlance, is a globally routable unicast IP address, meaning that the address is not an address reserved for use in private networks"

From a system design perspective, when you have a resource or a component that you would like everyone to be able to access, whether for direct communication (like a web server) or as a gateway to other components (like a load balancer), you should use a public IP.

Private IP

Whenever you don't want users to be able to globally interact with a certain component or resource, you should use a private IP address. A few examples:

Web servers that only the load balancer should communicate with directly
Internal servers that users outside the organization should not access

Private IPs, as opposed to public IPs, don't have to be globally unique: each separate network can use the same addresses.
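As a quick check, Python's standard `ipaddress` module can tell whether an address falls in a reserved private range. The sketch below is illustrative only; the `placement` helper and its labels are hypothetical, and note that `is_private` is also true for some other reserved ranges such as loopback.

```python
import ipaddress


def placement(address):
    """Say whether an address suits an internal or an internet-facing component."""
    ip = ipaddress.ip_address(address)
    return "internal only (private)" if ip.is_private else "internet-facing (public)"


print(placement("10.0.1.15"))  # e.g. a web server behind the load balancer
print(placement("8.8.8.8"))    # a globally routable address
```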

Latency

The time it takes to perform a certain task/action

Throughput

The number of tasks/actions per unit of time
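To make the difference between the two concrete, here is a minimal sketch that computes both from the same measurements. The durations are made-up numbers and the requests are assumed to be handled one after another.

```python
def average_latency(durations):
    """Mean time per task, in the same unit as the inputs (seconds here)."""
    return sum(durations) / len(durations)


def throughput(task_count, elapsed_seconds):
    """Tasks completed per second over the measurement window."""
    return task_count / elapsed_seconds


# Hypothetical measurements: four requests handled sequentially.
durations = [0.25, 0.25, 0.5, 1.0]
print(average_latency(durations))                  # 0.5 seconds per request
print(throughput(len(durations), sum(durations)))  # 2.0 requests per second
```

If the same four requests were handled in parallel, the latency of each would stay the same while the throughput would go up, which is why the two are tracked separately.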

DNS

Wikipedia: "Most prominently, it translates more readily memorized domain names to the numerical IP addresses needed for locating and identifying computer services and devices with the underlying network protocols."

In other words, the most common use case of DNS is address translation. It can be from a hostname to an IP address and vice versa, from an IP address to a hostname. In addition, DNS can be used for load balancing, using the round robin technique.
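Round robin over DNS can be pictured with a toy resolver. This is a simplified sketch, not how a real DNS server is implemented: real servers typically rotate the order of the whole record set they return, while this version hands out one address per lookup. The class name, hostname and addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinDNS:
    """Toy resolver: each lookup returns the next address registered for the name."""

    def __init__(self, records):
        # records maps a hostname to the list of addresses behind it
        self._cycles = {name: cycle(addresses) for name, addresses in records.items()}

    def resolve(self, hostname):
        return next(self._cycles[hostname])


dns = RoundRobinDNS({"app.example.com": ["203.0.113.10", "203.0.113.11"]})
print([dns.resolve("app.example.com") for _ in range(3)])
# ['203.0.113.10', '203.0.113.11', '203.0.113.10']
```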

Distributed Systems

Fallacies of distributed systems

There are some challenges that we have to deal with when designing and managing distributed/high-scale systems, but not so much with non-distributed systems:

Network Reliability: when you run a system on a single host, networking isn't as much of a problem as when you need to manage dozens or hundreds (and more) of nodes
Zero Latency: when running on a single host, latency is not an issue, but when managing multiple hosts, how do you keep latency as close as possible to zero?
Bandwidth: same as latency. When running on a single host, bandwidth is effectively unlimited, but when the system is distributed you have to move data between the servers, so bandwidth becomes a challenge
Constantly changing topology: in a distributed system, hosts and different components may go down or come up, meaning the topology is constantly changing, while a single host is a constant
Security: keeping one host secure is much simpler than securing a distributed system, where not only should all components be secured but also the communication between the different hosts and components
Size of the team: managing one host requires fewer administrators than managing a distributed system with hundreds, thousands or even more hosts and components

Clock Drift

Wikipedia: "Clock drift refers to several related phenomena where a clock does not run at exactly the same rate as a reference clock. That is, after some time the clock "drifts apart" or gradually desynchronizes from the other clock"

Synchronizing clocks is a challenge in distributed systems because each system has its own clock, and once one system's clock drifts, this might affect the system as a whole and lead to unintended behaviours.
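To get a feel for the size of the problem, the accumulated offset can be estimated from the drift rate. This is a back-of-the-envelope sketch; the 50 ppm figure is a made-up example, not a measured value for any particular hardware.

```python
def drift_seconds(elapsed_seconds, drift_ppm):
    """Offset accumulated by a clock that runs drift_ppm parts per million fast."""
    return elapsed_seconds * drift_ppm / 1_000_000


one_day = 24 * 60 * 60
# A hypothetical clock that is 50 ppm fast gains about 4.32 seconds per day.
print(drift_seconds(one_day, 50))
```

A few seconds per day is enough to reorder events when two hosts compare timestamps, which is why hosts are usually resynchronized periodically against a reference clock.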

CDN

Cloudflare: "A content delivery network (CDN) refers to a geographically distributed group of servers which work together to provide fast delivery of Internet content."

In other words, a content delivery network allows you to quickly transfer content by having servers with the content around the world or in a certain area. The client then accesses these servers instead of the main server where the data originates.

Exercises

AWS Cloud

What time is it? (stateless application)

Design the most basic architecture for a web application (server based) that tells a single user what time it is (no DB, no scaling, ...) with a maximum of two components

In this case what you need is two components:

EC2 instance - this is where our application will run. A basic t2.micro instance is more than enough

Elastic IP address - this is the static IP address our user will use every time to reach the application. In case the instance is not operational, we can always move the IP address to one that is (if we manage more than one instance)

"What time is it?" but with more than one user

Following the last exercise, your web app became a huge success and many users start using it. What might be the problem with moving from one user to multiple users and how to deal with it using a single improvement of the architecture?

Your instance might not be strong enough to handle requests from multiple users, and soon enough you might see RAM and CPU fully utilized. One way to deal with it is to perform what is called "Vertical Scaling", which is the act of adding more resources to your instance. In the AWS case, this means switching to an instance type with more resources, for example M5.

Note: The problem with vertical scaling (in case you have one node) is downtime (when upgrading the instance type, the instance will be down) so another thing you would want to do is "Horizontal Scaling" which is the act of adding more instances/resources.

"What time is it?" but without elastic IP addresses

Following the last two exercises, you would like to change the architecture offered in the solution so that it doesn't use elastic IP addresses, for the obvious reason that this is not really scalable (each EC2 instance has a different IP and users are not able to remember them all). Offer an improvement.

Instead of using elastic IP addresses, we can add a record of type A in DNS, using the Route 53 service. This way, all users will be able to access the app using a hostname, and the IP address will be provided to them by DNS.

It's important to note that this solution is not optimal if you plan to scale down and up at some point. Due to the TTL value of a record, a user will keep contacting the same IP address until the cached record expires, even if the node is already down.
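The TTL problem can be reproduced with a small simulation of a client-side resolver cache. This is a sketch under stated assumptions: the `CachingResolver` class, the hostname, the addresses and the 300-second TTL are all hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class CachingResolver:
    """Client-side cache that reuses an answer until its TTL expires."""

    def __init__(self, authoritative, ttl_seconds):
        self.authoritative = authoritative  # dict: hostname -> current IP
        self.ttl_seconds = ttl_seconds
        self._cache = {}  # hostname -> (ip, expires_at)

    def resolve(self, hostname, now):
        cached = self._cache.get(hostname)
        if cached and now < cached[1]:
            return cached[0]  # still within the TTL, possibly stale
        ip = self.authoritative[hostname]
        self._cache[hostname] = (ip, now + self.ttl_seconds)
        return ip


records = {"time.example.com": "203.0.113.10"}
resolver = CachingResolver(records, ttl_seconds=300)
first = resolver.resolve("time.example.com", now=0)
records["time.example.com"] = "203.0.113.20"           # the old node was replaced
stale = resolver.resolve("time.example.com", now=100)  # still the old IP
fresh = resolver.resolve("time.example.com", now=300)  # TTL expired, new IP
print(first, stale, fresh)
```

Between the record change and the TTL expiry, the client keeps sending traffic to an address that may no longer have a node behind it.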

A more proper and complete architecture would be to use an ELB

But even with an ELB and an "Auto Scaling group" for automatically scaling the nodes, this architecture is not optimal. Can you point out what the problem is with the current architecture? (from two different aspects)

"What time is it?" - Final Part

Following the last "What time is it?" exercise, state the issues with the current architecture and what you would improve

With the current architecture, the application is perhaps able to scale up and down, but when the availability zone is down, your entire application is down. So the first improvement would be to make both the ELB and the application itself (the EC2 instances) multi-AZ.

Secondly, if you know you always need an instance (or two) for the application, you might want to use Reserved Instances. With Reserved Instances you commit to a term in exchange for a lower price, which means you save on costs.

"Video Games Shop" (stateful application)

The following architecture was proposed for an online video games shop with the requirements of:

Support thousands of users at any given point of time
Users can register
Shopping cart items shouldn't be lost while the user is browsing the store

The problem is that users report that when they browse for additional video games to buy, they lose their shopping cart. What is the problem with the current architecture and how to deal with it?

Such an application is a stateful application, meaning it's important that we keep the information about the client from one session to another. It seems that with the current architecture, every time the user initiates a new session, it is performed against a different instance without saving the client information.

There are a couple of solutions that can be applied in this case:

Load Balancer Sticky Sessions: users will be redirected to the instance they previously initiated a session with, in order to not lose the client's data. There is a disadvantage here of losing

User cookies: the client/user stores the relevant data (the shopping cart in this case), so it doesn't matter which EC2 instance the user interacts with; the shopping cart data will be sent from the client/user to the relevant instance. The disadvantages: HTTP requests are much heavier because data is attached to each request, and it carries some security risks, as cookies can potentially be modified

"Video Games Shop" - User Data

Following the last exercise, is there another way to deal with user data (short and long term) other than user cookies and sticky sessions?

There is something called "server session" where we need to add a new component to the architecture - ElastiCache or DynamoDB, to store the shopping cart data of each user. In order to identify it, we'll use a session ID which will be sent by the client/user with every request

For long term data (user name, address, etc.) we'll use a database (e.g. RDS). There are a couple of variations as to how we can use it. A master instance where we'll write the data and a replica from which we'll read data:

A different approach can be to use Cache + DB, where for each session, we'll check if the data is in the cache and if it's not, then we'll access the DB and store the result in the cache (this is also called "Lazy Loading" or "Cache-Aside"; "Write Through", by contrast, updates the cache whenever data is written to the DB):
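The server session idea can be sketched with an in-memory store. This is only an illustration: the `SessionStore` class is a hypothetical stand-in for an external store such as ElastiCache or DynamoDB and does not use their APIs.

```python
import uuid


class SessionStore:
    """In-memory stand-in for an external session store shared by all instances."""

    def __init__(self):
        self._carts = {}

    def create_session(self):
        session_id = uuid.uuid4().hex
        self._carts[session_id] = []
        return session_id

    def add_to_cart(self, session_id, item):
        self._carts[session_id].append(item)

    def get_cart(self, session_id):
        return list(self._carts[session_id])


store = SessionStore()            # shared by every web instance
sid = store.create_session()      # returned to the client, e.g. in a cookie
store.add_to_cart(sid, "game-1")  # request handled by instance A
store.add_to_cart(sid, "game-2")  # request handled by instance B
print(store.get_cart(sid))        # ['game-1', 'game-2']
```

Because the cart lives in the shared store and not on a web instance, it no longer matters which instance handles each request.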
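The read path described here (check the cache, fall back to the DB on a miss, then keep the result in the cache) might look like the sketch below. Plain dictionaries stand in for the cache and the database, and the `UserRepository` class and the sample user are hypothetical.

```python
class UserRepository:
    """Reads go to the cache first and fall back to the database on a miss."""

    def __init__(self, database):
        self.database = database  # dict standing in for the DB (e.g. RDS)
        self.cache = {}           # dict standing in for the cache (e.g. ElastiCache)
        self.db_reads = 0

    def get_user(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]  # cache hit
        self.db_reads += 1
        user = self.database[user_id]   # cache miss: read from the DB
        self.cache[user_id] = user      # populate the cache for next time
        return user


repo = UserRepository({"u1": {"name": "Dana", "address": "1 Main St"}})
repo.get_user("u1")
repo.get_user("u1")
print(repo.db_reads)  # 1, only the first call reached the database
```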

Misc

Note: The names of the exercises are quotes from movies (sometimes little bit modified). If you can guess from which movie, please submit it to movies.md file in this way: [QUOTE] [MOVIE] [YOUR NAME]
Another note: Exercises may repeat themselves in different variations to practice and emphasize different concepts.

"Elementary, my dear Watson"

You have a website running on a single server. It's mostly running fine because only two users access it on a weekly basis :'(
It suddenly becomes super popular and many users try to access it, but they are experiencing issues due to the high load on the server. Two questions:
* What term/pattern in system design refers to the issue you are experiencing?
* How can you deal with it (even if partially) WITHOUT adding more servers or changing the architecture?

Scalability. Your web server doesn't scale based on demand (= the additional users accessing your website) hence they are experiencing issues.

Apply vertical scaling, which means adding more resources to your server - more CPU, more RAM. This way, your architecture doesn't change, but your website is able to serve more users.

Will 'vertical scaling' solve your scale issues permanently? Is it the optimal solution?

It might solve your issue for a limited time, but you can't rely on it alone. Vertical scaling has limitations. You can't keep adding RAM, storage and CPU endlessly. Eventually you'll hit some physical limit where, for example, you simply don't have any more space in your server box and you have already bought the best components you could.

Assuming you now can extend the architecture, what would you change?

"Perfectly balanced, as all things should be"

You have the following simple architecture of a server handling requests from a client. What are the drawbacks of this design and how to improve it?

Limitations:

Load - at some point it's possible the server will not be able to handle more requests and it will fail or cause delays

Single point of failure - if the server goes down, nothing will be able to handle the requests

How to improve:

Further limitations:
- Both the load and the server being a single point of failure were addressed, but now the load balancer is a single point of failure.

Is there a way to improve the above design without adding an actual load balancer instance?

Yes, one could use DNS load balancing.
Bonus question: which algorithm will a DNS load balancer use?

What are the drawbacks of round robin algorithm in load balancing?

A simple round robin algorithm knows nothing about the load and the spec of each server it forwards requests to. It is possible that multiple heavy requests will reach the same server while other servers get only lightweight requests, which results in one server doing most of the work, maybe even crashing at some point because it is unable to handle all the heavy requests on its own.

Each request from the client creates a whole new session. This might be a problem in scenarios where you would like to perform multiple operations and the server has to know the result of a previous operation, basically being aware of the history it has with the client. With round robin, the first request might hit server X, while the second request might hit server Y and ask it to continue processing data that was already processed on server X.

"For all my actions both public and private"

The following is an architecture of a load balancer and three web servers. Assuming we would like to have a secure architecture that makes sense, where would you set a public IP and where would you set a private IP?

It makes sense to hide the web servers behind the load balancer instead of giving users direct access to them, so each one of them will have a private IP assigned to it. The load balancer should have a public IP since we expect anyone who would like to access a certain web page/resource to go through the load balancer; hence, it should be accessible to users.

What load balancing techniques are there?

Round Robin

Weighted Round Robin

Least Connection

Weighted Least Connection

Resource Based

Fixed Weighting

Weighted Response Time

Source IP Hash

URL Hash
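A few of the techniques listed above can be sketched in a handful of lines each. These are simplified illustrations, not how any particular load balancer implements them; the function names and server names are hypothetical, and SHA-256 is just one possible choice of hash for the source IP variant.

```python
import hashlib
from itertools import cycle


def round_robin(servers):
    """Yield servers in a fixed rotation."""
    return cycle(servers)


def weighted_round_robin(weights):
    """Rotation in which a server with weight N appears N times per cycle."""
    expanded = [server for server, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)


def least_connection(active_connections):
    """Pick the server currently holding the fewest open connections."""
    return min(active_connections, key=active_connections.get)


def source_ip_hash(client_ip, servers):
    """Map a client IP to a server so the same client lands on the same server."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]


servers = ["web-1", "web-2", "web-3"]
rr = round_robin(servers)
print([next(rr) for _ in range(4)])    # ['web-1', 'web-2', 'web-3', 'web-1']
wrr = weighted_round_robin({"web-1": 2, "web-2": 1})
print([next(wrr) for _ in range(3)])   # ['web-1', 'web-1', 'web-2']
print(least_connection({"web-1": 12, "web-2": 3, "web-3": 7}))  # web-2
print(source_ip_hash("198.51.100.7", servers))  # same server on every call
```

Round robin and its weighted variant ignore the current state of the servers, while least connection reacts to it; the hash-based variant trades even distribution for a stable client-to-server mapping.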

"Keep calm, all I want is your cash"

The following is a simple architecture of a client making requests to a web server which, in turn, retrieves the data from a datastore. What are the drawbacks of this design and how to improve it?

Limitations:

Time - retrieving the data from the datastore every time a request is made from the client, might take a while

Single point of failure - if the datastore is down (or even slow) it wouldn't be possible to handle the requests

Load - the datastore getting all the requests can result in high load on the datastore which might result in a downtime

How to improve:

Are you able to explain what a cache is and in what cases you would use it?

Why use a cache?

Save time - Accessing a remote datastore, and in general making network calls, takes time

Reduce load - Instead of the datastore handling all the requests, we can take some of the load off it by accessing the cache

Avoid repetitive tasks - Instead of querying the data and processing it every time, do it once and store the result in the cache

Why not store everything in the cache?

For multiple reasons:

The hardware on which we store the cache is in some cases much more expensive (for example, RAM compared to disk)

The more data in the cache, the bigger it gets and the longer put/get actions can take
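Since the cache cannot hold everything, it needs a policy for what to drop when it is full. Below is a minimal sketch of the LRU (least recently used) policy mentioned in the topics list, built on Python's `collections.OrderedDict`; the `LRUCache` class is an illustration, not a production cache.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used
cache.put("c", 3)  # evicts "b"
print(cache.get("b"), cache.get("a"), cache.get("c"))  # None 1 3
```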

"In a galaxy far, far away..."

The following is a system design of a remote database and three application servers

Limitations:

Latency. Every query made to the remote database will hit latency, even if small.

In case the remote database crashes, the app will stop working

How to improve:

* Replicate each database to the local app server. This has several advantages. First, we are not bound to latency anymore. Secondly, a fai

Further limitations:
- If the remote database isn't accessible for a long period of time, we'll have an outdated database and each app has the potential to work against a different DB

"A bit on the slow side"

The following is an improvement of the previous system design

Limitations:

Queries to database might be slow, even on the server itself where the app is running

Once the remote database isn't available, the local databases will not be in sync

How to improve:

"Always the same one"

Every request sent by the same client is routed to a different web server each time. What problem might the user face with such a design? How would you fix it?

The problem: the user might need to authenticate on every single request, because different web servers handle their requests.

A possible solution: use sticky sessions, where the user is routed to the same instance every single time

"Coming back to find we've failed"

You have a design of a load balancer and a couple of web instances behind it. Sometimes the instances crash and users report that the application doesn't work for them. Name one possible way to deal with such a situation.

One possible way to deal with it is by using health checks: an instance that doesn't pass the health check is excluded from the list of instances the load balancer forwards traffic to.
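The filtering step can be sketched as follows. The status table is made up for the example; a real health check would probe each instance over the network (for example with an HTTP request) instead of reading a dictionary.

```python
def healthy_instances(instances, is_healthy):
    """Return only the instances that pass the health check."""
    return [instance for instance in instances if is_healthy(instance)]


# Hypothetical status table standing in for real probes.
status = {"web-1": True, "web-2": False, "web-3": True}
pool = healthy_instances(list(status), lambda name: status[name])
print(pool)  # ['web-1', 'web-3']
```

The load balancer would then apply its usual technique (round robin, least connection, and so on) to `pool` only, and re-run the check periodically so that recovered instances return to the pool.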

"In any major city, minding your own business is a science"

You have a production application using a database for reads and writes. Your organization would like to add another application to work against the same database but for analytics purposes (read only). What problem might arise from this new situation and what one improvement you can apply to the design?

Adding another application to work against the same database creates additional load on your database, which may lead to issues if that additional load reaches the limits of your database's capacity.

One improvement to the design could be to add a read replica instance of your database. This way the new application can work against the read replica instead of the original database. The replication will be asynchronous but in most cases, for analytics applications, that's good enough.
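The split between the production path and the analytics path, including the replication lag, can be illustrated with a toy model. Everything here is hypothetical: lists stand in for the databases, and replication is triggered by hand to make the lag visible, whereas in practice it runs asynchronously in the background.

```python
class Database:
    def __init__(self, name):
        self.name = name
        self.rows = []


class ReplicatedDatabase:
    """Writes go to the primary; analytics reads go to the replica."""

    def __init__(self):
        self.primary = Database("primary")
        self.replica = Database("replica")

    def write(self, row):
        self.primary.rows.append(row)

    def replicate(self):
        # Asynchronous in practice; called explicitly here to make the lag visible.
        self.replica.rows = list(self.primary.rows)

    def analytics_read(self):
        return list(self.replica.rows)


db = ReplicatedDatabase()
db.write({"order": 1})
print(db.analytics_read())  # [] until replication catches up
db.replicate()
print(db.analytics_read())  # [{'order': 1}]
```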

Questions

This is a more "simplified" version of the exercises section. No drawings, no evolving exercises, no strange exercise names, just straightforward questions, each in its own category.

Your website usually serves a dozen users on average and has good CPU and RAM utilization. It suddenly becomes very popular and many users try to access your web server, but they are experiencing issues, and CPU and RAM utilization seems to be at 100%. How would you describe the issue?

Scalability issue. The web server doesn't scale :'(
In order to avoid such issues, the web server has to scale based on usage. More users -> more CPU and RAM utilization -> add more resources (= scale up). When fewer users are accessing the website, scale down.
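A scaling decision of this kind can be expressed as a simple threshold rule. The sketch below is an assumption-heavy illustration: it scales by changing the number of instances, and the thresholds, limits and function name are made up, not taken from any specific autoscaling service.

```python
def desired_instances(current, cpu_percent, scale_up_at=80, scale_down_at=30,
                      minimum=1, maximum=10):
    """Add an instance under high load, remove one under low load."""
    if cpu_percent >= scale_up_at:
        return min(current + 1, maximum)
    if cpu_percent <= scale_down_at:
        return max(current - 1, minimum)
    return current


print(desired_instances(current=2, cpu_percent=95))  # 3
print(desired_instances(current=2, cpu_percent=10))  # 1
print(desired_instances(current=2, cpu_percent=50))  # 2
```

Keeping a gap between the two thresholds avoids adding and removing resources back and forth when utilization hovers around a single value.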