System Design Interview Preparation

System Design Basics

  • Key Characteristics and Fundamentals of Distributed Systems
  • Monolithic VS Microservice (Service Discovery, Resiliency)
  • Vertical vs horizontal scaling Watch1
  • Load Balancing / Application Delivery Controller (ADC) Read1 Read2 Watch1
  • Consistent Hashing Watch1 Read1 Read2 Read3
  • Throughput, Latency
  • CAP theorem
  • ACID vs BASE
  • Redundancy and Replication
  • Partitioning/Sharding
  • Optimistic vs pessimistic locking
  • Strong vs eventual consistency
  • SQL vs NoSQL
  • Types of NoSQL (Key value, Wide column, Document-based, Graph-based)
  • Caching
  • Data center/racks/hosts
  • CPU/memory/Hard drives/Network bandwidth
  • Random vs sequential read/writes to disk
  • DNS lookup
  • HTTP, HTTPS, HTTP/2
    • HTTP
    • HTTPS Read1
    • HTTP & SSL/TLS
    • Public key infrastructure and certificate authority (CA)
    • Symmetric vs asymmetric encryption
  • WebSockets
  • Long-Polling vs WebSockets vs Server-Sent Events
  • TCP/IP model
  • IPv4 vs IPv6
  • TCP vs UDP
  • CDNs & Edges
  • Data Partitioning
  • Indexes
  • Master-Slave, Master-Master
  • Active-Passive, Active-Active
  • Leader election
  • Design patterns and Object-oriented design
  • Virtual machines and containers
  • Pub-sub architecture
  • REST, GraphQL
  • MapReduce
  • Bloom filters and Count-Min sketch
  • Paxos
  • Multithreading, locks, synchronization, CAS (compare-and-swap)
  • Proxies

Building Blocks of Any Frequently Asked System Design Question

  • Authentication
    • JWT
    • OAuth 2.0
  • File / Media Upload
    • S3, Multiple Quality Files
  • WIP...

Tools and Technologies

  • Databases Comparison
  • Cassandra
  • MongoDB/Couchbase
  • RabbitMQ / Kafka / Pub-Sub comparison
  • MySQL / PostgreSQL
    • Scalability in Postgres
  • Redis / Memcached
  • InfluxDB [Suitable for TimeSeries, IoT data]
  • Zookeeper
  • NGINX
  • HAProxy
  • Solr, Elasticsearch
  • Amazon, EC2, S3
  • Docker, Kubernetes
  • Hadoop/Spark and HDFS
  • Eureka, Hystrix
  • Heroku / Azure DevOps
  • Jenkins CI/CD

System Design Problems (HLD + LLD)

  • TinyURL
  • Instagram | Photo hosting platform
  • Timeline | Newsfeed | Twitter
  • Dropbox | Google Drive
  • WhatsApp | Facebook Messenger NL GS Ref
  • MakeMyTrip | BookMyShow
  • Amazon | Flipkart
  • YouTube | Netflix NL
  • Uber | IRCTC
  • Swiggy | Zomato
  • Yelp | Nearby
  • Twitter Search
  • Google Search
  • SplitWise
  • Zerodha
  • API Rate Limiter
  • Web Crawler
  • Rate limiting system
  • Distributed cache
  • Typeahead Suggestion | Auto-complete system
  • Recommendation System
  • Design a tagging system like tags used in LinkedIn

Low Level Design Problems (Machine Coding Round) Reference

Engineering Blogs Ref

Airbnb AirPair Artsy Asana Bandcamp BenefitFocus Bitly BitTorrent Cerner Chartbeat Cloudera Cloudflare Docker Dropbox eBay Etsy Eventbrite Facebook Flickr Fiftythree Flipboard Foursquare GitHub Gnip GoSquared Grouper Groupon Harry's Heroku Honeybadger Indeed Instagram Intent LinkedIn Livechat Medallia Monetate Netflix Oyster PayPal Pinterest Prezi Quora Rightscale Salesforce Shopify Simple SlideShare Songkick SoundCloud Spotify Square Strava Tumblr Twitter Twilio Thumbtack Wayfair Wealthfront Webengage Yahoo Yammer Yelp Zenpayroll Zillow

Other Useful Resources:

Golden Rules to Remember

1.  If we are dealing with a read-heavy system, it's good to consider using a Cache.

2.  If we need low latency in the system, it's good to consider using a Cache & CDN.

3.  If we are dealing with a write-heavy system, it's good to use a Message Queue for async processing or append-only logs

4.  If we need the system to be ACID compliant, we should go for an RDBMS (SQL Database)

5.  If data is unstructured & doesn't require ACID properties, we should go for NoSQL Database

6.  If the system has large binary data such as videos, images, or files, we should go for Blob/Object storage

7.  If the system requires complex/heavy pre-computation like a news feed, we should use a Message Queue & Cache

8.  If the system requires searching data in high volume, we should consider using a search index, tries or a search engine like Elasticsearch

9.  If we need to scale a SQL Database, we should consider using Database Sharding & Partitioning

10. If the system requires High Availability, Performance, & Throughput, we should consider using a Load Balancer

11. If the system requires faster data delivery globally, reliability, high availability, & performance, we should consider using a CDN

12. If the system has data with nodes, edges, and relationships, such as friend lists or road connections, we should consider using a Graph Database

13. If the system needs scaling of various components like servers, databases, etc, we should consider using Horizontal Scaling

14. If the system requires high-performing database queries, we should use Database Indexes

15. If the system requires bulk job processing, we should consider using Batch Processing & Message Queues

16. If the system needs to reduce server load and prevent DoS attacks, we should use a Rate Limiter

17. If the system has microservices, we should consider using an API Gateway (Authentication, SSL Termination, Routing etc)

18. If the system has a single point of failure, we should implement Redundancy in that component

19. If the system needs to be fault-tolerant, & durable, we should implement Data Replication (creating multiple copies of data on different servers)

20. If the system needs fast, bi-directional user-to-user communication, we should use WebSockets

21. If the system needs the ability to detect failures in a distributed system, we should implement a Heartbeat

22. If the system needs to ensure data integrity, we should use a Checksum Algorithm

23. If the system needs to add or remove server nodes efficiently, without creating hotspots, we should implement Consistent Hashing

24. If the system needs to transfer data between various servers in a decentralized way, we should go for a Gossip Protocol

25. If the system deals with location data such as maps or nearby resources, we should consider using a Quadtree, Geohash, etc.

26. Avoid using any specific technology names such as - Kafka, S3, or EC2. Try to use more generic names like message queues, object storage etc

27. If High Availability is required, it's better to mention that the system cannot also guarantee strong consistency during a network partition (CAP theorem). Eventual Consistency is the usual trade-off

28. If asked how a domain name entered in the browser is resolved to an IP address, sketch or mention DNS (Domain Name System)

29. If asked how to limit the amount of data returned by a network request, such as YouTube search or trending videos, one way is to implement Pagination, which limits the response data

30. If asked which Cache eviction policy you would use, the most commonly expected answer is LRU (Least Recently Used). Be prepared to discuss its data structure and implementation
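
Rule 30 suggests preparing an LRU cache implementation. The following is a minimal sketch in Python, assuming a single-threaded caller and using the standard library's `OrderedDict` to track recency. The class name `LRUCache` and its `get`/`put` methods are illustrative choices, not taken from the original list.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Keys are kept in recency order: oldest first, newest last.
        self._items = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value) -> None:
        """Insert or update a key, evicting the oldest entry if full."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recently used key
cache.put("c", 3)  # capacity exceeded, so "b" is evicted
```

Both operations run in O(1) on average. In an interview the same idea is often expressed as a hash map plus a doubly linked list, which is roughly what `OrderedDict` provides internally.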


Credit: https://leetcode.com/discuss/interview-question/system-design/3616948/golden-rules-to-answer-in-a-system-design-interview
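
Rule 23 mentions Consistent Hashing for adding and removing nodes. The sketch below is one simple way to illustrate the idea, assuming a hash ring with virtual nodes and SHA-256 as the hash function. The names `HashRing`, `vnodes`, and the `cache-a`-style node labels are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._hashes = []   # sorted ring positions
        self._owners = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node is placed at several positions to spread load evenly.
        for i in range(self.vnodes):
            position = _hash(f"{node}#{i}")
            bisect.insort(self._hashes, position)
            self._owners[position] = node

    def remove_node(self, node: str) -> None:
        self._hashes = [h for h in self._hashes if self._owners[h] != node]
        self._owners = {h: n for h, n in self._owners.items() if n != node}

    def get_node(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        if not self._hashes:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._hashes)
        return self._owners[self._hashes[idx]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.remove_node("cache-c")
after = {k: ring.get_node(k) for k in keys}
```

Only the keys that lived on the removed node are reassigned. Every other key keeps its owner, which is the property that distinguishes this scheme from `hash(key) % number_of_nodes`.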

System Design Interview Approach Template

THINGS TO CONSIDER [5 min]

    (1) Features
    (2) API
    (3) Availability
    (4) Latency
    (5) Scalability
    (6) Durability
    (7) Class Diagram
    (8) Security and Privacy
    (9) Cost-effective

FEATURE EXPECTATIONS [5 min]

    (1) Use cases
    (2) Scenarios that will not be covered
    (3) Who will use
    (4) How many will use
    (5) Usage patterns

ESTIMATIONS [5 min]

    (1) Throughput (QPS for read and write queries)
    (2) Latency expected from the system (for read and write queries)
    (3) Read/Write ratio
    (4) Traffic estimates
            - Write (QPS, Volume of data)
            - Read  (QPS, Volume of data)
    (5) Storage estimates
    (6) Memory estimates
            - If we are using a cache, what kind of data do we want to store in it?
            - How much RAM and how many machines do we need to achieve this?
            - Amount of data you want to store on disk/SSD
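
The estimation steps above can be worked through as plain arithmetic. The sketch below uses made-up inputs (10 million daily users, a 100:1 read/write ratio, 1 KB objects, caching 20% of daily read volume, 64 GB of RAM per cache machine). All of these numbers are assumptions for illustration only and should be replaced with figures agreed with the interviewer.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Hypothetical inputs; none of these come from a real system.
daily_active_users = 10_000_000
writes_per_user_per_day = 2
read_write_ratio = 100
avg_object_size_bytes = 1_000          # roughly 1 KB per object
retention_years = 5
cache_percent_of_daily_reads = 20      # rule-of-thumb assumption
ram_per_cache_machine_bytes = 64 * 10**9

# Traffic estimates
daily_writes = daily_active_users * writes_per_user_per_day
daily_reads = daily_writes * read_write_ratio
write_qps = daily_writes / SECONDS_PER_DAY
read_qps = daily_reads / SECONDS_PER_DAY

# Storage estimate: everything written over the retention period
storage_bytes = daily_writes * avg_object_size_bytes * 365 * retention_years

# Memory estimate: cache a fixed share of one day's read volume
cache_bytes = (daily_reads * avg_object_size_bytes
               * cache_percent_of_daily_reads // 100)
cache_machines = math.ceil(cache_bytes / ram_per_cache_machine_bytes)

print(f"write QPS ~ {write_qps:,.0f}, read QPS ~ {read_qps:,.0f}")
print(f"storage ~ {storage_bytes / 10**12:.1f} TB")
print(f"cache ~ {cache_bytes / 10**9:.0f} GB on {cache_machines} machines")
```

With these inputs the result is roughly 231 write QPS, 23,000 read QPS, 36.5 TB of storage, and a 400 GB cache spread over 7 machines. Peak traffic is usually a multiple of the average, so it is worth stating a peak factor as well.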

DESIGN GOALS [5 min]

    (1) Latency and Throughput requirements
    (2) Consistency vs Availability  [Weak/strong/eventual => consistency | Failover/replication => availability]

HIGH LEVEL DESIGN [5-10 min]

    (1) APIs for Read/Write scenarios for crucial components
    (2) Database schema
    (3) Basic algorithm
    (4) High level design for Read/Write scenario

DEEP DIVE [15-20 min]

    (1) Scaling the algorithm
    (2) Scaling individual components: 
            -> Availability, Consistency and Scale story for each component
            -> Consistency and availability patterns
    #### Think about the following components, how they would fit in and how it would help
            a) DNS
            b) CDN [Push vs Pull]
            c) Load Balancers [Active-Passive, Active-Active, Layer 4, Layer 7]
            d) Reverse Proxy
            e) Application layer scaling [Microservices, Service Discovery]
            f) DB [RDBMS, NoSQL]
                    > RDBMS 
                        >> Master-slave, Master-master, Federation, Sharding, Denormalization, SQL Tuning
                    > NoSQL
                        >> Key-Value, Wide-Column, Graph, Document
                            Fast-lookups:
                            -------------
                                >>> RAM  [Bounded size] => Redis, Memcached
                                >>> AP [Unbounded size] => Cassandra, RIAK, Voldemort
                                >>> CP [Unbounded size] => HBase, MongoDB, Couchbase, DynamoDB
            g) Caches
                    > Client caching, CDN caching, Webserver caching, Database caching, Application caching, Cache @Query level, Cache @Object level
                    > Cache update strategies:
                            >> Cache aside
                            >> Write through
                            >> Write behind
                            >> Refresh ahead
            h) Asynchronism
                    > Message queues
                    > Task queues
                    > Back pressure
            i) Communication
                    > TCP
                    > UDP
                    > REST
                    > RPC
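
Two of the cache strategies listed under (g), cache aside and write through, can be contrasted with a small sketch. It assumes plain dictionaries standing in for both the cache and the database; the class names and the `db_reads` counter are hypothetical and exist only to make the behavior observable.

```python
class CacheAsideStore:
    """Application manages the cache: read through on miss, invalidate on write."""

    def __init__(self, database: dict) -> None:
        self.database = database  # stand-in for the system of record
        self.cache = {}           # stand-in for Redis, Memcached, etc.
        self.db_reads = 0

    def read(self, key):
        if key in self.cache:
            return self.cache[key]        # cache hit
        self.db_reads += 1                # cache miss: go to the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value       # populate the cache for next time
        return value

    def write(self, key, value) -> None:
        self.database[key] = value
        self.cache.pop(key, None)         # invalidate; next read reloads it


class WriteThroughStore(CacheAsideStore):
    """Every write updates the database and the cache in the same call."""

    def write(self, key, value) -> None:
        self.database[key] = value
        self.cache[key] = value           # cache is always warm after a write


aside = CacheAsideStore({"user:1": "Ada"})
aside.read("user:1")            # miss, loads from the database
aside.read("user:1")            # hit, served from the cache
aside.write("user:1", "Grace")  # invalidates the cached entry

through = WriteThroughStore({})
through.write("user:2", "Linus")
```

Cache aside keeps only data that was actually read, at the cost of a miss after each write. Write through avoids that miss but makes every write pay for two updates. Write behind and refresh ahead add asynchrony on top of these and are not shown here.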

JUSTIFY [5 min]

(1) Throughput of each layer
(2) Latency caused between each layer
(3) Overall latency justification
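
The justification step can be reduced to a quick calculation. As a rough sketch, assume a request passes through every layer in series (the worst case, a cache miss), so latencies add up and the slowest layer caps throughput. The layer names and numbers below are invented for illustration.

```python
# Hypothetical per-layer figures for one request path.
layers = {
    "load balancer": {"latency_ms": 1, "max_qps": 100_000},
    "application server": {"latency_ms": 20, "max_qps": 30_000},
    "cache": {"latency_ms": 2, "max_qps": 200_000},
    "database": {"latency_ms": 15, "max_qps": 10_000},
}

# Layers are traversed one after another, so their latencies sum.
end_to_end_latency_ms = sum(layer["latency_ms"] for layer in layers.values())

# Sustained throughput is limited by the layer with the lowest capacity.
bottleneck = min(layers, key=lambda name: layers[name]["max_qps"])
max_sustainable_qps = layers[bottleneck]["max_qps"]

print(f"end-to-end latency ~ {end_to_end_latency_ms} ms")
print(f"bottleneck: {bottleneck} at {max_sustainable_qps:,} QPS")
```

Comparing these two numbers against the targets from the estimation and design-goal steps shows which layer to scale first.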
