system-design: A repository from genzarchitect

System Design Basics

Key Characteristics and Fundamentals of Distributed Systems
Monolithic VS Microservice (Service Discovery, Resiliency)
Vertical vs horizontal scaling Watch1
Load Balancing / Application Delivery Controller (ADC) Read1 Read2 Watch1
Consistent Hashing Watch1 Read1 Read2 Read3
Throughput, Latency
CAP theorem
ACID vs BASE
Redundancy and Replication
Partitioning/Sharding
Optimistic vs pessimistic locking
Strong vs eventual consistency
SQL vs NoSQL
Types of NoSQL (Key value, Wide column, Document-based, Graph-based)
Caching
Data center/racks/hosts
CPU/memory/Hard drives/Network bandwidth
Random vs sequential read/writes to disk
DNS lookup
HTTP, HTTPS, HTTP2
- HTTP
- HTTPS Read1
- HTTP & SSL/TLS
- Public key infrastructure and certificate authority (CA)
- Symmetric vs asymmetric encryption
WebSockets
Long-Polling vs WebSockets vs Server-Sent Events
TCP/IP model
IPv4 vs IPv6
TCP vs UDP
Consistent Hashing
CDNs & Edges
Data Partitioning
Indexes
Master-Slave, Master-Master
Active-Passive, Active-Active
Leader election
Design patterns and Object-oriented design
Virtual machines and containers
Pub-sub architecture
REST, GraphQL
MapReduce
Bloom filters and Count-Min sketch
Paxos
Multithreading, locks, synchronization, CAS(compare and set)
Proxies
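
Consistent hashing appears twice in the list above, so a small illustration may help. The following is a minimal sketch, not a production implementation: the `HashRing` class, the `vnodes` parameter, and the choice of MD5 as the ring hash are all assumptions made for this example. It shows the core idea that each key is owned by the first node position found clockwise from the key's hash.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a ring position (MD5 is an arbitrary, stable choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes   # virtual positions per physical node
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Place `vnodes` positions for this node on the ring.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        # Remove every position that belongs to this node.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First position clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

With this layout, adding a node only moves keys onto the new node, and removing it sends them back to their previous owners. Virtual nodes are one common way to spread load more evenly across physical nodes.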

Building Blocks of Any Frequently Asked System Design Question

Authentication
- JWT
- OAuth2
File / Media Upload
- S3, Multiple Quality Files
WIP...

Tools and Technologies

Databases Comparison
Cassandra
MongoDB/Couchbase
- Mongo: Read1, Read2, Read3, Read4, Read5, Read6, IQ's
RabbitMQ / Kafka / Pub-Sub comparison Comparison
- RabbitMQ: Watch1, Watch2
- Google PubSub: Watch Playlist
MySQL / PostgreSQL
- Scalability in Postgres
Redis / Memcached
InfluxDB [Suitable for TimeSeries, IoT data]
Zookeeper
NGINX
HAProxy
Solr, Elasticsearch
Amazon, EC2, S3
Docker, Kubernetes
Hadoop/Spark and HDFS
Eureka, Hystrix
Heroku / Azure DevOps
Jenkins CI/CD

System Design Problems (HLD + LLD)

TinyURL
Instagram | Photo hosting platform
Timeline | Newsfeed | Twitter
Dropbox | Google Drive
WhatsApp | Facebook Messenger NL GS Ref
MakeMyTrip | BookMyShow
Amazon | Flipkart
YouTube | Netflix NL
Uber | IRCTC
Swiggy | Zomato
Yelp | Nearby
Twitter Search
Google Search
SplitWise
Zerodha
API Rate Limiter
Web Crawler
Rate limiting system
Distributed cache
Typeahead Suggestion | Auto-complete system
Recommendation System
Design a tagging system like tags used in LinkedIn
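
For the API Rate Limiter problem listed above, one commonly discussed approach is a token bucket. The sketch below is only an illustration under stated assumptions: the `TokenBucket` class, its `rate` and `capacity` parameters, and the injectable `clock` are hypothetical names chosen for this example, and a real limiter would also need per-client state and thread safety.

```python
import time


class TokenBucket:
    """Hypothetical single-client token bucket (not thread-safe)."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self._clock = clock         # injectable so tests can control time
        self._tokens = capacity     # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A caller would typically keep one bucket per API key or client IP and reject the request (for example with HTTP 429) whenever `allow()` returns `False`.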

Low Level Design Problems (Machine Coding Round) Reference

Elevator system
Snake and Ladder game
Tic Tac Toe
ATM machine - https://medium.com/swlh/atm-an-object-oriented-design-e3a2435a0830
Traffic Control System
Vehicle Parking System
Online Coding Platform problem-statement
File Sharing System
Object-Oriented Design Preparation [https://www.oodesign.com/]
SOLID Principles
Design Patterns [https://refactoring.guru/design-patterns]
More Problems List
More Good Resources:
- https://refactoring.guru/design-patterns/what-is-pattern
- http://www.cs.unibo.it/~cianca/wwwpages/ids/esempi/coffee.pdf Recommended by sudoCode
- https://cseweb.ucsd.edu//~wgg/CSE210/ecoop93-patterns.pdf Recommended by sudoCode

Engineering Blogs Ref

Airbnb AirPair Artsy Asana Bandcamp BenefitFocus Bitly Bittorrent Cerner Chartbeat Cloudera Cloudflare Docker Dropbox Ebay Etsy Eventbrite Facebook Flickr Fiftythree Flipboard Foursquare Github Gnip GoSquared Grouper Groupon Harry's Heroku Honeybadger Indeed Instagram Intent Linkedin Livechat Medallia Monetate Netflix Oyster Paypal Pinterest Prezi Quora Rightscale Salesforce Shopify Simple Slideshare Songkick Soundcloud Spotify Square Strava Tumblr Twitter Twilio Thumbtack Wayfair Wealthfront Webengage Yahoo Yammer Yelp Zenpayroll Zillow

Other Useful Resources:

HOW TO ACE A SYSTEMS DESIGN INTERVIEW - https://www.palantir.com/2011/10/how-to-ace-a-systems-design-interview/
HighScalability Blog - http://highscalability.com/
Distributed Systems - http://book.mixu.net/distsys/single-page.html
Distributed Deep Dive - https://ably.com/blog/introducing-distributed-deep-dive-interview-series-by-ably-realtime
Architecture for microservice by Microsoft - https://docs.microsoft.com/en-us/dotnet/architecture/microservices/

Golden Rules to Remember

1. If we are dealing with a read-heavy system, it's good to consider using a Cache.

2. If we need low latency in the system, it's good to consider using a Cache & CDN.

3. If we are dealing with a write-heavy system, it's good to use a Message Queue for async processing or append-only logs

4. If we need a system to be ACID compliant, we should go for an RDBMS or SQL Database

5. If data is unstructured & doesn't require ACID properties, we should go for NoSQL Database

6. If the system has complex data in the form of videos, images, files etc, we should go for Blob/Object storage

7. If the system requires complex/heavy pre-computation like a news feed, we should use a Message Queue & Cache

8. If the system requires searching data in high volume, we should consider using a search index, tries or a search engine like Elasticsearch

9. If the system needs to scale a SQL Database, we should consider using Database Sharding & Partitioning

10. If the system requires High Availability, Performance, & Throughput, we should consider using a Load Balancer

11. If the system requires faster data delivery globally, reliability, high availability, & performance, we should consider using a CDN

12. If the system has data with nodes, edges, and relationships, such as friend lists and road connections, we should consider using a Graph Database

13. If the system needs scaling of various components like servers, databases, etc, we should consider using Horizontal Scaling

14. If the system requires high-performing database queries, we should use Database Indexes

15. If the system requires bulk job processing, we should consider using Batch Processing & Message Queues

16. If the system requires reducing server load and preventing DoS attacks, we should use a Rate Limiter

17. If the system has microservices, we should consider using an API Gateway (Authentication, SSL Termination, Routing etc)

18. If the system has a single point of failure, we should implement Redundancy in that component

19. If the system needs to be fault-tolerant, & durable, we should implement Data Replication (creating multiple copies of data on different servers)

20. If the system needs fast, bi-directional user-to-user communication, we should use WebSockets

21. If the system needs the ability to detect failures in a distributed system, we should implement a Heartbeat

22. If the system needs to ensure data integrity, we should use a Checksum Algorithm

23. If the system needs to scale servers efficiently as nodes are added or removed, with no hotspots, we should implement Consistent Hashing

24. If the system needs to transfer data between various servers in a decentralized way, we should go for a Gossip Protocol

25. If the system deals with location data such as maps or nearby resources, we should consider using a Quadtree, Geohash, etc.

26. Avoid using specific technology names such as Kafka, S3, or EC2. Try to use more generic names like message queue, object storage, etc.

27. If High Availability is required in the system, it's better to mention that the system cannot also guarantee strong consistency during a network partition (CAP theorem). Eventual Consistency is possible

28. If asked how a domain name typed in the browser is resolved to an IP address, try to sketch or mention DNS (Domain Name System)

29. If asked how to limit the amount of data returned by a network request, such as YouTube search or trending videos, one way is to implement Pagination, which limits the response data.

30. If asked which policy you would use to evict entries from a Cache, the most commonly asked Cache eviction policy is LRU (Least Recently Used). Prepare its Data Structure and Implementation.

Credit: https://leetcode.com/discuss/interview-question/system-design/3616948/golden-rules-to-answer-in-a-system-design-interview
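
Rule 30 above suggests preparing the data structure behind an LRU cache. The following is one possible sketch, assuming Python's `collections.OrderedDict` as the backing structure; the `LRUCache` class and its `get`/`put` method names are illustrative choices, not taken from the source. Interviewers often expect the equivalent hash map plus doubly linked list version, which gives the same O(1) operations.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        # A read counts as a use, so move the key to the "most recent" end.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        # Over capacity: drop the entry at the "least recent" end.
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)
```

For example, with a capacity of 2, inserting `a` and `b`, reading `a`, and then inserting `c` evicts `b`, because `a` was used more recently.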

System Design Interview Approach Template

THINGS TO CONSIDER [5 min]

    (1) Features
    (2) API
    (3) Availability
    (4) Latency
    (5) Scalability
    (6) Durability
    (7) Class Diagram
    (8) Security and Privacy
    (9) Cost-effective

FEATURE EXPECTATIONS [5 min]

    (1) Use cases
    (2) Scenarios that will not be covered
    (3) Who will use
    (4) How many will use
    (5) Usage patterns

ESTIMATIONS [5 min]

    (1) Throughput (QPS for read and write queries)
    (2) Latency expected from the system (for read and write queries)
    (3) Read/Write ratio
    (4) Traffic estimates
            - Write (QPS, Volume of data)
            - Read  (QPS, Volume of data)
    (5) Storage estimates
    (6) Memory estimates
            - If we are using a cache, what is the kind of data we want to store in cache
            - How much RAM and how many machines do we need to achieve this?
            - Amount of data you want to store in disk/ssd
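
The estimation items above can be turned into quick arithmetic. The sketch below is a hedged illustration: the `estimate` function, its parameter names, and the default of caching 20% of the daily read volume are assumptions made for this example, not figures from the source.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate(daily_writes: int, read_write_ratio: int, bytes_per_record: int,
             retention_days: int, cache_fraction: float = 0.2) -> dict:
    """Back-of-envelope numbers for throughput, storage, and cache memory."""
    write_qps = daily_writes / SECONDS_PER_DAY
    read_qps = write_qps * read_write_ratio
    # Storage: every write is kept for the whole retention period.
    storage_bytes = daily_writes * bytes_per_record * retention_days
    # Memory: assume we cache a fixed fraction of one day's read volume.
    daily_read_bytes = daily_writes * read_write_ratio * bytes_per_record
    cache_bytes = daily_read_bytes * cache_fraction
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "storage_bytes": storage_bytes,
        "cache_bytes": cache_bytes,
    }


if __name__ == "__main__":
    # Hypothetical workload: 8.64M writes/day, 10:1 reads, 1 KB records, 1 year.
    for name, value in estimate(8_640_000, 10, 1000, 365).items():
        print(f"{name}: {value:,.0f}")
```

Dividing `cache_bytes` by the RAM available per machine gives a rough count of cache servers, which answers the "how many machines" question in item (6).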

DESIGN GOALS [5 min]

    (1) Latency and Throughput requirements
    (2) Consistency vs Availability  [Weak/strong/eventual => consistency | Failover/replication => availability]

HIGH LEVEL DESIGN [5-10 min]

    (1) APIs for Read/Write scenarios for crucial components
    (2) Database schema
    (3) Basic algorithm
    (4) High level design for Read/Write scenario

DEEP DIVE [15-20 min]

    (1) Scaling the algorithm
    (2) Scaling individual components: 
            -> Availability, Consistency and Scale story for each component
            -> Consistency and availability patterns
    #### Think about the following components, how they would fit in and how it would help
            a) DNS
            b) CDN [Push vs Pull]
            c) Load Balancers [Active-Passive, Active-Active, Layer 4, Layer 7]
            d) Reverse Proxy
            e) Application layer scaling [Microservices, Service Discovery]
            f) DB [RDBMS, NoSQL]
                    > RDBMS 
                        >> Master-slave, Master-master, Federation, Sharding, Denormalization, SQL Tuning
                    > NoSQL
                        >> Key-Value, Wide-Column, Graph, Document
                            Fast-lookups:
                            -------------
                                >>> RAM  [Bounded size] => Redis, Memcached
                                >>> AP [Unbounded size] => Cassandra, Riak, Voldemort
                                >>> CP [Unbounded size] => HBase, MongoDB, Couchbase, DynamoDB
            g) Caches
                    > Client caching, CDN caching, Webserver caching, Database caching, Application caching, Cache @Query level, Cache @Object level
                    > Cache update strategies:
                            >> Cache aside
                            >> Write through
                            >> Write behind
                            >> Refresh ahead
            h) Asynchronism
                    > Message queues
                    > Task queues
                    > Back pressure
            i) Communication
                    > TCP
                    > UDP
                    > REST
                    > RPC
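
Two of the cache strategies listed under (g) above, cache aside and write through, can be contrasted in a few lines. This is a minimal sketch under stated assumptions: the `CachedStore` class and its method names are hypothetical, and plain dictionaries stand in for a real database and cache.

```python
class CachedStore:
    """Hypothetical wrapper; `db` and `cache` are dicts standing in for real stores."""

    def __init__(self, db: dict, cache: dict):
        self.db = db
        self.cache = cache

    def read_cache_aside(self, key):
        # Cache aside: the application checks the cache first and, on a miss,
        # loads from the database and populates the cache itself.
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def write_cache_aside(self, key, value) -> None:
        # Cache aside write: update the database and invalidate the stale entry;
        # the next read repopulates the cache.
        self.db[key] = value
        self.cache.pop(key, None)

    def write_through(self, key, value) -> None:
        # Write through: update the database and the cache together, so reads
        # right after a write are served from the cache.
        self.db[key] = value
        self.cache[key] = value
```

Write behind and refresh ahead differ mainly in timing: the former defers the database write, and the latter reloads hot entries before they expire. Both need background work that this sketch leaves out.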

JUSTIFY [5 min]

(1) Throughput of each layer
(2) Latency caused between each layer
(3) Overall latency justification
