/system_design

Preparation links and resources for system design questions

SYSTEM DESIGN PREPARATION

  • How to prepare for and answer system design questions

Objective

Learning about and implementing large-scale distributed systems is not easy, and I do not want to give the impression that it can be learnt in a month. This repository aims to give software engineers and students a rough idea of how the thought process of designing a large-scale system works and how big companies have managed to solve really hard problems. In addition, there is a recent trend for companies to hold open-ended interviews with system design questions, which can be hard for engineers of all levels who haven't had the opportunity to work on such systems themselves.

This is a collection of links/documents for the following use cases: a) Preparing for system design or open-ended interview rounds. b) Learning more about how large-scale systems work and the thought process behind designing a new system.

Index

For a very broad overview, please go through these lectures; they are really useful:

These talks should give you a starting point on how to think about such problems.

But before you begin, here are some topics (in no particular order) which, in my opinion, you should have a decent idea of before proceeding.

  1. Operating system basics: how a file system, virtual memory, paging, the instruction execution cycle etc. work. (For starters Silberschatz should be enough. If you already have decent knowledge, try Stallings' book on OS.)
  2. Networking basics: at the minimum, you should know the TCP/IP stack and the basics of how the Internet, HTTP and TCP/IP work. cs75 on YouTube (1st lecture) should give a broad overview. I personally love Computer Networking: A Top-Down Approach.
  3. Concurrency basics: threads, processes, threading in the language you know, locks, mutexes etc.
  4. DB basics: types of DBs (SQL vs NoSQL etc.), hashing and indexing, EAV-based databases, sharding, caching for databases, master-slave etc.
  5. A basic idea of what a basic web architecture looks like: load balancers, proxies, servers, database servers, caching servers, precompute, logging, big data etc. Just know broadly what each layer is for.
  6. A very basic summary of what the CAP theorem is. (I have never been asked about the theorem itself, but knowing it will help you in designing large-scale systems.)
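
As a small illustration of the hashing and sharding ideas in item 4, here is a minimal sketch of hash-based shard selection. The function name `shard_for`, the key format and the shard counts are assumptions made up for this example; real systems differ in how they pick and rebalance shards.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards).

    Uses MD5 only as a stable, well-distributed hash (not for security),
    so the same key maps to the same shard across processes and restarts.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    # Hypothetical user keys spread over 4 hypothetical shards.
    for user_id in ("user:1", "user:2", "user:3"):
        print(user_id, "->", shard_for(user_id, 4))
```

One tradeoff worth knowing: with plain modulo sharding, changing `num_shards` remaps most keys, which is why consistent hashing is a common alternative when shards are added or removed often.
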
  • I found the hiredintech videos an excellent place to start. The approach to a design question given in the link is really useful: start by clarifying the use-cases of the system, then think in an abstract manner about the various components and their interactions. Think about the bottlenecks of the system and what is most critical for it (e.g. latency vs reliability vs uptime), and address those, stating the tradeoffs of your approach.

  • System design in Cracking the Coding Interview: a good approach on how to begin attacking a problem by first solving for a small use-case and then expanding the system.

  • The best way to prepare for such questions is to do mock interviews: pick any topic (given below), try to come up with a design, and then go and see how and why it is actually designed the way it is. There is absolutely no alternative to practice! Whiteboarding a system design question is similar to actually writing code and testing it. Just reading will only take you so far.

These are the steps I go through mentally in the interviews, followed by actual interview experiences:

  • a) Be absolutely sure you understand the problem being asked; clarify at the outset rather than assuming anything.
  • b) Use-cases. This is critical: you MUST know what the system is going to be used for and at what scale. Also establish constraints like requests per second, request types, data written per second and data read per second.
  • c) Solve the problem for a very small set, say, 100 users. This will broadly help you figure out the data structures, components and abstract design of the overall model.
  • d) Write down the various components figured out so far and how they will interact with each other.
  • e) As a rule of thumb, remember at least these:
    1. processing and servers
    2. storage
    3. caching
    4. concurrency and communication
    5. security
    6. load balancing and proxy
    7. CDN
    8. Monetization: if relevant, how will you monetize?
    E.g. what kind of DB (is Postgres enough, and if not, why?), do you need caching and how much, is security a prime concern?
  • f) Special cases for the question asked. Say you are designing a system for storing thumbnails: will a file system be enough? What if you have to scale to the size of Facebook or Google? Will a NoSQL-based database work?
  • g) After I have my components in place, I generally look for minor optimizations in various places according to the use-cases, and for tradeoffs that will help the system scale better in 99% of cases.
  • h) [Scaling out or up](http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html)
  • i) Check with the interviewer whether there is any other special case they are looking to solve. It also really helps if you know about the company you are interviewing with: what its architecture is, and what the interviewer is likely to be more interested in based on the company and what they work on.
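
The constraints in step b) (requests per second, data written per second) usually come from simple back-of-the-envelope arithmetic. The sketch below shows one way to do it; the `Workload` fields and all the numbers are hypothetical inputs chosen for illustration, and the result is an average rate, so peak traffic would need an extra safety factor.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


@dataclass
class Workload:
    # All fields are assumptions you would clarify with the interviewer.
    daily_active_users: int
    reads_per_user_per_day: int
    writes_per_user_per_day: int
    bytes_per_write: int


def estimate(w: Workload) -> dict:
    """Turn per-user daily activity into average per-second rates."""
    reads_per_sec = w.daily_active_users * w.reads_per_user_per_day / SECONDS_PER_DAY
    writes_per_sec = w.daily_active_users * w.writes_per_user_per_day / SECONDS_PER_DAY
    return {
        "reads_per_sec": reads_per_sec,
        "writes_per_sec": writes_per_sec,
        "bytes_written_per_sec": writes_per_sec * w.bytes_per_write,
        "bytes_written_per_day": (
            w.daily_active_users * w.writes_per_user_per_day * w.bytes_per_write
        ),
    }


if __name__ == "__main__":
    # Hypothetical workload: 864,000 users, 10 reads and 1 write of 1,000 bytes each per day.
    for name, value in estimate(Workload(864_000, 10, 1, 1_000)).items():
        print(f"{name}: {value:,.0f}")
```

With these made-up inputs the average comes to 100 reads per second and 10 writes per second, which already hints that the system is read-heavy and that caching is worth discussing.
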

It generally depends on what you are working on and will be working on, and also on your level, but these are some of the more frequent interview questions.

  • Design Amazon's frequently viewed product page (e.g. one which shows the last 5 items you saw).
  • Design an online multiplayer poker game. Solve for persistence, concurrency, scale. Draw the ER diagram for this.
  • Design a [url compression system](http://www.hiredintech.com/system-design/the-system-design-process/)
  • Search engine (generally asked of people who have some domain knowledge): basic crawling, collection, hashing etc. Depends on your expertise in this topic.
  • Design Dropbox's architecture. good talk on this
  • Design a picture sharing website. How will you store thumbnails and photos? Usage of CDNs? Caching at various layers etc.
  • Design a news feed (e.g. Facebook, Twitter).
  • Design a product based on maps, e.g. a hotel / ATM finder given a location.
  • Design malloc, free and a garbage collection system. What data structures would you use? Decorator pattern over malloc etc.
  • Design a site like junglee.com, i.e. price comparison and availability across e-commerce websites. When will you cache, how much to query, how to crawl efficiently over e-commerce sites, sharding of databases, basic database design.
  • A web application for instant messaging, e.g. WhatsApp, Facebook chat. Issues of each, scaling problems, status and availability notification etc.
  • Design a system for collaborating over a document simultaneously (e.g. Google Docs).
  • (Very common:) find the top 'n' or most frequent items in a running stream of data.
  • Design election commission architecture: Let's say we work with the Election Commission. On counting day, we want to collate the votes received at the lakhs of voting booths all over the country. Each booth has a voting machine, which, when connected to the network, returns an array of the form {[party_id, num_votes],[party_id_2, num_votes_2],...}. We want to collect these and get the current scores in real time. The report we need continuously is how many seats each party is leading in. Please design a system for this.
  • Design a logging system. (For web applications, it is common to have a large number of servers running the same application, with a load balancer in front to distribute the incoming requests. In this scenario, we want to check for, and raise an alarm on, any exception thrown in any of the servers. We want a system that checks for the appearance of specific words, "Exception", "Disk Full" etc., in the logs of any of the servers. How would you design this system?)
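
For the "top 'n' items of a running stream" question above, a useful first step (in the spirit of solving for a small set first) is an exact single-machine version. The sketch below is only that starting point; the class name `TopNTracker` and its methods are hypothetical names made up for this example.

```python
from collections import Counter


class TopNTracker:
    """Exact frequency tracking for a stream on a single machine.

    Memory grows with the number of distinct items, so this is a
    baseline to reason from, not a design for very large streams.
    """

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def add(self, item) -> None:
        # Called once per item as it arrives from the stream.
        self._counts[item] += 1

    def top(self, n: int) -> list:
        # Returns up to n (item, count) pairs, most frequent first.
        return self._counts.most_common(n)


if __name__ == "__main__":
    tracker = TopNTracker()
    for word in "a b a c b a".split():
        tracker.add(word)
    print(tracker.top(2))  # [('a', 3), ('b', 2)]
```

From here the interview discussion usually moves to the tradeoffs: what happens when the distinct items no longer fit in memory, and whether approximate counts are acceptable for the use-case.
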

Personally I looked into the following architectures:

courtesy checkcheckzz

Depending on where you are interviewing, go through the company blog. VERY USEFUL IN INTERVIEWS! It really helps if you have an idea of the architecture, as the questions asked will generally be of that domain and your prior knowledge will help out here.

I would HIGHLY recommend you do not take a shortcut unless you have only a week or so before an interview. System design is best learnt by practising; shortcuts might help you in the short term, but I would recommend coming back to this link for an in-depth understanding after the interview.

  • a) Go through cs75 and Udacity's links given above for scaling systems.
  • b) Go through the engineering blog of the company you are interviewing with (or, if it's a startup, go through the blog of the company closest to it).
  • c) See this talk: http://www.hiredintech.com/system-design/the-system-design-process/ and develop a process for how to answer such questions.
  • d) Remember these terms, run over them in your mind during the interview, and mention them if relevant:
    1. processing and servers
    2. storage
    3. caching
    4. concurrency and communication
    5. security
    6. load balancing and proxy
    7. CDN
    8. Monetization

Best of luck 👍, feel free to send pull requests to add more content to this repository!