This repository is a collection of resources and explanations covering various aspects of system design. Whether you're a beginner or an experienced developer, it aims to provide insight into fundamental system design concepts. The content is organized by topic, and each section includes examples.
- What is System Design?
- Horizontal vs. Vertical Scaling
- What is Capacity Estimation?
- What is HTTP?
- What is the Internet TCP/IP stack?
- What happens when you enter Google.com?
- What are Relational Databases?
- What are Database Indexes?
- What are NoSQL databases?
- What is a Cache?
- What is Thrashing?
- What are Threads?
- What are Bloom Filters?
- What is Data Replication?
- How are NoSQL databases optimized?
- What are Location-based Databases?
- Database Migrations
- What is a Message Queue?
- What is the publisher-subscriber model?
- What are event-driven systems?
- Database as a Message Queue
- What is a Single Point of Failure?
- What are Containers?
- What is Service Discovery and Heartbeats?
- How to avoid Cascading Failures?
- Anomaly Detection in Distributed Systems
- Distributed Rate Limiting
- What is Distributed Caching?
- What are Content Delivery Networks?
- Write Policies
- Replacement Policies
- Pull vs. Push
- Memory vs. Latency
- Throughput vs. Latency
- Consistency vs. Availability
- Latency vs. Accuracy
- SQL vs. NoSQL databases
System design involves creating the architecture of a complex software system to meet specified requirements.
Horizontal scaling involves adding more machines, while vertical scaling involves increasing the resources of a single machine.
Capacity estimation is the process of predicting how much load a system must handle and how many resources (compute, storage, bandwidth) it needs to handle that load.
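A back-of-envelope estimate makes this concrete. The sketch below uses made-up inputs (user count, requests per user, payload size, and peak factor are all assumptions for illustration) to derive average QPS, peak QPS, and daily storage.

```python
# Back-of-envelope capacity estimate. Every input below is a hypothetical
# assumption chosen for illustration, not a measurement.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_active_users = 1_000_000   # assumed
requests_per_user_per_day = 20   # assumed
avg_payload_bytes = 2_000        # assumed bytes stored per request
peak_to_average_ratio = 3        # assumed traffic spike factor

requests_per_day = daily_active_users * requests_per_user_per_day
average_qps = requests_per_day / SECONDS_PER_DAY
peak_qps = average_qps * peak_to_average_ratio
storage_per_day_gb = requests_per_day * avg_payload_bytes / 1e9

print(f"average QPS: {average_qps:.0f}")
print(f"peak QPS: {peak_qps:.0f}")
print(f"storage per day: {storage_per_day_gb:.0f} GB")
```

Sizing for the peak rather than the average is the usual reason to carry a spike factor through the calculation.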
HTTP (Hypertext Transfer Protocol) is the foundation of data communication on the World Wide Web.
The TCP/IP stack is the suite of communication protocols that enable network connectivity on the Internet.
This section explains the steps that occur when a user enters a URL such as Google.com in a browser, including DNS resolution, the TCP connection, and the HTTP request and response.
Relational databases organize data into tables and use SQL for querying and managing data.
Database indexes improve the speed of data retrieval operations on a database, at the cost of extra storage and slower writes.
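The effect of an index can be observed with SQLite, which ships with Python. This is a minimal sketch: the `users` table, the `idx_users_email` index name, and the sample rows are hypothetical, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

def query_plan(sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail

lookup = "SELECT name FROM users WHERE email = ?"
params = ("user500@example.com",)

plan_before = query_plan(lookup, params)  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(lookup, params)   # the plan now mentions the index

print(plan_before)
print(plan_after)
```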
NoSQL databases are non-relational databases designed for scalability, flexibility, and performance.
A cache is a high-speed data storage layer that stores frequently accessed data.
Thrashing occurs when a computer's performance deteriorates because it spends more time paging memory to and from disk than doing useful work.
Threads are lightweight units of execution within a process that can run concurrently and share the process's memory.
Load balancing distributes incoming network traffic across multiple servers to ensure optimal resource utilization.
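Round robin is one of the simplest ways to spread requests. The sketch below assumes a fixed list of hypothetical server names and ignores health checks and weighting, which a real load balancer would need.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def next_server(self):
        # Each call returns the next server in the rotation.
        return next(self._servers)

balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical names
assignments = [balancer.next_server() for _ in range(5)]
print(assignments)
```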
Consistent hashing is a technique for distributing data across a network in a way that minimizes reorganization when nodes are added or removed.
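A hash ring is a common way to realize this. The following is a minimal sketch, assuming SHA-256 as the hash function and 100 virtual nodes per server; both choices, and the class and method names, are illustrative rather than prescribed by this repository.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit hash derived from SHA-256 (an arbitrary choice).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node is placed at several points to even out the distribution.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_left(self._ring, (_hash(key),))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

When a node is added, the only keys that change owner are the ones that move to the new node; every other key keeps its previous assignment.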
Sharding involves dividing a database into smaller, more manageable pieces called shards.
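One simple way to pick a shard is to hash the key and take the result modulo the number of shards. The sketch below assumes four in-memory dictionaries standing in for database shards; `shard_for`, `put`, and `get` are hypothetical helper names.

```python
import zlib

def shard_for(key: str, num_shards: int) -> int:
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key.encode("utf-8")) % num_shards

# Four dictionaries stand in for four separate database shards.
shards = [dict() for _ in range(4)]

def put(key, value):
    shards[shard_for(key, len(shards))][key] = value

def get(key):
    return shards[shard_for(key, len(shards))].get(key)

for user_id in range(100):
    put(f"user:{user_id}", {"id": user_id})

print([len(shard) for shard in shards])  # rows held by each shard
```

Modulo placement remaps most keys when the shard count changes, which is the problem consistent hashing is meant to reduce.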
Bloom filters are space-efficient probabilistic data structures used to test whether an element is a member of a set.
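The idea can be sketched in a few lines. This example assumes a 1024-bit array and three hash functions derived from SHA-256 with different seeds; the sizes and the `might_contain` name are illustrative choices.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )
```

A filter never reports an added item as absent, but it can report an item that was never added as present, which is why the result is only "might contain".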
Data replication involves copying data to multiple locations to improve reliability and fault tolerance.
NoSQL databases are optimized for specific use cases, such as horizontal scaling and flexible data models.
Location-based databases store and retrieve data based on geographic locations.
Database migrations involve moving data from one database to another, or changing a database's schema, while preserving data integrity.
Data consistency means that every part of the system sees the same, correct view of the data.
Consistency levels range from strong consistency, where every read reflects the latest write, to eventual consistency, where replicas converge over time.
Transaction isolation levels define the degree to which transactions are isolated from each other.
A message queue is a communication method that allows applications to communicate asynchronously.
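An in-process queue shows the same decoupling on a small scale. The sketch below uses Python's standard `queue.Queue` with one consumer thread; the message names and the `None` shutdown sentinel are assumptions for illustration, and a real deployment would use a dedicated broker.

```python
import queue
import threading

def consumer(q, results):
    # Processes messages as they arrive, independently of the producer.
    while True:
        message = q.get()
        if message is None:  # sentinel: no more messages
            break
        results.append(message.upper())

q = queue.Queue()
results = []
worker = threading.Thread(target=consumer, args=(q, results))
worker.start()

# The producer enqueues messages and moves on without waiting for processing.
for message in ["order-created", "order-paid"]:
    q.put(message)
q.put(None)

worker.join()
print(results)
```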
The publisher-subscriber model involves communication between publishers and subscribers through a message broker.
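A tiny in-memory broker illustrates the pattern. The `Broker` class, the topic names, and the synchronous callback delivery below are simplifying assumptions; real brokers deliver asynchronously and persist messages.

```python
from collections import defaultdict

class Broker:
    """Routes each published message to every subscriber of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Publishers know only the topic, never the subscribers themselves.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)

broker = Broker()
billing_inbox, email_inbox, audit_inbox = [], [], []
broker.subscribe("orders", billing_inbox.append)
broker.subscribe("orders", email_inbox.append)
broker.subscribe("payments", audit_inbox.append)

delivered = broker.publish("orders", {"order_id": 1})
print(delivered, billing_inbox, email_inbox, audit_inbox)
```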
Event-driven systems respond to and handle events, triggering actions based on specific occurrences.
A database can serve as a message queue for communication between different components.
A single point of failure is a component that, if it fails, will cause the entire system to fail.
Containers encapsulate applications and their dependencies, providing a consistent and isolated environment.
Service discovery involves automatically finding and connecting to services, and heartbeats are signals indicating the health of a service.
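The two ideas combine naturally in a registry that drops instances whose heartbeats stop. This is a minimal sketch: the `ServiceRegistry` class, the 10-second timeout, and the injectable clock (used so the behavior can be tested without sleeping) are all assumptions.

```python
import time

class ServiceRegistry:
    def __init__(self, timeout_seconds=10.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_seen = {}  # (service name, address) -> last heartbeat time

    def heartbeat(self, name, address):
        # Instances call this periodically to signal that they are alive.
        self._last_seen[(name, address)] = self._clock()

    def healthy_instances(self, name):
        # Only instances with a recent heartbeat are returned to clients.
        now = self._clock()
        return sorted(
            address
            for (service, address), seen in self._last_seen.items()
            if service == name and now - seen <= self._timeout
        )
```

A client would call `healthy_instances("api")` before each request, or cache the result briefly, and pick one of the returned addresses.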
Cascading failures are avoided with strategies that prevent a failure in one component from spreading across the system.
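One widely used strategy of this kind is the circuit breaker, which stops calling a failing dependency for a while instead of piling more requests onto it. The text above does not name a specific strategy, so treat this as an illustrative sketch; the thresholds, class names, and injectable clock are assumptions.

```python
import time

class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Callers wrap each remote call in `breaker.call(...)` and treat `CircuitOpenError` as a signal to fall back or return an error immediately.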
Anomaly detection identifies abnormal behavior or performance in distributed systems.
Distributed rate limiting enforces request limits across multiple components in a distributed system.
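A token bucket is one common rate-limiting algorithm. The sketch below keeps its state in a single process; in a distributed setup the token count would presumably live in a shared store so that all nodes enforce the same limit, which this example does not attempt. The class name, parameters, and injectable clock are assumptions.

```python
import time

class TokenBucket:
    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity        # maximum burst size
        self.refill_rate = refill_rate  # tokens added per second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        # Refill according to the time elapsed since the last call.
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1  # spend one token for this request
            return True
        return False
```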
Distributed caching involves caching data across multiple nodes to improve performance.
Content Delivery Networks (CDNs) distribute content geographically for faster and more reliable delivery.
Write policies determine how data is written to a cache.
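Two common write policies are write-through and write-back. The sketch below contrasts them, assuming a plain dictionary stands in for the backing store; the class names are illustrative.

```python
class WriteThroughCache:
    """Every write goes to the cache and the backing store together."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.store[key] = value  # the store is always up to date

class WriteBackCache:
    """Writes land in the cache first and reach the store on flush."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.dirty = set()  # keys changed since the last flush

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()
```

Write-through keeps the store consistent at the cost of slower writes, while write-back makes writes fast but risks losing unflushed data if the cache fails.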
Replacement policies determine which items are removed from a cache when space is needed.
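Least recently used (LRU) is one common replacement policy. A minimal sketch using `collections.OrderedDict` follows; the class name and capacity are illustrative.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # ordered from least to most recently used

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # reading an item marks it as recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used item

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now more recently used than "b"
cache.put("c", 3)  # evicts "b"
print(cache.get("a"), cache.get("b"), cache.get("c"))
```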
Microservices vs. monoliths compares microservices architecture with monolithic architecture.
Migrating to microservices covers strategies for moving from a monolithic architecture to a microservices architecture.
API design covers the principles and considerations for creating effective APIs.
Asynchronous APIs allow communication between components without waiting for an immediate response.
OAuth is an authorization framework that lets an application obtain limited access to a user's resources without handling the user's credentials.
Token-based authentication uses tokens, rather than credentials sent on every request, to grant secure access to resources.
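A signed token can be sketched with the standard library alone. This example is an illustration, not a standard token format such as JWT: the secret key, the `issue_token` and `verify_token` names, and the payload layout are all hypothetical, and production systems should rely on a vetted library.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET_KEY = b"demo-secret-change-me"  # hypothetical; never hard-code real secrets

def _sign(body: str) -> str:
    return hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()

def issue_token(user_id, ttl_seconds=3600, now=None):
    now = time.time() if now is None else now
    payload = json.dumps({"sub": user_id, "exp": now + ttl_seconds})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    return f"{body}.{_sign(body)}"

def verify_token(token, now=None):
    """Return the user id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    body, separator, signature = token.rpartition(".")
    if not separator:
        return None
    # Constant-time comparison avoids leaking information through timing.
    if not hmac.compare_digest(signature.encode(), _sign(body).encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < now:
        return None
    return claims["sub"]
```

The server only needs the secret key to verify a token, so no per-session state has to be stored.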
Access control lists and rule engines manage permissions and access in a system.
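At its simplest, an access control list maps each resource to the actions each user may perform on it. The resources, users, and `is_allowed` helper below are hypothetical, and the sketch denies anything not explicitly listed.

```python
# Hypothetical ACL: resource -> user -> set of permitted actions.
ACL = {
    "report.pdf": {"alice": {"read", "write"}, "bob": {"read"}},
    "budget.xlsx": {"alice": {"read"}},
}

def is_allowed(user, resource, action):
    # Default deny: unknown resources, users, or actions are rejected.
    return action in ACL.get(resource, {}).get(user, set())

print(is_allowed("alice", "report.pdf", "write"))
print(is_allowed("bob", "report.pdf", "write"))
```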
Pull vs. push is the choice between request-based (pull) and notification-based (push) communication.
Memory vs. latency is the tradeoff between memory usage and system responsiveness.
Throughput vs. latency is the choice between optimizing for high throughput or for low latency, depending on system requirements.
Consistency vs. availability is the tradeoff between data consistency and system availability.
Latency vs. accuracy is the tradeoff between response time and the accuracy of results.
SQL vs. NoSQL compares the characteristics and use cases of SQL and NoSQL databases.