- The AI Hierarchy of Needs
- The Rise of the Data Engineer
- The Downfall of the Data Engineer
- A Beginner’s Guide to Data Engineering
- Functional Data Engineering — a modern paradigm for batch data processing
- How to become a Data Engineer (in Russian)
- Data Engineering Principles - Build frameworks not pipelines by Gatis Seja
- Functional Data Engineering - A Set of Best Practices by Maxime Beauchemin
- Advanced Data Engineering Patterns with Apache Airflow by Maxime Beauchemin
- Creating a Data Engineering Culture by Jesse Anderson
- Algorithmic Toolbox in Russian
- Data Structures in Russian
- Data Structures & Algorithms Specialization on Coursera
- Algorithms Specialization from Stanford on Coursera
- Comprehensive SQL Tutorial by Mode Analytics
- SQL Practice on Leetcode
- Modern SQL, a website about modern SQL syntax
- Scala School by Twitter
- Fluent Python, an intermediate-level book about Python
- Intro to Scala in Russian on Stepik by Tinkoff Bank
- The Hitchhiker’s Guide to Python by Kenneth Reitz & Tanya Schlusser
- Intro to Database Systems by Carnegie Mellon University
- Advanced Database Systems by Carnegie Mellon University
- On Disk IO
- Distributed systems for fun and profit by Mikito Takada
- Distributed Systems by Maarten van Steen & Andrew S. Tanenbaum
- CS 436: Distributed Computer Systems by University of Waterloo
- Distributed consensus reading list maintained by Heidi Howard from University of Cambridge
- Designing Data-Intensive Applications by Martin Kleppmann
- Introduction to Algorithms by Thomas Cormen
- The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
- Star Schema: The Complete Reference
- Database Internals: A Deep Dive into How Distributed Data Systems Work
- Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing
- A Philosophy of Software Design
- Big Data for Data Engineers Specialization by Yandex
- Data Engineering on Google Cloud Platform Specialization by Google
- Data Engineer Nanodegree by Udacity
- Martin Kleppmann, author of Designing Data-Intensive Applications
- BaseDS by Vaidehi Joshi about Distributed Systems
- Apache Airflow is a platform to programmatically author, schedule and monitor workflows in Python
- Apache Spark is a unified analytics engine for large-scale data processing
- Apache Kafka is a distributed streaming platform
- Luigi is a Python package that helps you build complex pipelines of batch jobs
- Dagster.io is a system for building modern data applications
- Prefect includes everything you need to create and run data applications
- Data Eng Weekly - Your weekly Data Engineering news
- SF Data Weekly - A weekly email of useful links for people interested in building data platforms
- Data Elixir - An email newsletter that keeps you on top of the tools and trends in Data Science