I get asked very often how to become a Data Engineer. That's why I decided to start this cookbook, which covers all the topics you need to look into.
It's not only useful for beginners; professionals will definitely like the case study section.
If you are looking for the old PDF version, it's here
- Introduction
- Basic Engineering Skills
- Advanced Engineering Skills
- Hands On Course
- Case Studies
- 1001 Interview Questions
- Learn To Code
- Get Familiar With Git
- Agile Development
- Software Engineering Culture
- Learn how a Computer Works
- Data Network Transmission
- Security and Privacy
- Linux
- The Cloud
- Security Zone Design
- Big Data
- Data Warehouse vs Data Lake
- Hadoop Platforms
- Docker
- REST APIs
- Databases
- Data Processing and Analytics
- Data Visualization
- What We Want To Do
- Thoughts On Choosing A Development Environment
- A Look Into the Twitter API
- Ingesting Tweets with Apache Nifi
- Writing from Nifi to Apache Kafka
- Apache Zeppelin Data Processing
- Switch Processing from Zeppelin to Spark
- Data Science @Airbnb
- Data Science @Amazon
- Data Science @Baidu
- Data Science @Blackrock
- Data Science @BMW
- Data Science @Booking.com
- Data Science @CERN
- Data Science @Disney
- Data Science @DLR
- Data Science @Drivetribe
- Data Science @Dropbox
- Data Science @Ebay
- Data Science @Expedia
- Data Science @Facebook
- Data Science @Google
- Data Science @Grammarly
- Data Science @ING Fraud
- Data Science @Instagram
- Data Science @LinkedIn
- Data Science @Lyft
- Data Science @NASA
- Data Science @Netflix
- Data Science @OLX
- Data Science @OTTO
- Data Science @Paypal
- Data Science @Pinterest
- Data Science @Salesforce
- Data Science @Siemens Mindsphere
- Data Science @Slack
- Data Science @Spotify
- Data Science @Symantec
- Data Science @Tinder
- Data Science @Twitter
- Data Science @Uber
- Data Science @Upwork
- Data Science @Woot
- Data Science @Zalando
If you have some cool links or topics for the cookbook, please become a contributor. Simply open an issue and add your links, or pull the repo, add them, and create a pull request.
Please pull only the "working-branch" branch.
This way we keep the master branch clean and I don't have to mess around resolving conflicts. You just need to change the .tex file. I'll recompile it later when I merge the branch into master.
For comments please also use the "Issues" function.
Everything is free, but please support what you like!
Join my Patreon and become a plumber yourself:
Link to my Patreon
Or support me and send a message through Paypal.me: Link to my Paypal.me/feedthestream
Subscribe to my Plumbers of data science YouTube channel for regular updates: Link to YouTube
Check out my blog and get updated via mail by joining my mailing list: andreaskretz.com
I have a Medium publication where you can publish your data engineering articles to reach more people: Medium publication