What is this Book? Download PDF How to Contribute YouTube Twitter Amazon Shop
Contents:
- Introduction
- Basic Engineering Skills
- Advanced Engineering Skills
- Hands On Course‚
- Case Studies
- 1001 Interview Questions
- Recommended Books and Courses
Introduction
- What is this Cookbook
- Data Engineer vs Data Scientist
- My Big Data Platform Blueprint
- Who Companies Need
Basic Engineering Skills
- Learn To Code
- Get Familiar With Git
- Agile Development
- Software Engineering Culture
- Learn how a Computer Works
- Data Network Transmission
- Security and Privacy
- Linux
- Docker
- The Cloud
- Security Zone Design
Advanced Engineering Skills
- Data Science Platform
- Hadoop Platforms
- Connect
- Buffer
- Processing Frameworks
- Is ETL still relevant for Analytics?
- MapReduce
- Apache Spark
- What is the Difference to MapReduce?
- How Spark Fits to Hadoop
- Spark vs Hadoop
- Spark and Hadoop a Perfect Fit
- Spark on YARn
- My Simple Rule of Thumb
- Available Languages
- Spark Driver Executor and SparkContext
- Spark Batch vs Stream processing
- How Spark uses Data From Hadoop
- What are RDDs and How to Use Them
- SparkSQL How and Why to Use It
- What are Dataframes and How to Use Them
- Machine Learning on Spark (TensorFlow)
- MLlib
- Spark Setup
- Spark Resource Management
- AWS Lambda
- Apache Flink
- Elasticsearch
- Apache Drill
- StreamSets
- Store
- Visualize
- Machine Learning
- How to do Machine Learning in production
- Why machine learning in production is harder then you think
- Models Do Not Work Forever
- Where are The Platforms That Support Machine Learning
- Training Parameter Management
- How to Convince People That Machine Learning Works
- No Rules No Physical Models
- You Have The Data. Use It!
- Data is Stronger Than Opinions
- AWS Sagemaker
Hands On Course
- What We Want To Do
- Thoughts On Choosing A Development Environment
- A Look Into the Twitter API
- Ingesting Tweets with Apache Nifi
- Writing from Nifi to Apache Kafka
- Apache Zeppelin Data Processing
- Switch Processing from Zeppelin to Spark
Case Studies
- Data Science @Airbnb
- Data Science @Amazon
- Data Science @Baidu
- Data Science @Blackrock
- Data Science @BMW
- Data Science @Booking.com
- Data Science @CERN
- Data Science @Disney
- Data Science @DLR
- Data Science @Drivetribe
- Data Science @Dropbox
- Data Science @Ebay
- Data Science @Expedia
- Data Science @Facebook
- Data Science @Google
- Data Science @Grammarly
- Data Science @ING Fraud
- Data Science @Instagram
- Data Science @LinkedIn
- Data Science @Lyft
- Data Science @NASA
- Data Science @Netflix
- Data Science @OLX
- Data Science @OTTO
- Data Science @Paypal
- Data Science @Pinterest
- Data Science @Salesforce
- Data Science @Siemens Mindsphere
- Data Science @Slack
- Data Science @Spotify
- Data Science @Symantec
- Data Science @Tinder
- Data Science @Twitter
- Data Science @Uber
- Data Science @Upwork
- Data Science @Woot
- Data Science @Zalando
1001 Interview Questions
Recommended Books and Courses
How To Contribute
If you have some cool links or topics for the cookbook, please become a contributor.
Simply pull the repo, add your ideas and create a pull request. You can also open an issue and put your thoughts there.
Please use the "Issues" function for comments.
Support
Everything is free, but please support what you like! Join my Patreon and become a plumber yourself: Link to my Patreon
Or support me and send a message through Paypal.me: Link to my Paypal.me/feedthestream
Important Links
Subscribe to my Plumbers of Data Science YouTube channel for regular updates: Link to YouTube
Check out my blog and get updated via mail by joining my mailing list: andreaskretz.com
I have a Medium publication where you can publish your data engineer articles to reach more people: Medium publication