Resources for Analytics Engineers
This repository is a curated list of good blog posts and books for Analytics Engineers. It can also be very useful for Data Analysts and Data Scientists.
Contribute
I really appreciate any contribution. Just make sure to describe the theme and why you found the resource useful.
Table of Contents
- SQL
- Python
- Infrastructure
- Analytics Skills
- Data Warehousing
- Data Pipelines
- Starting analytics in a company
- Testing data
- Success Stories
- Organisation
- Data Visualisation
- Marketing and data
- Thinking with data
- GitHub/GitLab repos to learn from
- Other reading lists
- Top bloggers/blogs
Readings
Definition of the Analytics Engineer: The Analytics Engineer.
SQL
SQL has a lot of tips and tricks that take time to learn.
- Mode Analytics SQL Guide. Very complete: even intermediate users can learn from this series of tutorials.
- Learning SQL 201: Optimizing Queries, Regardless of Platform by Randy Au. Finally, a complete post on advanced SQL.
Python
Python is a very broad subject. For more Python-focused readings, you can follow this list.
- Python for Data Analysis.
📖 A very comprehensive book about using Python for data work.
- Pandas Cheatsheet. I use it every day!
- Modern pandas. A series of blog posts on intermediate/advanced pandas written by one of the maintainers.
Infrastructure
- The Startup Founder's Guide to Analytics. An excellent introduction to the stack necessary for analytics and its evolution following the growth of the start-up.
- The missing layer of Analytics Stack.
- Choosing a Data Warehouse. A lot of excellent answers on what to choose for your data warehouse.
- Data science for start-ups. You can find some useful information in this free book.
- Designing Data-Intensive Applications
📖 A fascinating read to learn more about databases, protocols, etc.
Comparison of tools by Stephen Levin
- Looker vs Tableau vs Mode. Data visualisation tools compared.
- Segment vs Fivetran vs Stitch: Which Data Ingest Should You Use?
Analytics Skills
- One analyst's guide for going from good to great
- Succeeding as the first data person in a small company/startup. A must-read for anyone working in data, even in a big company.
- Prioritizing data science work. Too many engineers like building ivory towers. Make sure you don't fall into the trap.
Data Warehousing
- The beginner guide to data engineering series. Start here if you are new to star schemas, Airflow, or basic practices for writing data pipelines.
- Best practices for data modeling. A lot of practical tips on naming, grain, permissions and materialization.
- The Data Warehouse Toolkit by Ralph Kimball.
📖 A classic in Business Intelligence. Some chapters are gold for modeling your data warehouse.
- Functional Data Engineering — a modern paradigm for batch data processing. You will learn the spirit behind good data pipelines and a well-designed data warehouse.
- The rise of the Data Engineer. Explains recent evolutions of the job and data practices.
- Five principles that will keep your data warehouse organized
- For Data Warehouse Performance, One Big Table or Star Schema? A discussion of an alternative to the star schema.
Data Pipelines
- Functional Data Engineering — a modern paradigm for batch data processing. You will learn the spirit behind good data pipelines and a well-designed data warehouse.
- Maintainable ETL: Tips for Making Your Pipelines Easier to Support and Extend. Best practices for writing good ETL.
- The Data Warehouse ETL Toolkit
📖 Once again, a very dense book, but you can find good ideas in it.
Starting analytics in a company
- Building a data practice from scratch. Very useful for your first weeks as a data person.
- The Startup Founder's Guide to Analytics. An excellent introduction to the stack necessary for analytics and its evolution following the growth of the start-up.
Testing data
- Automated Testing In The Modern Data Warehouse. Practical advice on testing data, useful for everyone building data pipelines. It is rare to find a post dealing with such an unglamorous topic in data.
Success Stories
Organisation
- Engineers Shouldn't Write ETL. It's more data-science focused, but it's a classic.
- Does my startup data team need a data engineer?
Marketing and data
- Data Driven Marketing.
📖 Reading some chapters can help you think like a marketer with a data-driven approach. It's a gem: I haven't found this kind of insight anywhere else.
- Introduction to Algorithmic Marketing.
📖 I found good ideas for launching more data-driven marketing initiatives. Very dense, though; you can skip the equations.
Thinking with data
These books/articles helped me to think better when analysing data.
- Common Data Mistakes to Avoid. Excellent summary of the most common fallacies when analyzing data. Very clear and well-explained.
- Thinking, Fast and Slow. Learning about biases can be super useful. For instance, I didn't have the reflex of thinking about the base rate whenever I saw a figure.
- Fooled by Randomness.
📖 Nassim Taleb taught me so much, both professionally and personally. In Fooled by Randomness, you will learn about major pitfalls when dealing with data in real life.
- Why you should care about the Nate Silver vs. Nassim Taleb Twitter war. Great chess players learn from high-Elo games. Great data people learn from debates between data experts.
- Five books every data scientist should read that are not about data science. I have not read them all yet, but the suggestions seem judicious.
Data Visualisation
- Fundamentals of Data Visualization. A complete guide to data visualisation, with a free version available online.
GitHub/GitLab repos to learn from
I've found that reading code helps you learn best practices, whether in Python or SQL.
In Python, reading some of the Singer taps can teach you a lot.
In dbt/SQL, I like to browse a repo open-sourced by GitLab.
Other reading lists
The GitLab data team also made an excellent list (close to mine).
Analytics Dispatch by Mode Analytics. Very comprehensive.
I really love Reading in Applied Data Science for a more data-science-focused view.
Knowing more about programming is a huge asset. For instance, the Professional Programming list is quite complete.
Top bloggers/blogs
- Randy Au. You can read almost all of his posts; they are all very relevant for analytics engineers.
- Locally Optimistic. A blog dedicated to data in organizations.
- Tristan Handy. I also love his newsletter: Data Science Roundup.
- dbt blog. About 90% of the articles are must-reads.
Where is the community?
- Locally Optimistic
- Reddit data engineering. The ETL, Business Intelligence, and Data Science subreddits are also good.