/data-syllabus

A repo documenting materials used in preparation for a data science immersive course

Data Science Syllabus and Benchmarks

Documented below are various tutorials, books, and online courses related to practical data science.

I originally made this with coding bootcamp prep in mind, but it could be used by anyone who wants to self-study or prepare for data science courses in general (degree or bootcamp).

 

1. Online Prep

These are sites with courses I have either already used or plan to use. I chose them because they take a more modular approach to how you learn a given topic - that is, you can stick to something specific rather than having to take an entire course track for the tidbits you actually need.

 

A. Codecademy Courses:

  1. Learn the Command Line

  2. Learn Git

  3. Learn SQL

  4. Python 3

  5. SQL: Analyzing Business Metrics

  6. SQL: Table Transformation

  7. JavaScript

 

Learn the Command Line and Learn Git can be done in a day if you really want to, and I might just repeat those from time to time. The others on the list might take days to do, though Codecademy doesn't always list hourly estimates. At a bare minimum, you should do the first three (command line, Git, SQL), since those cover things you can't as easily find elsewhere.

Update: In the Pro version of Codecademy, you receive access to topic-specific tracks that focus on the following:

  1. Code Foundations

  2. Computer Science

  3. Data Science

  4. Web Development

 

I would suggest paying for a one-month Pro membership and doing the first three of those tracks, and possibly the Web Development track as well if you want to understand what potential coworkers work on.

There are also "skill paths" in their full course list. Those seem geared towards mastering specific niches.

 

B. Codewars Challenges:

Based on the advice of a Galvanize student turned instructor, I should aim to work through enough Python challenges (kata) on this site to reach the "6 kyu" rank or better. I don't pretend to know how necessary that will be, but it wouldn't hurt to maintain that level of proficiency.

 

C. Fast.ai:

Fast.ai offers a well-condensed course on deep learning that seeks to teach you as much as possible about the topic in a week. It's not a bad idea to give it a spin before any full-time immersive course.

 

D. Kaggle Tutorials:

The homepage for these can be found here: https://www.kaggle.com/learn/overview

As of now, I'm not sure how useful the tutorials on that site will be, since they appear to cover material I'll be learning elsewhere. But it won't hurt to see if there's anything I've missed.

Bonus: If Kaggle competitions seem interesting to you, I'm sure Numerai will be at least as interesting, if not more fulfilling. This interview with the founder of the project is a good starting point for context.


2. Reading and Video Course List

I will be reading most of these titles through Packt Publishing's Online Library, since most of them are published by Packt. Those who want access to a wider variety of publishers may want to look into Safari Online by O'Reilly Media. It costs several times as much as Packt's service alone, but that can be worth it depending on the variety of content you're looking for.

 

A. Pre-Course Reading List

These are by no means the only books on my radar. "Statistics Essentials for Dummies" is a good review book, for instance. Most of my projects involve natural language processing to infer whether news articles or academic studies contain certain pieces of information, and this list reflects that focus.

  1. Data Science from Scratch - 305, 25

  2. Principles of Data Science - 365, 13

  3. Humongous Book of Statistics Problems - 525, 18

  4. Python Data Science Essentials 3rd Ed - 450, 8

  5. SQL Practice Problems - 140, 57

  6. Statistics in a Nutshell - 425, 20

  7. Python Web Scraping Cookbook - 330, 11

  8. Statistics Done Wrong - 130, 12

  9. Dive Into Algorithms - 215, 11

  10. Hands-On Automated Machine Learning - 250, 8

  11. Hands-On Data Science and Python Machine Learning - 395, 10

  12. Hands-On Data Visualization with Bokeh - 150, 8

  13. Natural Language Processing and Computational Linguistics - 280, 15

  14. Interactive Data Visualization with Python - 335, 7

  15. Automated Machine Learning - 300, 10

 

B. Pre-Course Video Tutorial List:

These courses come from Packt, and they tend to reflect my focus on natural language processing. I've already noticed that it helps to learn the same subject matter in more than one format to really understand it. Below I list how long the videos are, but once you add the time it takes to try out what they cover, they usually take about three times as long to finish.

  1. Learn Python in Three Hours - 3 hours

  2. Next Generation Natural Language Processing - 2 hours

  3. Natural Language Processing with Python - 2 hours

  4. Mastering Natural Language Processing with Python - 2 hours

  5. Working with Big Data in Python - 3 hours

  6. Ensemble Machine Learning Techniques - 3 hours
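
As a rough planning aid, the "three times as long" rule of thumb can be turned into a quick estimate. This is only a sketch: the video lengths are copied from the list above, while the multiplier of 3 and the names `video_hours`, `PRACTICE_MULTIPLIER`, and `estimated_hours` are my own assumptions for illustration.

```python
# Video lengths in hours, copied from the list above.
video_hours = {
    "Learn Python in Three Hours": 3,
    "Next Generation Natural Language Processing": 2,
    "Natural Language Processing with Python": 2,
    "Mastering Natural Language Processing with Python": 2,
    "Working with Big Data in Python": 3,
    "Ensemble Machine Learning Techniques": 3,
}

# Assumption: trying out the material takes roughly 3x the video length.
PRACTICE_MULTIPLIER = 3


def estimated_hours(hours, multiplier=PRACTICE_MULTIPLIER):
    """Return a rough time-to-finish estimate for a given video length."""
    return hours * multiplier


total_video = sum(video_hours.values())
total_estimated = estimated_hours(total_video)

for title, hours in video_hours.items():
    print(f"{title}: {hours} h of video, about {estimated_hours(hours)} h to finish")

print(f"Total: {total_video} h of video, about {total_estimated} h to finish")
```

Adjust the multiplier to match your own pace; the point is just to budget for the hands-on time and not only the running time of the videos.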