A list of resources that I've found handy in my work as a Data Scientist.
- Anaconda A managed Python distribution tailored for data analysis and data science.
- Brew (Mac) A package manager for Mac.
- Scoop (Windows) A package manager for Windows.
- Docker Containerisation engine that allows easy deployment of apps and environments.
- Github Desktop UI for Github.
- Sourcetree UI for Git.
- Sublime Text ($) Popular extensible text editor.
- Atom Popular open source text editor from Github. Has integration with Jupyter notebooks.
- IntelliJ IDEA ($) Popular IDE from JetBrains. Also encompasses PyCharm.
- Visual Studio ($) Popular IDE from Microsoft. Also the best way to install C dependencies.
- iTerm2 (Mac) Improved command line environment for Mac.
- cmder (Windows) Improved command line environment for Windows.
- Sublime Text Cheatsheet (Mac)
- Installing iTerm2 + zsh + oh-my-zsh (Mac)
- Setting up git and cmder
- Gitlab Markdown Guide
- Github Markdown Cheat Sheet
- Data Science @ Reddit Subreddit and a good place to ask more general data science questions.
- Data Science @ Medium Blog and article site with a Data Science sub-section.
- Towards Data Science Medium-based news and articles site.
- Kaggle Branded as the home of Data Science and the place to do data science projects. Also has good datasets and competitions.
- KDNuggets General data science news site.
- Data Elixir General data science news site.
- The Machine Learning Engineer Newsletter Weekly newsletter focusing on Ethical AI & ML and Machine Learning Operations.
- Machine Learning Datasets Several links to good datasets for all aspects of ML.
- Chris Albon's site An excellent set of tutorials on doing practical data tasks in Python.
- How to Start in Data Science (Article) Article on how to get started in Data Science.
- Deep Learning Cheat Sheet (Article) Deep Learning cheat sheet containing high-level descriptions of the various functions.
- How to become a Data Scientist in 6 months (Video) Video detailing some high level advice on becoming a data scientist.
- Datacamp Cheat Sheets (Article) A collection of Python, R, Jupyter Cheat Sheets from DataCamp.
- Seven Practical Ideas for Beginner Data Scientists (Article)
- Google AI ML Research Publications
- Distributions CheatSheet Good overview of various distributions.
- Distributions Detailed overview of various data distributions.
- Beginners Guide to Data Engineering Part 1
- Beginners Guide to Data Engineering Part 2
- Beginners Guide to Data Engineering Part 3
- Comprehensive List of Data Engineering Resources List of data engineering orientation and learning materials.
- The Data Engineering Cookbook
- The Twelve Factor App Resource detailing 12 principles for scalable application design.
- Google AI Software Engineering Research Publications
- Organising Machine Learning Projects (Jeremy Jordan)
- Machine Learning Rules Google's guide to best practices in Machine Learning.
- Awesome Machine Learning (Github)
- Awesome Production Machine Learning (Github)
- XAI Python Package (Github)
- The Eight Principles of Ethical Machine Learning
- R2D3 Visual introduction to Machine Learning.
- Machine Learning Yearning Book by Andrew Ng on technical strategy for ML + AI.
- ML for Coders Promising course on learning ML from the perspective of a coder.
- Computational Linear Algebra for Coders Another fast.ai course on applied Linear Algebra.
- Practical Deep Learning for Coders 7-week fast.ai course on Deep Learning for coders.
- Mathematics for Machine Learning Book (in progress) on the maths behind ML.
- Data Science Primer Overview of the steps in building a Machine Learning model.
- Getting Better at Machine Learning
- What is Feature Engineering?
- What is One Hot Encoding?
- Introduction to K-Nearest Neighbours Good intro to KNN.
- Practical Guide to SVM Classification
- Understanding the SVM Kernel Trick
- Visualising Multivariate Data
- WTF is the bias/variance tradeoff?
- Heteroscedasticity Explained
- Understanding Regression Error
- Intro to Random Forests in Python
- Python Date / Time Formats Reference site for Python date / time formats
- Pandas Frequency Codes Reference for Pandas frequency codes
- Pandas Time Series / Date Functionality
- Pandas Date Offset
- Pandas Timestamp Reference
- Pandas date_range Reference
- Pandas Timedelta Reference
- Pandas Date Offset Reference
- Python datetime Reference
- What is Autocorrelation
- Autocorrelation Plot Explanation of an Autocorrelation plot.
- Autocorrelation in Python DataCamp slide deck on autocorrelation.
- A Gentle Introduction to Exponential Smoothing
- Introduction to Simple Exponential Smoothing (SES)
- Holt Winters Forecasting for Dummies
- Autoregression Models for Time Series Forecasting With Python
- Time Series Analysis in Python
- A Gentle Introduction to Autocorrelation and Partial Autocorrelation
- ARIMA models in Python (Article)
- What a p-value tells you about Statistical Data
- A Gentle Introduction to Box-Jenkins Models
- Time Series Forecasting with Naive Bayes Good step-by-step guide to time series forecasting with Naive Bayes and a great example notebook.
- Facebook Prophet Explained Excellent article explaining more about Prophet hyperparameter optimisation.
- Regex Tester & Debugger Great tool for creating and debugging Regular Expressions.
- Text Classification Google's guide to text classification.
- Mapshaper Online editor for map data. Allows you to take most GIS data formats and visualise / simplify them.
- Hitchhiker's Guide to Python A daily usage "best practice handbook" for Python. The section on code structure and style is particularly good.
- strftime.org Excellent resource for time formats in Python & Pandas
- Parallel Processing in Python Tutorial on multithreading and multiprocessing.
- Python Code Tips Good online resource with sections on generators, debugging, map/filter/reduce, and more.
- Unit-tests: Mocking, Monkey Patching & Faking Functionality Excellent introduction to some more advanced unit-test concepts using pytest.
- Awesome Flask List
- Miguel Grinberg's Flask Blog Lots of tutorials on various Flask features.
- Creating a REST API with Flask How to create a REST API with Flask.
- Auto Generating Requirements.txt How to auto-generate requirements.txt using pip freeze
- A Template for a good README Good template to follow when creating a README.md.
- How to make a Python Package Great guide on how to make a Python package.
- Complete Python Bootcamp Really good introduction to Python using Jupyter Notebooks.
- Python for Data Analysis & Visualisation Excellent intro to the data analysis libraries in Python.
- Python for Data Science & Machine Learning Very good introduction to Python for Data Science.
- Python for Analysts Tom's Python for analysts training course. This probably needs updating.
- Exploratory Data Analysis in Python (EBook) Excellent book on statistical data analysis in Python.
- Python Graph Gallery (Website) Gallery of lots of Python charts complete with source code.
- Using Jupyter Notebooks in Virtualenv
- Flask for Web Development Book detailing end to end web development with Flask.
- Datashader with Spark Blog detailing an example of how to use Datashader with big data.
- JupyterLab Extensions How to install JupyterLab extensions.
- Jupyter Shortcuts List of shortcuts for Jupyter notebooks.
- 28 Jupyter Tips Some cool tricks in Jupyter notebooks.
- Running Dask with scikit-learn Jupyter Notebook showing how you can speed up scikit-learn grid/random searches using Dask.
- Testing with NumPy and Pandas
- Split / Apply / Combine
- R for Data Science A good starting point for learning modern R with the Tidyverse style code.
- Advanced R Useful when you want to understand the underlying processes of R and write more advanced code (for example, object-oriented programming in R).
- R packages A good reference book when structuring your code as a package (what files go where + how to handle package imports etc)
- R Graph Gallery Gallery of lots of R charts complete with source code.
- Scala Docs Getting started guide.
- Scala Exercises Beginner level tutorial.
- Setting up Databricks on AWS
- Just Enough Scala for Spark Introduction to Scala for interfacing with Spark DataFrame and RDD APIs.
- Connecting to Databricks with databricks-connect
- Complete SQL Bootcamp (Udemy Course) Intro to SQL and PostgreSQL.
- Setting up Ubuntu for Windows
- Getting Started with Anaconda and Docker
- Running a Dockerized Jupyter Server for Data Science
- MS VM Images Images of Windows machines for testing on Internet Explorer. Note you'll need to install VirtualBox first via `brew install --cask virtualbox` (older Homebrew versions used `brew cask install virtualbox`).
- The Complete Web Developer Course Excellent intro to HTML/CSS/JS as well as packages such as Bootstrap & jQuery.
- HTML Elements Reference Complete list of all HTML elements.
- List of Special Characters Reference for encoding and decoding special characters in HTML.
- #Javascript30 Vanilla JS tutorial where you learn by building websites.
- Git a Web Developer Job Online course that makes the jump from writing HTML/CSS/JS to writing modern web applications using dev tools like git, webpack and babel.
- Guide to 'this' Great guide on how to use `this` in JS.
- Setting up React, Webpack and Babel How to set up your development environment using React, Webpack and Babel.
- How to set up Webpack 4 How to set up Webpack 4.0.
- Switching from Gulp to Webpack
- End to End Testing
- D3 Is not a Data Visualisation Library Excellent intro to D3.
- How to React
- Intro to React Excellent intro to React incorporating props, state and JSX.
- Redux Tutorial Very good intro to Redux and React-redux.
- Github - Resources to learn Git List of good resources to learn git.
- Github learning lab Interactive tutorial that covers the basics of GitHub.
- Learn git branching The GitHub guide covers the basics more clearly, but the later chapters of this resource are good for learning how to get out of more complex git puzzles.
- Getting Git A comprehensive video course from git init to Git Master ($30)
- Lots of interactive courses on containerisation (e.g. Docker, Kubernetes) available at Katacoda
- DataVizCatalogue Overview of various chart types alongside some links to code (usually in JavaScript).
- Canva ($) Design web-app aimed at non-designers. Has a good free tier.
- Material Palette Generates nice colours for an app or design
- Coolors Site that helps you generate good colour palettes
- Viz Palette Site that helps you make nice colour palettes
- Unsplash Royalty free high-res photos
- TheStocks Royalty free photos
- ISORepublic Royalty free photos
- Pixabay Royalty free photos
- Subtle Patterns Tiled patterns for sites & presentations
- FlatIcon Lots of free downloadable icons.
- Dribbble Nice site for inspiration on designs
- SiteInspire Showcase of good, responsive website design
- MediaQueries Showcase of good, responsive website design
- Random User Data Generates random user data through an API
- UCI ML Data Hundreds of datasets well suited to machine learning.
- Condensed GCP Reference Excellent condensed reference for GCP.
- GCP Products Cheat Sheet A cheatsheet for GCP Products.
- GCP Regions List A cheatsheet for decoding GCP region names.
- Creating & Managing Cloud SQL Instances Guide on how to create and manage GCP Cloud SQL Instances.
- Create a Cloud SQL User Guide on how to create a Cloud SQL user.
- GCP Cloud SQL Proxy Guide on the GCP Cloud SQL Proxy.
- Connecting to Cloud SQL from External Apps How to connect your apps to Cloud SQL.
- Install and run a Jupyter notebook on a Cloud Dataproc cluster
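
Several of the time series entries above (Python datetime, strftime.org, What is Autocorrelation, Introduction to Simple Exponential Smoothing) describe ideas that fit in a few lines of code. Below is a minimal stdlib-only sketch of them, under a few assumptions. The helper names `simple_exponential_smoothing` and `autocorrelation` are hypothetical and do not come from any of the linked resources. The daily values are made up for illustration. The smoothing seeds with the first observation, which is one common choice among several. In practice you would probably reach for a library instead, for example statsmodels' `SimpleExpSmoothing` and `acf`.

```python
from datetime import datetime, timedelta


def simple_exponential_smoothing(series, alpha):
    """Hypothetical helper: SES with s_0 = x_0 and s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # assumption: seed with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def autocorrelation(series, lag):
    """Hypothetical helper: sample autocorrelation at `lag` (the estimator behind ACF plots).

    Raises ZeroDivisionError for a constant series, where autocorrelation is undefined.
    """
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Made-up daily observations, purely for illustration.
start = datetime(2021, 1, 1)
values = [10, 12, 11, 13, 15, 14, 16]
dates = [start + timedelta(days=i) for i in range(len(values))]

smoothed = simple_exponential_smoothing(values, alpha=0.5)
for d, v, s in zip(dates, values, smoothed):
    # %Y-%m-%d and %a are standard strftime codes (%a is locale-dependent).
    print(d.strftime("%Y-%m-%d (%a)"), v, round(s, 3))

print("lag-1 autocorrelation:", round(autocorrelation(values, 1), 3))
```

A larger `alpha` makes the smoothed series follow recent observations more closely, and a smaller one gives a smoother but more lagged curve. Note that pandas' `Series.autocorr` computes a Pearson correlation between the series and its shifted copy, so its result can differ slightly from the ACF-style estimator used here.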