General information about RCS Python Workshops can be found in the Python Workshops Repository. This includes information about software installations and general Python resources.
Please install Anaconda as it comes with everything that we need in this workshop. We will work from Jupyter Notebook!
Python Installation Instructions
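Once Anaconda is installed, a quick way to confirm the environment is ready is to try importing the main packages from a notebook cell. The sketch below is only an illustration: the package list in `REQUIRED` is an assumption about what the notebooks use, not an official requirements list, and `check_packages` is a hypothetical helper name.

```python
import importlib

# Assumed package list for this workshop (not an official requirements file).
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_packages(names):
    """Map each package name to its version string, or None if it cannot be imported."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            # Some packages do not expose __version__, so fall back to "unknown".
            found[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            found[name] = None
    return found


versions = check_packages(REQUIRED)
for name, version in versions.items():
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

If any package is reported as not installed, reinstalling or updating Anaconda is usually the simplest fix.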
You can download all of the files by clicking the green button above and choosing "Download ZIP."
If you download individual files from the links above instead, click through to the RAW version of each notebook and download that. Files saved directly from the links won't open because they are web pages, not the raw notebook files.
On a Mac, to open the files in Jupyter Notebook, start Jupyter Notebook from the folder where you saved the files. On Windows, navigate to the directory within Jupyter Notebook.
Learn how to use the main algorithms needed for predictive modeling with Python and scikit-learn.
- How do I use scikit-learn to do classification?
- How do I ensure that I'm building a model that will generalize to unseen data?
- How can scikit-learn help me compare classifiers?
- Can I estimate how well my model is likely to perform on unseen data?
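As a preview of how these questions fit together, here is a minimal sketch rather than the workshop's own notebook code. It assumes the built-in iris dataset and two arbitrarily chosen classifiers (logistic regression and k-nearest neighbors). It holds out a test set, compares the classifiers with 5-fold cross-validation on the training data only, and then scores the better one on the held-out data to estimate performance on unseen samples.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Example dataset chosen for illustration only.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the data; it is not used for model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Two candidate classifiers (an arbitrary choice for this sketch).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

# Compare the candidates with 5-fold cross-validation on the training set.
cv_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X_train, y_train, cv=5)
    cv_scores[name] = scores.mean()
    print(f"{name}: mean CV accuracy = {cv_scores[name]:.3f}")

# Refit the best candidate on all training data and score it once on the test set.
best_name = max(cv_scores, key=cv_scores.get)
best_model = candidates[best_name].fit(X_train, y_train)
test_accuracy = best_model.score(X_test, y_test)
print(f"selected {best_name}; held-out test accuracy = {test_accuracy:.3f}")
```

Keeping the test set out of the comparison step is the design choice that matters here: the final score is then an estimate on data the selected model has never seen.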
To achieve the objectives and outcomes above, we divided the material into 5 sections. In each section we introduce the concepts, explain how to use them in scikit-learn, and practice what we learned.
General scikit-learn resources, along with more specific tutorials that cover multiple topics, can be found on the scikit-learn website.
Additional Predictive Modeling-specific resources include:
Data Science Central - A great online group of data science enthusiasts where you can find everything related to machine learning, predictive modeling, data science and more.
KDnuggets - A great source of news on anything related to ML and data science.
Coursera, edX, and Udacity courses. I strongly recommend Andrew Ng's machine learning courses.
Kaggle should be your home for data science. It is best known for its regularly organized data science competitions.
To prepare the notebooks for this workshop, I used notebooks from the official scikit-learn tutorials and workshops and from the following resources:
https://github.com/ogrisel/scipy-2018-sklearn
https://scikit-learn.org/stable/modules/impute.html
https://www.geeksforgeeks.org/regression-classification-supervised-machine-learning/
https://ekababisong.org/gcp-ml-seminar/scikit-learn/
https://www.analyticsvidhya.com/blog/2016/07/practical-guide-data-preprocessing-python-scikit-learn/
http://www.biostat.washington.edu/~dwitten/
https://www.quora.com/Which-machine-algorithms-require-data-scaling-normalization
Interesting sources a colleague found when running this workshop:
- Training-testing split blog post, part of the Towards Data Science articles
- Eamonn Keogh - a place to start your search for time series research!