Date: September 9, 2019 Author: Sylvia Tran
The work in this repository was prepared for the LA R Users Group. It gives a high-level overview of Python for R users, covering data cleaning, preprocessing, and modeling.
Scope
- The code provided uses Python 3.7.0
- Environment setup is out of scope for this repository
- The work was done on macOS, so Windows-specific nuances are not addressed
- The use of RandomForest is demonstrative; it is not intended to optimize hyperparameters or minimize loss
- Forthcoming: an R <-> Python cheatsheet will be added to the ./slides-etc/ directory of this repository in the coming weeks
- assets (pictures and .mov files for screen capture)
- notebooks (Jupyter notebook that can be converted to a slide deck)
- slides (holds slide deck as .html)
- src (.py file as an example)
A. Interactive Python (IPython) can be accessed through the Terminal in RStudio:
- start from the working directory of choice, then run
$ ipython
B. Jupyter ipynb (interactive Python notebook)
- after downloading the repo, make a copy of the .ipynb file in the /notebooks folder
- take apart the code line by line, or go to town trying different things on the toy dataset
- Importing Packages
- Loading Toy Datasets (sklearn) & using pandas
- Cursory Inspection (pandas & numpy)
- Light Cleaning (base python, pandas)
- Train-test-split (sklearn)
- Feature Scaling (sklearn)
- Model (sklearn)
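
The topics listed above can be sketched end to end in a few lines. The following is a minimal illustration and not the notebook's actual code: the choice of the iris toy dataset, the column renaming, the 80/20 split, and the `random_state` values are all assumptions made for the example. As stated in the scope, the RandomForest here is demonstrative only and is not tuned.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Loading a toy dataset (iris is an assumed choice) into a pandas DataFrame,
# the closest analogue to an R data.frame.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

# Cursory inspection: roughly dim(), str(), and summary() in R.
print(df.shape)
print(df.dtypes)
print(df.describe())
print(np.unique(df["target"], return_counts=True))  # class labels and counts

# Light cleaning: tidy the column names and drop rows with missing values.
df.columns = [c.replace(" (cm)", "").replace(" ", "_") for c in df.columns]
df = df.dropna()

# Train-test split (assumed 80/20, stratified on the label).
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Feature scaling: fit on the training data only, then apply to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Model: an untuned RandomForest, for demonstration only.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)
accuracy = model.score(X_test_scaled, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Tree-based models such as RandomForest do not need scaled features; the scaling step is kept here only because it appears in the outline, and fitting the scaler on the training set alone avoids leaking information from the test set.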