This repository contains all lab materials for the University of Washington course Data Science for Biologists (Winter 2019, BIOL 419/519). Please feel free to use them for any purpose.
Course design and lecture material (not included here) by Bingni Brunton and Kameron Harris. Lab materials by Eleanor Lutz, with helpful suggestions from Bing and Kam.
This 2019 course used the default package versions installed with Anaconda 5.3.0: Pandas 0.23.4, Matplotlib 3.0.2, Numpy 1.15.4, and Scikit-Learn 0.20.2.
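Newer versions of these libraries may behave differently from the versions listed above. A minimal sketch for comparing your own environment with the course versions follows. It assumes only that each library exposes a `__version__` string. The helper name `installed_versions` is hypothetical and is not part of the lab materials.

```python
import importlib

# Versions listed in this README for the 2019 course (Anaconda 5.3.0 defaults).
COURSE_VERSIONS = {
    "pandas": "0.23.4",
    "matplotlib": "3.0.2",
    "numpy": "1.15.4",
    "sklearn": "0.20.2",  # Scikit-Learn is imported under the name "sklearn"
}


def installed_versions(names):
    """Return {module_name: version string, or None if the module is missing}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            found[name] = getattr(module, "__version__", None)
        except ImportError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions(COURSE_VERSIONS).items():
        print(f"{name}: installed {version}, course used {COURSE_VERSIONS[name]}")
```

A mismatch does not mean a lab will fail. It does indicate where to look first when a notebook raises an unexpected warning or error.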
- Python Data Science Handbook by Jake VanderPlas
- Jupyter Notebook documentation
- A gallery of interesting Jupyter Notebooks
- Markdown syntax
- A Primer on Matrices by Stephen Boyd
- 10 minute Pandas tutorial
- Matplotlib introduction to Pyplot
- Matplotlib usage guide: includes helpful anatomy of a figure
- Detailed explanation of joining dataframes in Pandas
- An introduction to machine learning with Scikit-Learn
- Scikit-Learn user guide
- Python Basics Cheat Sheet by Python for Data Science
- Jupyter Notebook Cheat Sheet by Python for Data Science
- Numpy Cheat Sheet by Python for Data Science
- Matplotlib Cheat Sheet by Python for Data Science
- Pandas Cheat Sheet by Python for Data Science
- Importing Data Cheat Sheet by Python for Data Science
- Scikit-Learn Cheat Sheet by Python for Data Science
These labs were designed for students with no prior programming experience. Many sections of the code are intentionally written inefficiently, because students had not yet learned more advanced concepts (for loops, libraries, etc.). A brief description of the skills and topics covered in each lab is included below. Each lab is provided as a Jupyter Notebook, both with and without answers, and also as PDF files (in the folders PDF_Lab_Keys and PDF_Labs).
- Python data types
- Conditional logic in Python
- Looping over data in Python
- Introduction to libraries
- Numpy arrays
- Importing data from a file into a Numpy array
- Examining and plotting data in a Numpy array
- Matrix algebra by hand (10-minute class exercise, not included)
- Matrix algebra in Python
- Functions
- Reading in data using the Pandas library
- Review of linear regression
- Plotting in three dimensions
- Inspecting and cleaning data in Pandas
- Working with figure objects in Matplotlib
- Joining two Pandas dataframes
- Plots with multiple subplots
- Plotting scatter points colored by group
- Review of importing and inspecting data
- Split a dataset into a training and test set
- Train a machine learning classifier using scikit-learn
- K-means clustering using scikit-learn
- Custom Matplotlib legends
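The final topics (splitting a dataset, training a classifier, and K-means clustering) fit together into a short scikit-learn workflow. The sketch below shows that pattern under stated assumptions. It uses the Iris dataset bundled with scikit-learn and a k-nearest-neighbors classifier because both are convenient for illustration. These are not necessarily the dataset or classifier used in the labs themselves.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris ships with scikit-learn, so no download is needed (illustrative choice).
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows as a test set; random_state makes the split repeatable
# and stratify=y keeps the class proportions similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Supervised learning: fit on the training rows only, then score on unseen test rows.
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
test_accuracy = classifier.score(X_test, y_test)  # fraction of test rows predicted correctly

# Unsupervised learning: K-means never sees y; it groups all rows into 3 clusters.
# n_init is set explicitly because its default changed in later scikit-learn releases.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

print(f"Test accuracy: {test_accuracy:.2f}")
print("Cluster sizes:", [int((cluster_labels == k).sum()) for k in range(3)])
```

The classifier is scored only on rows it never saw during training. That held-out evaluation is the reason for the train/test split. K-means cluster numbers are arbitrary, so cluster 0 need not correspond to species 0.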