This is the handout and homework repository for the course "ME 597 Data Analytics for Scientists and Engineers", currently taught (Spring 2021) by Prof. Ilias Bilionis at Purdue University. The course is fully online, with the videos accessible through edX.
This course evolved from ME 597/MA 598 "Introduction to Uncertainty Quantification", which Prof. Bilionis taught three times (the first time, in Spring 2016, it was co-taught with Prof. Guang Lin). If you are interested in accessing the old versions of the course, they can be found here. Note that there is also a 1-credit undergraduate version of the course under the name ME 297 "Introduction to Data Science for Mechanical Engineers." This version can be found here. Also note that the course is about to obtain a permanent number in Spring 2022 and be renamed to "Introduction to Scientific Machine Learning."
The material is published under the GNU General Public License. You can reuse it in your own courses as long as you also include the same license and cite this repository. Please send me an email if you do, as I would love to know!
- Python Basics Activity 1 (Python as a calculator)
- Python Basics Activity 2 (Python variables and types)
- Python Basics Activity 3 (Basic tuples and lists)
- Python Basics Activity 4 (Basics of strings and printing)
- Python Basics Activity 5 (Conditionals)
- Python Basics Activity 6 (Loops)
- Python Basics Activity 7 (Functions)
- Python Basics Activity 8 (Classes)
- Python Basics Activity 9 (Numpy arrays and linear algebra)
- Python Basics Activity 10 (The Python data analysis library)
- Python Basics Activity 11 (Basic plotting)
Below, I provide links that open up directly on Google Colab. If you want to view the Jupyter notebooks locally, please see the section named Running the notebooks on your personal computer.
- Lecture 1 - Introduction to Predictive Modeling
- Lecture 2 - Basics of Probability Theory
- Lecture 3 - Discrete Random Variables
- Lecture 4 - Continuous Random variables
- Lecture 5 - Collections of Random variables
- Lecture 6 - Random Vectors
- Lecture 7 - Basic sampling
- Lecture 8 - The Monte Carlo Method for Estimating Expectations
- Lecture 9 - Monte Carlo Estimates of Various Statistics
  - Hands-on Activity 9.1 (Estimating the cumulative distribution function)
  - Hands-on Activity 9.2 (Estimating the probability density function via histograms)
  - Hands-on Activity 9.3 (Estimating predictive quantiles)
  - Hands-on Activity 9.4 (Application – Uncertainty propagation through an initial value problem)
- Lecture 10 - Quantify Uncertainty in Monte Carlo Estimates
- Lecture 11 - Selecting Prior Information
- Lecture 12 - Analytical Examples of Bayesian Inference
- Lecture 13 - Linear Regression via Least Squares
- Lecture 14 - Bayesian Linear regression
  - Reading Activity 14
  - Hands-on Activity 14.1 (Probabilistic interpretation of least squares – Estimating the measurement noise)
  - Hands-on Activity 14.2 (Maximum a posteriori estimate – Avoiding overfitting)
  - Hands-on Activity 14.3 (Bayesian linear regression)
  - Hands-on Activity 14.4 (The point-predictive distribution – Separating epistemic and aleatory uncertainty)
- Lecture 15 - Advanced Topics in Bayesian Linear regression
- Lecture 16 - Classification
  - Reading Activity 16
  - Hands-on Activity 16.1 (Logistic regression with a single variable)
  - Hands-on Activity 16.2 (Logistic regression with many features)
  - Hands-on Activity 16.3 (Making decisions)
  - Hands-on Activity 16.4 (Diagnostics for classification)
  - Hands-on Activity 16.5 (Multi-class logistic regression)
- Lecture 17 - Clustering and Density Estimation
- Lecture 18 - Dimensionality Reduction
- Lecture 19 - State Space Models - Filtering Basics
- Lecture 20 - State Space Models - Kalman Filters
- Lecture 21 - Gaussian Process Regression - Priors on Function Spaces
- Lecture 22 - Gaussian Process Regression - Conditioning on Data
  - Reading Activity 22
  - Hands-on Activity 22.1 (Gaussian process regression without measurement noise)
  - Hands-on Activity 22.2 (Gaussian process regression with measurement noise)
  - Hands-on Activity 22.3 (Tuning the hyperparameters)
  - Hands-on Activity 22.4 (Multivariate regression and automatic relevance determination)
- Lecture 23 - Bayesian Global Optimization
  - Reading Activity 23
  - Hands-on Activity 23.1 (Maximum mean: A very bad information acquisition function)
  - Hands-on Activity 23.2 (Maximum upper interval)
  - Hands-on Activity 23.3 (Probability of improvement – No observation noise)
  - Hands-on Activity 23.4 (Expected improvement – No observation noise)
  - Hands-on Activity 23.5 (Expected improvement – With observation noise)
  - Hands-on Activity 23.6 (Quantifying epistemic uncertainty in the location and value of maximum)
- Lecture 24 - Deep Neural Networks
- Lecture 25 - Deep Neural Networks Continued
- Lecture 26 - Physics-informed Deep Neural Networks
- Lecture 27 - Sampling Methods
  - Reading Activity 27
  - Hands-on Activity 27.1 (Introduction to probabilistic programming)
  - Hands-on Activity 27.2 (Example: Sampling from the Exponential with random walk Metropolis)
  - Hands-on Activity 27.3 (The Metropolis-Hastings algorithm)
  - Hands-on Activity 27.4 (Gibbs sampling)
  - Hands-on Activity 27.5 (Sequential Monte Carlo)
- Lecture 28 - Variational Inference
- Homework 1
- Homework 2
- Homework 3
- Homework 4
- Homework 5
- Homework 6
- Homework 7
- Homework 8
- Homework 9
- Homework 10
Make sure you have a Google account before you start. Then, you just click on the links above.
One solution is to "print" your notebook to a PDF. However, we have observed that the figures sometimes get a bit messed up. Another solution is to run the notebooks on your own laptop and then do "File -> Download as -> PDF via LaTeX (.pdf)." See below if you want to take that route. It is also possible to do the same thing on Google Colab: follow the instructions in this notebook.
Find and download the right version of Anaconda for Python 3.7 from Continuum Analytics. This package contains most of the software we are going to need. Note: You need Python 3, not Python 2. The notebooks will not work with Python 2.
- We need C, C++, and Fortran compilers, as well as the Python sources. Start the command line by opening "Anaconda Prompt" from the start menu. In the command line, type:
conda config --append channels https://repo.continuum.io/pkgs/free
conda install mingw libpython
- Finally, you need git. As you install it, make sure to indicate that you want to use "Git from the command line and also from 3rd party software".
- Download and install the latest version of Xcode.
If you are using Linux, I am sure that you can figure it out on your own.
Independently of the operating system, use the command line to install the following Python packages:
- Seaborn, for beautiful graphics:
conda install seaborn
- PyMC3 for MCMC sampling:
conda install pymc3
- GPy for Gaussian process regression:
pip install GPy
- pydoe for generating experimental designs:
pip install pydoe
- fipy for solving partial differential equations using the finite volume method:
pip install fipy
*** Windows Users ***
You may receive the error
ModuleNotFoundError: No module named 'future'
If so, please install future and then install fipy:
pip install future
- scikit-learn for some standard machine learning algorithms implemented in Python:
conda install scikit-learn
- graphviz for visualizing probabilistic graphical models:
pip install graphviz
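Once the packages above are installed, a quick way to confirm that the environment is ready is to check the Python version and see whether each package can be found. The following is a minimal sketch, not part of the course material. The helper names `missing_modules` and `is_python3` are made up for illustration, and the import names in `COURSE_MODULES` are assumptions (for example, scikit-learn is imported as `sklearn` and pydoe as `pyDOE`).

```python
import importlib.util
import sys

# Assumed import names for the packages listed above. Some packages are
# imported under a name that differs from the one used to install them.
COURSE_MODULES = ["seaborn", "pymc3", "GPy", "pyDOE", "fipy", "sklearn", "graphviz"]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def is_python3():
    """Return True if the running interpreter is Python 3 or newer."""
    return sys.version_info.major >= 3


if __name__ == "__main__":
    if not is_python3():
        print("Python 3 is required; the notebooks will not work with Python 2.")
    missing = missing_modules(COURSE_MODULES)
    if missing:
        print("Could not find:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Save this as a script (any file name works), run it from the same command line you used for the installs, and reinstall whatever it reports as missing.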
- Open the command line.
- `cd` to your favorite folder.
- Then, type:
git clone https://github.com/PredictiveScienceLab/data-analytics-se.git
- This will download the contents of this repository into a folder called `data-analytics-se`.
- Enter the `data-analytics-se` folder:
cd data-analytics-se
- Start the jupyter notebook by typing the command:
jupyter notebook
- Use the browser to navigate the course, experiment with code etc.
- If the course content has been updated, type the following command (while inside `data-analytics-se`) to get the latest version:
git pull origin master
Keep in mind that if you have made local changes to the repository, you may have to commit them before moving on.
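To see whether you have local changes before pulling, you can ask git for its status. The sketch below is one possible way to do this from Python, assuming `git` is on your path. The function names `parse_porcelain` and `uncommitted_changes` are hypothetical helpers, not part of this repository.

```python
import subprocess


def parse_porcelain(output):
    """Extract file paths from the text printed by `git status --porcelain`."""
    paths = []
    for line in output.splitlines():
        if line.strip():
            # Each line holds two status characters, a space, then the path.
            paths.append(line[3:])
    return paths


def uncommitted_changes(repo_dir="."):
    """Return the paths git reports as modified or untracked in repo_dir."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,  # requires Python 3.7 or newer
        text=True,
        check=True,
    )
    return parse_porcelain(result.stdout)
```

Calling `uncommitted_changes("data-analytics-se")` returns an empty list when the working tree is clean. A non-empty list names the files you may want to commit (or copy elsewhere) before running `git pull origin master`.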