/data-analytics-se

Handout repository for the course "Data Analytics for Scientists and Engineers" at Purdue University.

Primary language: Jupyter Notebook. License: GNU General Public License v3.0 (GPL-3.0).

ME 597 - Data Analytics for Scientists and Engineers

This is the handout and homework repository for the course "ME 597 Data Analytics for Scientists and Engineers," which is currently taught (Spring 2021) by Prof. Ilias Bilionis at Purdue University. The course is fully online, and the videos are accessible through edX.

This course evolved from ME 597/MA 598 "Introduction to Uncertainty Quantification," taught three times by Prof. Bilionis (the first time, in Spring 2016, it was co-taught with Prof. Guang Lin). If you are interested in accessing the old versions of the course, they can be found here. Note that there is also a 1-credit undergraduate version of the course under the name ME 297 "Introduction to Data Science for Mechanical Engineers." This version can be found here. Also note that the course is about to obtain a permanent number in Spring 2022 and be renamed to "Introduction to Scientific Machine Learning."

The material is published under the GNU General Public License. You can reuse it in your own courses as long as you also include the same license and cite this repository. Please send me an email if you do, as I would love to know!

Basic Python Tutorials

Lecture Notebooks

Below, I provide links that open up directly on Google Colab. If you want to view the Jupyter notebooks locally, please see the section named Running the notebooks on your personal computer.

Homework Notebooks

Running the notebooks on Google Colab

Make sure you have a Google account before you start. Then, you just click on the links above.

Converting your homework notebooks to PDF on Google Colab

One solution is to "print" your notebook to a PDF. However, we have observed that sometimes the figures get a bit messed up. Another solution is to run the notebooks on your own laptop and then do "File -> Download as -> PDF via LaTeX (.pdf)." See below if you want to take that route. It is also possible to do the same thing on Google Colab. Follow the instructions in this notebook.

Running the notebooks on your personal computer

Find and download the right version of Anaconda for Python 3.7 from Continuum Analytics. This package contains most of the software we are going to need. Note: You do need Python 3 and not Python 2. The notebooks will not work with Python 2.
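To confirm which interpreter you are actually running, a quick check like the following may help. This is only a sketch: the helper name is_supported_python is hypothetical, and the default minimum of 3.7 is an assumption taken from the Anaconda version named above (the text only states that Python 2 will not work).

```python
import sys


def is_supported_python(version_info=sys.version_info, minimum=(3, 7)):
    """Return True if the (major, minor) version is at least `minimum`.

    The default minimum of (3, 7) is an assumption based on the Anaconda
    version mentioned in this README, not a tested requirement.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the version of the interpreter running this snippet.
print("Python", sys.version.split()[0], "supported:", is_supported_python())
```

Run it from the same prompt you will use to start Jupyter (for example, the Anaconda Prompt on Windows), so that it checks the right interpreter.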

OS Specific Instructions

Microsoft Windows

  • We need C, C++, and Fortran compilers, as well as the Python sources. Start the command line by opening "Anaconda Prompt" from the start menu. In the command line, type:
conda config --append channels https://repo.continuum.io/pkgs/free
conda install mingw libpython
  • Finally, you need git. As you install it, make sure to indicate that you want to use "Git from the command line and also from 3rd party software".

Apple OS X

  • Download and install the latest version of Xcode.

Linux

If you are using Linux, I am sure that you can figure it out on your own.

Installation of Required Python Packages

Independently of the operating system, use the command line to install the following Python packages:

  • seaborn for statistical plotting:
conda install seaborn
  • PyMC3 for MCMC sampling:
conda install pymc3
  • GPy for Gaussian process regression:
pip install GPy
  • pydoe for generating experimental designs:
pip install pydoe
  • fipy for solving partial differential equations using the finite volume method:
pip install fipy

*** Windows Users ***

You may receive the error

ModuleNotFoundError: No module named 'future'

If so, please install future first and then run the fipy installation again:

pip install future
pip install fipy
  • scikit-learn for some standard machine learning algorithms implemented in Python:
conda install scikit-learn
  • graphviz for visualizing probabilistic graphical models:
pip install graphviz
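Once the installations above have finished, a short script along these lines could report which packages Python still cannot locate. It is a minimal sketch: the COURSE_MODULES table and the find_missing helper are hypothetical names, and the import names are assumptions, since some differ from the name used at install time (for example, scikit-learn is imported as sklearn and pydoe as pyDOE).

```python
import importlib.util

# Map each install name to the module name used in `import` statements.
# These import names are assumptions; adjust them if your versions differ.
COURSE_MODULES = {
    "seaborn": "seaborn",
    "pymc3": "pymc3",
    "GPy": "GPy",
    "pydoe": "pyDOE",
    "fipy": "fipy",
    "scikit-learn": "sklearn",
    "graphviz": "graphviz",
}


def find_missing(modules=COURSE_MODULES):
    """Return the install names whose module cannot be located."""
    missing = []
    for package, module in modules.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None:
            missing.append(package)
    return missing


missing = find_missing()
print("Missing packages:", ", ".join(missing) if missing else "none")
```

The check only locates the modules without importing them, so it will not catch a package that is installed but broken.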

Running the notebooks

  • Open the command line.
  • cd to your favorite folder.
  • Then, type:
git clone https://github.com/PredictiveScienceLab/data-analytics-se.git
  • This will download the contents of this repository in a folder called data-analytics-se.
  • Enter the data-analytics-se folder:
cd data-analytics-se
  • Start the jupyter notebook by typing the command:
jupyter notebook
  • Use the browser to navigate the course, experiment with the code, etc.
  • If the course content has been updated, type the following command (while being inside data-analytics-se) to get the latest version:
git pull origin master

Keep in mind that if you have made local changes to the repository, you may have to commit them before moving on.
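If you are unsure whether you have local changes, one way to check before pulling is to inspect the output of git status --porcelain. The sketch below assumes git is on your PATH; the helper names repo_status and has_local_changes are hypothetical, and untracked files are deliberately ignored because they do not need to be committed.

```python
import subprocess


def has_local_changes(porcelain_output):
    """Return True if the porcelain output lists a modified tracked file.

    Lines starting with "??" are untracked files and are skipped.
    """
    for line in porcelain_output.splitlines():
        if line.strip() and not line.startswith("??"):
            return True
    return False


def repo_status(path="."):
    """Run `git status --porcelain` in `path` and return its text output."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=path, capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example, to be run from inside the data-analytics-se folder:
#     if has_local_changes(repo_status()):
#         print("You have local changes; commit them before pulling.")
```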