Bootcamp2019

Central repository for 2019 Summer Data Science Bootcamp instructional materials.

Primary language: Jupyter Notebook. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).


Repository Overview & Organization

This repository is intended to centralize the collaborative development of educational materials for the 2019 Data Science Bootcamp by both instructors and teaching assistants.

Bootcamp Schedule

| Time | Session | Monday | Tuesday | Wednesday | Thursday |
| --- | --- | --- | --- | --- | --- |
| 9:30 - 9:40 | Welcome | | | | |
| 9:40 - 10:30 | Lecture | Ryan Wade - Python & GitHub | Vetria Byrd - Python for Data Science | Edmond Chow - Regression, Discriminants, etc. | Xiaoming Huo - Clustering & Classification |
| 10:30 - 10:50 | Break | | | | |
| 10:50 - 11:40 | Lecture | Ryan Wade - Python & GitHub | Vetria Byrd - Python for Data Science | Edmond Chow - Regression, Discriminants, etc. | Xiaoming Huo - Clustering & Classification |
| 11:40 - 1:15 | Lunch | | | | |
| 1:15 - 2:30 | Lab | Dominic Sirianni - GitHub Planets | Benjamin Comer - Library Basics | Ray Lei - SciKitLearn Basics | Ray Lei - Clustering |
| 2:30 - 2:45 | Break | | | | |
| 2:45 - 4:00 | Lab | Benjamin Comer - Project Euler | Dominic Sirianni - Ecology Data Carpentry | Derek Metcalf - TensorFlow Basics | Derek Metcalf - Classification |

| Time | Session | Friday |
| --- | --- | --- |
| 9:30 - 9:40 | Welcome | |
| 9:40 - 10:20 | Lecture | Vetria Byrd - The ubiquitous nature of data visualization |
| 10:20 - 11:00 | Lecture | David Sherrill - Machine learning for predicting drug binding |
| 11:00 - 11:20 | Break | |
| 11:20 - 11:50 | Lecture | Chris DePree - The NASA exoplanet dataset |
| 11:50 | Adjourn | |

Bootcamp Description

Data science is revolutionizing how scientists and engineers go about their work, but most students have not had much exposure to it. This one-week bootcamp provides an opportunity to get introduced to data management and visualization, data modeling, deep learning, and scientific programming in Python. The bootcamp will consist of morning lectures, followed by hands-on sessions in the afternoon to try out and practice concepts and software tools.

The bootcamp is aimed at undergraduate and graduate students in science and engineering who have introductory-level familiarity with any programming language or environment, such as MATLAB or RStudio. The bootcamp is free of charge, but enrollment is capped, so students must apply by May 15, 2019. Students from Agnes Scott, Morehouse, Spelman, and Georgia Tech are particularly encouraged to apply.

  • Topics:
    • Computer programming in Python for data science, clustering, numerical linear algebra, classification, regression, deep learning, and domain applications.
  • Tools:
    • Python, Jupyter notebooks, GitHub, NumPy, Pandas, Matplotlib, scikit-learn, and TensorFlow libraries
  • Skills:
    • Python programming, version control, social coding, data handling and visualization, data analysis, data modeling and prediction, and scientific and engineering applications
  • Instructors:
    • Ryan Wade (Blue Horseshoe Solutions), Vetria Byrd (Purdue University), Edmond Chow (Georgia Tech), Xiaoming Huo (Georgia Tech), Chris DePree (Agnes Scott), and David Sherrill (Georgia Tech)
  • Location: Georgia Tech Campus (Visitor parking available in the W23 Parking Lot, located at 911 State St. NW.)
    • Monday: Engineered Biosystems Building (EBB), Children's Healthcare Seminar Room (first floor by food kiosk), 950 Atlantic Dr., Atlanta, GA 30332
    • Tuesday–Friday: Molecular Science and Engineering Building (MoSE), Room G011 (ground floor behind elevators), 901 Atlantic Dr., Atlanta, GA 30332

This bootcamp is sponsored by a National Science Foundation TRIPODS+X: EDU grant to the Data-Driven Alliance (Agnes Scott, Georgia Tech, Morehouse, and Spelman) and the Institute for Data Engineering and Science (IDEaS) at Georgia Tech.

Lunch Options

Restaurants near Georgia Tech

Connecting to Virtual Machines

Once you have created your data science virtual machine (DSVM), follow these steps to connect to it:

Connecting on Windows

Once the DSVM is done deploying:

  1. Click "go to resource."
  2. From there, click "connect" at the top right of the resource window.
  3. Click "Download RDP File."
  4. Open this file and enter the username and password you made when you created the virtual machine.

Connecting on macOS

Once the DSVM has deployed:

  1. Launch the Microsoft remote desktop client (RDC).
  2. If this is the first time connecting to the DSVM, add a new connection by clicking on the "New" button (big plus "+", upper left corner of window). Then, fill in the following fields in the pop-up window:
    • "Connection name": any name you like
    • "PC Name": the public IP address of the DSVM (available under the "Overview" section of the DSVM on the Azure dashboard)
    • "User name"/"Password": the username and password you set when the DSVM was created
  3. After filling in the above fields, close the external window. The new connection should appear under "My Desktops" in the Microsoft remote desktop client.
  4. Launch your DSVM by double-clicking the item in the list.
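
If the remote desktop client cannot connect on either platform, it can help to first check whether the DSVM is reachable over the network at all. The following is a minimal sketch, not part of the bootcamp materials: it assumes the DSVM accepts Remote Desktop connections on the default RDP port (3389), and the helper name `can_reach` is hypothetical.

```python
import socket

# Default Remote Desktop (RDP) port. This is an assumption: a DSVM could be
# configured to listen on a different port.
RDP_PORT = 3389


def can_reach(host, port=RDP_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

To use it, call `can_reach` with the public IP address shown in the "Overview" section of the DSVM on the Azure dashboard. A result of `False` suggests the machine is stopped, the address is wrong, or a firewall is blocking the connection. A result of `True` only shows that the port is open, so a login failure after that point is more likely a username or password problem.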

IF YOU CANNOT GET INTO A VIRTUAL MACHINE FOR ANY REASON, FOLLOW THESE INSTRUCTIONS

  • First, try deleting the virtual machine you made and starting over.
  • Ensure that when you log into the virtual machine, you use the username and password you chose when you created it.
  • If these steps fail, you will need to work locally. The directions below will allow you to do this.
  • Go to this address: https://www.anaconda.com/distribution/
  • Choose a distribution appropriate for your operating system.
  • Download the file and follow the installation instructions.
  • If you are on Windows, this will give you the Anaconda Prompt, and you can follow along.
  • If you are on Mac, you can use your terminal. To access it, open Finder and type in "terminal".
  • This will allow you to follow along on your local machine, but you may need to install some packages.
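
To find out which packages still need installing on a local machine, a short check like the one below may help. It is a sketch rather than an official bootcamp script: the package list is taken from the "Tools" entry above, the import names (for example `sklearn` for scikit-learn) are the standard ones for those libraries, and the helper name `find_missing` is hypothetical.

```python
import importlib.util

# Display name -> import name for the libraries listed under "Tools".
# Note that scikit-learn is imported as "sklearn".
BOOTCAMP_PACKAGES = {
    "NumPy": "numpy",
    "Pandas": "pandas",
    "Matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "TensorFlow": "tensorflow",
}


def find_missing(packages=BOOTCAMP_PACKAGES):
    """Return the display names of packages that are not installed."""
    return [
        name
        for name, module in packages.items()
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All bootcamp packages are available.")
```

Run the script from the Anaconda Prompt (Windows) or the terminal (Mac). Any package it reports as missing can usually be installed with `conda install` followed by the package name. TensorFlow in particular may not be part of the default Anaconda installation.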

Bootcamp Pre-Survey

Pre-Bootcamp Survey

Bootcamp Post-Survey

Post-Bootcamp Survey