ESS490-590-Spr21

Spring 2021 Data Science for Earth and Planetary Systems

Primary language: Jupyter Notebook. License: MIT.

Overview

Introduction to Data Science for Earth and Planetary Systems (surface processes, natural hazards, geochemistry, physics of the Earth interior). The course teaches basic computing skills in Python; data manipulation, visualization, and curation; statistics and clustering; and regression analysis and neural networks, all applied explicitly to data encountered in Earth and Planetary Systems.

Learning objectives

Quantitative data analysis is becoming a necessary skill for most geoscientists. The course is designed to provide hands-on experience with fundamental data science techniques applied to geoscientific data. The learning outcomes cover the basics of:

  • Computing: Python, notebooks, version control, cloud and local platforms.
  • Data manipulation in Geosciences: data formats, plotting, dimensionality reduction, and feature engineering.
  • Statistical methods applied to geoscience data.
  • Open science, reproducibility, and digital scholarship.

Students will learn by practicing.

Prerequisites: MATH 207 and MATH 208, or MATH 307 and 308, or AMATH 351 and 352, or permission from the instructor.

Recommended: knowledge of MATLAB or Python, AMATH 301, and a college-level Earth Sciences course.

Assessment

Grading policy:

  • Reading and webinar assignments: 20%
  • Homework: 50%
  • Final project: 30%
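
As an illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale and that the course grade is a plain weighted average; the component names and the `course_grade` function are hypothetical and not part of the course materials, and the instructor's actual aggregation may differ.

```python
# Weights copied from the grading policy; the dictionary keys are illustrative names.
WEIGHTS = {"reading_webinar": 0.20, "homework": 0.50, "final_project": 0.30}


def course_grade(scores):
    """Return the weighted average of component scores (assumed 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"reading_webinar": 90.0, "homework": 80.0, "final_project": 85.0}
print(round(course_grade(example), 1))  # 0.2*90 + 0.5*80 + 0.3*85 = 83.5
```

Because homework carries half of the weight, a zero on a single late assignment can move the overall grade noticeably, which is worth keeping in mind when reading the late work policy.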

Late work policy: You may use up to 2 late days, once, for homework. Use them wisely and in case of emergency! If you anticipate needing an extra day, email the teaching staff at least 48 hours before the deadline. Otherwise, you will receive a grade of zero.

Final project: 590 students lead a project; 490 students assist a 590 project.

The final project will be a research-style project that leverages the materials covered and applies them to new data in geosciences. The students will be evaluated on the following items:

  1. Formulation of an outstanding research question: an argument for the scientific inquiry based on literature review (PR due )
  2. Design and deploy a scientific workflow: describe the project in prose or in a diagram (e.g., using ASSET).
  3. Design and deploy a data and computing workflow: describe using ASSET
  4. Gather or curate a data set
  5. Develop and deploy an algorithm
  6. Assess the performance of the algorithm
  7. Reproducibility of the results

Deliverables: a 5-page report, documentation of the data and code, and a 15-min presentation. 590 project leaders perform all listed tasks. 490 project assistants help project leaders, in particular in tasks 4, 5, 6, and 7. The relation between project leaders and assistants is that expected during undergraduate research internships. The team will provide a brief progress report (PR; a ½-page PDF to submit on Canvas) at the dates listed above, which will help the team make progress during the quarter.

Syllabus

  • Module 1 (weeks 1 and 2): Intro to DS in Earth Systems (basic practice in open science and computing skills)
  • Module 2 (weeks 3 and 4): Data in the Geosciences (basic data handling)
  • Module 3 (weeks 5 and 6): Unsupervised Learning in Geo
  • Module 5 (weeks 7 and 8): Supervised Learning in Geo
  • Module 6 (weeks 9 and 10): Deep Learning in Geo

Readings and Webinars

The Google Doc https://docs.google.com/document/d/15cVLDCpHP74xQqtFq0CrFtrFln1TjdtWq3vg401KJrY/edit#heading=h.zfcrpxuen89 lists references, links, and PDFs for the literature assigned and discussed in class. Access is limited to participants in the class.