
Public repository for the "Data-Centric Deep Learning" course taught by Mike Wu and Andrew Maas. Available at https://corise.com/course/data-centric-deep-learning.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Welcome to Data-Centric Deep Learning

Data-Centric Deep Learning (DCDL) is a four-week class taught by Mike Wu and Andrew Maas. The course is a practical introduction to deep learning engineering and operations, with an emphasis on the algorithmic challenges that practitioners face in the real world. To be "data-centric" means leveraging methods and tools that use data to improve, repair, and test deep learning models.

Students will walk through each step of a deep model's lifecycle: from annotation to training, testing, deployment, and monitoring, and then back to annotation. At each step, students will be introduced to new tools as well as the underlying methodology.

This is an extremely hands-on, project-driven course. Students will work with real data across images, speech audio, and natural language. Students will leverage state-of-the-art methods to achieve high performance, and will also break these models to analyze their shortcomings in practice.

Class layout

This course will have four weekly projects. Each project will build on concepts from the prior week but have its own standalone components. Week 1 will be completed entirely in a Colab notebook, so no code in this repository will be used. Weeks 2 through 4 will each have their own folder in course/. In each folder, e.g. course/week3, you will find at least one subfolder. Each subfolder is a project component. The weekly course page will guide you through the different subfolders.
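As a quick way to see which project components a given week contains, the folder layout described above can be inspected with a few lines of Python. This is only a minimal sketch: the helper name list_components is hypothetical and not part of the repository, and it assumes the script is run from the repository root so that course/ is a relative path.

```python
from pathlib import Path


def list_components(course_root, week):
    """Return the sorted names of the subfolders in <course_root>/week<week>.

    Hypothetical helper for illustration; it is not part of the repository.
    Returns an empty list if the week folder does not exist.
    """
    week_dir = Path(course_root) / f"week{week}"
    if not week_dir.is_dir():
        return []
    # Each subfolder is treated as one project component; plain files are skipped.
    return sorted(p.name for p in week_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Weeks 2 through 4 have folders; week 1 lives in a Colab notebook.
    for week in (2, 3, 4):
        print(f"week{week}:", list_components("course", week))
```

The weekly course page remains the authoritative guide to the order in which the subfolders should be worked through; the listing only shows what is present on disk.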

Prerequisites

We expect students to be proficient in Python programming and familiar with a deep learning framework such as PyTorch or TensorFlow. Students should have a basic understanding of machine learning and deep learning concepts. Knowledge of web applications is optional but may be beneficial.

Setup

These projects are best done through Gitpod, which provides 50 hours of use per month for free. We fully expect this to be more than sufficient to complete the projects. Setup is described thoroughly in week 0 of the course. In short, you will need to fork this repository and launch a new Gitpod workspace using the forked repo URL. All data will be provided within this repository, potentially as large files. Finally, you may need to create a Label Studio account for weeks 3 and 4, although this can be done through the project.

We do not expect these projects to be run locally. While they can be, we do not plan to support this option.