advanced-data-management-and-analytics

An online course in advanced data management and analytics


Launch with Binder

Binder

Installation instructions

Install the required packages and resources using:

pip install -r requirements.txt
python -m textblob.download_corpora
python -m spacy download en

Advanced Data Management and Analytics Topics

| # | Theme | Topics | Readings | Video Lectures |
|---|---|---|---|---|
| 01 | Course introduction | | JOUR.1, WTOP.1 | Course Introduction |
| 02 | Markdown language | | WEB.0, WTOP.2 | Introduction, Warmup, Workout |
| 03 | Python language fundamentals | Variables | WTOP.3, WTOP.4 | Introduction, Warmup, Workout |
| 04 | | Operators | WTOP.5 | Introduction, Warmup, Workout |
| 05 | | Types | WTOP.6 | Introduction, Warmup, Workout |
| 06 | | String | WTOP.15 | Introduction, Warmup, Workout |
| 07 | | Control flow | WTOP.8, WTOP.10 | Introduction, Warmup, Workout |
| 08 | | Functions | WTOP.9 | Introduction, Warmup, Workout |
| 09 | Data manipulation with Python | Built-in data structures | WTOP.7 | Introduction, Warmup, Workout |
| 10 | | List comprehensions | WTOP.12 | Introduction, Warmup, Workout |
| 11 | | Regular expressions | WTOP.15 | Introduction, Warmup, Workout |
| 12 | | Generators and generator expressions | WTOP.13 | TBD |
| 13 | | Modules and packages | WTOP.14 | TBD |
| 14 | Data organization with Pandas | DataFrame and Series | PDSH.3.1 | Introduction, Warmup, Workout |
| 15 | | Data indexing and selection | PDSH.3.2 | Introduction, Warmup, Workout |
| 16 | | Operating on data in Pandas | PDSH.3.3 | Introduction, Warmup, Workout |
| 17 | | Handling missing data | PDSH.3.4 | Introduction, Warmup, Workout |
| 18 | | Hierarchical indexing | PDSH.3.5 | Introduction, Warmup, Workout |
| 19 | Data transformation | Combining datasets: concat & append | PDSH.3.6 | Introduction, Warmup, Workout |
| 20 | | Joining datasets: merge & join | PDSH.3.7 | Introduction, Warmup, Workout |
| 21 | | Aggregation and grouping | PDSH.3.8 | Introduction, Warmup, Workout |
| 22 | | Vectorized string operations | PDSH.3.10 | Introduction, Warmup, Workout |
| 23 | | Time and date types and operations | PDSH.3.11 | Introduction, Warmup, Workout |
| 24 | | High performance Pandas | PDSH.3.12 | TBD |
| 25 | | Pivot tables | PDSH.3.9 | TBD |
| 26 | Data integration | APIs: JSON, REST & GraphQL | WEB.1, WEB.12, WEB.13 | Introduction, Warmup, Workout |
| 27 | | APIs: XML, XPATH & XQUERY | WEB.14, WEB.15 | Introduction, Warmup, Workout |
| 28 | | HTML scraping | WEB.3 | Introduction, Warmup, Workout 1, Workout 2 |
| 29 | Data visualization | Principles of data visualization | JOUR.2, WEB.4 | Introduction, "How to Think about Data Visualization", "200 Countries, 200 Years, 4 Minutes" |
| 30 | | Introduction to Altair | VISU.1 | Introduction, Warmup, Workout |
| 31 | | Types, marks, and encoding channels | VISU.2 | Introduction, Warmup, Workout |
| 32 | | Altair data transformation | VISU.3 | Introduction, Warmup, Workout |
| 33 | | Scales, axes, and legends | VISU.4 | TBD |
| 34 | | Multi-view composition | VISU.5 | TBD |
| 35 | | Interaction | VISU.6 | TBD |
| 36 | Text analytics | Syntax: tokenization and POS tagging | WEB.9 | Introduction 1, Introduction 2, Warmup, Workout |
| 37 | | Semantics: sentiment and named entities | WEB.11 | Introduction, Warmup, Workout |
| 38 | Network analytics | Basic network concepts | NETS.1 | Introduction 1, Introduction 2, Warmup, Workout |
| 39 | | Network structure | NETS.2 | Introduction 1, Introduction 2, Warmup, Workout |
| 40 | | Network content and structure | WEB.10 | Introduction 1, Introduction 2, Warmup, Workout |
| 41 | Machine learning | Introduction to machine learning | PDSH.5.1, PDSH.5.2 | Introduction, Warmup, Workout |
| 42 | | Linear regression | PDSH.5.6 | Introduction, Warmup, Workout |
| 43 | | Decision trees | PDSH.5.8 | Introduction, Warmup, Workout |
| 44 | | Clustering | PDSH.5.11 | Introduction, Warmup, Workout |
| 45 | Course wrap-up | | JOUR.3 | Course Wrap-Up |

Source Code

Software

Python 3.7 or higher is required. The easiest way to install it is with the Anaconda distribution. The course also needs several other packages, including Pandas, Altair, and Scikit-learn. A requirements.txt file is provided, so all required packages can be installed with pip install -r requirements.txt
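
To confirm that an environment meets these requirements, a quick check along the following lines may help. This is only a sketch and not part of the course materials: the helper names `python_ok` and `missing_packages` are hypothetical, and the package list is an assumption drawn from the packages named in this README (requirements.txt remains the authoritative list). Note that Scikit-learn is imported under the name `sklearn`.

```python
import importlib.util
import sys

# Minimum interpreter version stated in this README.
MIN_PYTHON = (3, 7)

# Import names of packages mentioned in this README (an assumption;
# requirements.txt is the authoritative list).
PACKAGES = ["pandas", "altair", "sklearn", "textblob", "spacy"]


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(names=PACKAGES):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.7 or higher is required.")
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Try: pip install -r requirements.txt")
    else:
        print("All listed packages were found.")
```

The check uses `importlib.util.find_spec` rather than importing each package, so it stays fast and does not fail when a package is absent.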

Readings

This course uses open-access materials. I thank the authors and publishers who made these resources free to use.

The numbers in the lists below match those in the table above.