Launch with Binder
Installation instructions
Install the required packages and resources using:
pip install -r requirements.txt
python -m textblob.download_corpora
python -m spacy download en_core_web_sm
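To confirm that the installation worked, a small check like the one below can help. It is a minimal sketch. The list of import names is an assumption based on the packages this README mentions, and requirements.txt remains the authoritative list. It also assumes the spaCy English model is installed as the `en_core_web_sm` package. The check only confirms that each module can be found. It does not check versions or the TextBlob corpora.

```python
import importlib.util

# Import names to look for. This list is an assumption drawn from the packages
# named in this README, not a copy of requirements.txt.
# scikit-learn is imported as "sklearn", and the spaCy English model
# "en_core_web_sm" is installed as an importable Python package.
REQUIRED = ["pandas", "altair", "sklearn", "textblob", "spacy", "en_core_web_sm"]


def check_environment(names=REQUIRED):
    """Map each import name to True if Python can locate it, else False."""
    # find_spec only locates top-level modules here. It does not import them.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:16} {'ok' if found else 'MISSING'}")
```

Any module reported as `MISSING` points to a step in the installation commands above that did not complete.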
Advanced Data Management and Analytics Topics
# | Theme | Topics | Readings | Video Lectures |
---|---|---|---|---|
01 | | Course introduction | JOUR.1, WTOP.1 | Course Introduction |
02 | | Markdown language | WEB.0, WTOP.2 | Introduction, Warmup, Workout |
03 | Python language fundamentals | Variables | WTOP.3, WTOP.4 | Introduction, Warmup, Workout |
04 | | Operators | WTOP.5 | Introduction, Warmup, Workout |
05 | | Types | WTOP.6 | Introduction, Warmup, Workout |
06 | | Strings | WTOP.15 | Introduction, Warmup, Workout |
07 | | Control flow | WTOP.8, WTOP.10 | Introduction, Warmup, Workout |
08 | | Functions | WTOP.9 | Introduction, Warmup, Workout |
09 | Data manipulation with Python | Built-in data structures | WTOP.7 | Introduction, Warmup, Workout |
10 | | List comprehensions | WTOP.12 | Introduction, Warmup, Workout |
11 | | Regular expressions | WTOP.15 | Introduction, Warmup, Workout |
12 | | Generators and generator expressions | WTOP.13 | TBD |
13 | | Modules and packages | WTOP.14 | TBD |
14 | Data organization with Pandas | DataFrame and Series | PDSH.3.1 | Introduction, Warmup, Workout |
15 | | Data indexing and selection | PDSH.3.2 | Introduction, Warmup, Workout |
16 | | Operating on data in Pandas | PDSH.3.3 | Introduction, Warmup, Workout |
17 | | Handling missing data | PDSH.3.4 | Introduction, Warmup, Workout |
18 | | Hierarchical indexing | PDSH.3.5 | Introduction, Warmup, Workout |
19 | Data transformation | Combining datasets: concat & append | PDSH.3.6 | Introduction, Warmup, Workout |
20 | | Joining datasets: merge & join | PDSH.3.7 | Introduction, Warmup, Workout |
21 | | Aggregation and grouping | PDSH.3.8 | Introduction, Warmup, Workout |
22 | | Vectorized string operations | PDSH.3.10 | Introduction, Warmup, Workout |
23 | | Time and date types and operations | PDSH.3.11 | Introduction, Warmup, Workout |
24 | | High-performance Pandas | PDSH.3.12 | TBD |
25 | | Pivot tables | PDSH.3.9 | TBD |
26 | Data integration | APIs: JSON, REST & GraphQL | WEB.1, WEB.12, WEB.13 | Introduction, Warmup, Workout |
27 | | APIs: XML, XPath & XQuery | WEB.14, WEB.15 | Introduction, Warmup, Workout |
28 | | HTML scraping | WEB.3 | Introduction, Warmup, Workout 1, Workout 2 |
29 | Data visualization | Principles of data visualization | JOUR.2, WEB.4 | Introduction, "How to Think about Data Visualization", "200 Countries, 200 Years, 4 Minutes" |
30 | | Introduction to Altair | VISU.1 | Introduction, Warmup, Workout |
31 | | Types, marks, and encoding channels | VISU.2 | Introduction, Warmup, Workout |
32 | | Altair data transformation | VISU.3 | Introduction, Warmup, Workout |
33 | | Scales, axes, and legends | VISU.4 | TBD |
34 | | Multi-view composition | VISU.5 | TBD |
35 | | Interaction | VISU.6 | TBD |
36 | Text analytics | Syntax: tokenization and POS tagging | WEB.9 | Introduction 1, Introduction 2, Warmup, Workout |
37 | | Semantics: sentiment and named entities | WEB.11 | Introduction, Warmup, Workout |
38 | Network analytics | Basic network concepts | NETS.1 | Introduction 1, Introduction 2, Warmup, Workout |
39 | | Network structure | NETS.2 | Introduction 1, Introduction 2, Warmup, Workout |
40 | | Network content and structure | WEB.10 | Introduction 1, Introduction 2, Warmup, Workout |
41 | Machine learning | Introduction to machine learning | PDSH.5.1, PDSH.5.2 | Introduction, Warmup, Workout |
42 | | Linear regression | PDSH.5.6 | Introduction, Warmup, Workout |
43 | | Decision trees | PDSH.5.8 | Introduction, Warmup, Workout |
44 | | Clustering | PDSH.5.11 | Introduction, Warmup, Workout |
45 | | Course wrap-up | JOUR.3 | Course Wrap-Up |
Source Code
Software
Python 3.7 or higher is required. The easiest way to install it is with the Anaconda distribution. Several other packages, including Pandas, Altair, and Scikit-learn, are also needed. A requirements.txt file is provided, and all required packages can be installed with pip install -r requirements.txt
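A quick way to confirm that the active interpreter meets the Python 3.7 minimum is a version check such as the sketch below. The function name `python_is_supported` is hypothetical and is not part of the course code.

```python
import sys

# Minimum Python version stated in this README.
MIN_VERSION = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) part of version_info meets the minimum."""
    # Tuples compare element by element, so (3, 10) >= (3, 7) is True.
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if python_is_supported():
        print(f"Python {current}: OK")
    else:
        print(f"Python {current}: please install Python 3.7 or newer")
```

Comparing version tuples avoids the pitfall of comparing version strings, where "3.10" would sort before "3.7".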
Readings
This course uses open-access materials. I thank the authors and publishers who made these resources free to use.
The numbers on the lists below match those in the table.
- WTOP: A Whirlwind Tour of Python: a fast-paced introduction to essential features of the Python language.
  - 1 Introduction
  - 2 How to Run Python Code
  - 3 Basic Python Syntax
  - 4 Python Semantics: Variables
  - 5 Python Semantics: Operators
  - 6 Built-In Scalar Types
  - 7 Built-In Data Structures
  - 8 Control Flow Statements
  - 9 Defining Functions
  - 10 Errors and Exceptions
  - 11 Iterators
  - 12 List Comprehensions
  - 13 Generators and Generator Expressions
  - 14 Modules and Packages
  - 15 Strings and Regular Expressions
- PDSH: Python Data Science Handbook: comprehensive coverage of modern Python tools for researchers and data scientists.
  - 3 Data Manipulation with Pandas
    - 1 Introducing Pandas Objects
    - 2 Data Indexing and Selection
    - 3 Operating on Data in Pandas
    - 4 Handling Missing Data
    - 5 Hierarchical Indexing
    - 6 Combining Datasets: Concat and Append
    - 7 Combining Datasets: Merge and Join
    - 8 Aggregation and Grouping
    - 9 Pivot Tables
    - 10 Vectorized String Operations
    - 11 Working with Time Series
    - 12 High-Performance Pandas: eval() and query()
  - 5 Machine Learning
    - 1 What Is Machine Learning?
    - 2 Introducing Scikit-Learn
    - 3 Hyperparameters and Model Validation
    - 4 Feature Engineering
    - 5 In Depth: Naive Bayes Classification
    - 6 In Depth: Linear Regression
    - 7 In-Depth: Support Vector Machines
    - 8 In-Depth: Decision Trees and Random Forests
    - 9 In Depth: Principal Component Analysis
    - 10 In-Depth: Manifold Learning
    - 11 In Depth: k-Means Clustering
- VISU: Data Visualization Curriculum: a data visualization curriculum of interactive notebooks from the UW Interactive Data Lab.
- NETS: Network Science: a freely available network science textbook by Albert-László Barabási and others.
- Articles from the web
  - WEB.0: Mastering Markdown
  - WEB.1: Python’s Requests Library (Guide)
  - WEB.2: Quandl API Documentation
  - WEB.3: Python Web Scraping Using BeautifulSoup
  - WEB.4: Intro, Data and Tasks, Marks and Channels
  - WEB.5: Visualizing statistical relationships
  - WEB.6: Plotting with categorical data
  - WEB.7: Visualizing the distribution of a dataset
  - WEB.8: Text Analytics for Beginners using NLTK
  - WEB.9: Natural Language Basics with TextBlob
  - WEB.10: Exploring and Analyzing Network Data with Python
  - WEB.11: spaCy 101: Everything you need to know
  - WEB.12: Python API Tutorial: Getting Started with APIs
  - WEB.13: GraphQL is the better REST
  - WEB.14: Python XML with ElementTree: Beginner's Guide
  - WEB.15: XQuery Tutorial
- Journal articles
  - JOUR.1: Watson, H. J. "Should you pursue a career in BI/analytics?" Business Intelligence Journal (2015).
  - JOUR.2: Watson, H. J. "Data Visualization, Data Interpreters, and Storytelling." Business Intelligence Journal (2017).
  - JOUR.3: Davenport, T. H., & Patil, D. J. "Data Scientist: The Sexiest Job of the 21st Century." Harvard Business Review (2012).