This is the companion repository for the book Essential Math for Data Science. It contains the datasets and the audio examples used in the book.

Datasets

The folder data contains the open source datasets that you'll need in this book, except the audio samples for the hands-on project on PCA in Chapter 10 (instructions for downloading these files are provided). In some cases, slight modifications have been made to make the data easier to read, and sometimes only a subset of the dataset has been selected.
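To check which dataset files are available locally, a minimal sketch along these lines may help. It assumes the script is run from the repository root and that the folder is named `data`, as described above; the helper name `list_datasets` is hypothetical and not part of the book's code.

```python
from pathlib import Path


def list_datasets(data_dir="data"):
    """Return the sorted relative paths of all files found under data_dir.

    `data_dir` defaults to "data" (an assumption based on this README).
    An empty list is returned if the folder does not exist.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    # Walk the folder recursively and keep regular files only.
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    for name in list_datasets():
        print(name)
```

The function only lists files; it does not open or parse them, so it works regardless of the format each dataset is stored in.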

Wine Quality

You'll need this dataset in Chapter 02.

It comes from here. You can also refer to the related paper: Cortez, Paulo, et al. "Modeling wine preferences by data mining from physicochemical properties." Decision Support Systems 47.4 (2009): 547-553.

COVID19

You'll need this dataset in Chapter 06.

This dataset shows the number of COVID-19 cases in different areas of France in March 2020. It comes from here.

CIQUAL

You'll need this dataset in Chapter 09.

This dataset comes from the French Agency for Food, Environmental and Occupational Health & Safety and presents food composition data.

Beer Consumption

You'll need this dataset in Chapter 10.

It shows the relationship between beer consumption and temperature in São Paulo, Brazil, for the year 2015 (source: https://www.kaggle.com/dongeorge/beer-consumption-sao-paulo).

Audio Categorization

You'll need this dataset in Chapter 10.

This dataset is composed of 1,500 five-second audio samples. It was released as a machine learning challenge in 2018 with the goal of categorizing audio samples. You can find more details about it here. This dataset is not included in the repository; you need to get it here.

Audio Examples

Here you'll find the audio files resulting from the hands-on project on Principal Component Analysis.