dirty-data

Dirty (Dancing) Data is a repository of interesting (sometimes fun) datasets that need some work.

Our World In Data: Coronavirus

https://ourworldindata.org/coronavirus

Which countries are doing better and which are doing worse? We built 207 country profiles which allow you to explore the statistics on the coronavirus pandemic for every country in the world.

Each profile includes interactive visualizations, explanations of the presented metrics, and the details on the sources of the data.

Every country profile is updated daily.

Every profile includes four sections:

Deaths: How many deaths from Coronavirus have been reported? Is the number of deaths still increasing? How does the death rate compare to other countries?

Testing: How much testing for coronavirus do countries conduct? When did they start and how does it compare with other countries?

Cases: How many cases were confirmed? How many tests did a country do to find one COVID-19 case? And is your country bending the curve?

Government responses: What measures did countries take in response to the pandemic?

Metro Nashville Open Data Set

https://data.nashville.gov/

Nashville provides data on business, culture, education, health, public safety, and more. This data is real-world, and requires cleanup, feature engineering, and additional work to frame questions and outcome variables.

Evictions

https://evictionlab.org/

"The Eviction Lab at Princeton University has built the first nationwide database of evictions. Find out how many evictions happen in your community. Create custom maps, charts, and reports." This data is real-world, and requires cleanup, feature engineering, and additional work to frame questions and outcome variables.

UCI Machine Learning Repository

https://archive.ics.uci.edu/ml/index.php

The UCI Machine Learning Repository currently maintains 475 data sets as a service to the machine learning community. These datasets are ready for immediate model building.

MLBench Data

https://cran.r-project.org/web/packages/mlbench/mlbench.pdf

MLBench is a reformatting of some of the datasets commonly used from the UCI Machine Learning Repository, together with supporting code for benchmarking. The dataframes are stored in the mlbench-data subdirectory as ready-to-use CSV files.
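To get a quick overview of what is in the mlbench-data subdirectory, a small script along these lines may help. It is only a sketch: it assumes the script is run from the repository root, that each CSV has its column names in the first row, and that the files are UTF-8 encoded. The helper name `csv_headers` is made up for this example.

```python
import csv
from pathlib import Path


def csv_headers(directory):
    """Map each CSV file name in `directory` to its header row (a list of column names)."""
    headers = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # Assumption: the first row holds the column names.
            # An empty file yields an empty list instead of raising.
            headers[path.name] = next(csv.reader(handle), [])
    return headers


if __name__ == "__main__":
    # "mlbench-data" is the subdirectory named above; adjust the path if you run
    # this from somewhere other than the repository root.
    for name, columns in csv_headers("mlbench-data").items():
        print(f"{name}: {len(columns)} columns")
```

Only the header row of each file is read, so this stays fast even for the larger datasets.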

NYC Flight Data

A perennial favorite! This data is provided in the nycflights13 package, but is included here in the nycflights13 folder for practice in reading data from CSVs.
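For practice reading these CSVs outside of R, a minimal Python sketch might look like the following. The file name `flights.csv` is an assumption about what the nycflights13 folder contains, and treating `NA` and empty strings as missing values is likewise an assumption (it matches how R typically writes missing values, but check the actual files). The helper name `read_rows` is made up for this example.

```python
import csv
from pathlib import Path


def read_rows(path, missing=("NA", "")):
    """Read a CSV into a list of dicts, mapping missing-value markers to None."""
    with open(path, newline="", encoding="utf-8") as handle:
        return [
            # Values stay as strings; only the missing-value markers are converted.
            {key: (None if value in missing else value) for key, value in row.items()}
            for row in csv.DictReader(handle)
        ]


if __name__ == "__main__":
    # Hypothetical path: replace with a file that actually exists in the folder.
    example = Path("nycflights13") / "flights.csv"
    if example.exists():
        rows = read_rows(example)
        print(f"Read {len(rows)} rows from {example}")
    else:
        print(f"{example} not found; point this at one of the CSVs in nycflights13")
```

All values come back as strings, so numeric columns still need converting; that is part of the cleanup practice this folder is meant for.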