DistrictDataLabs/cultivar

Gather/create datasets for auto analysis validation

bbengfort opened this issue · 1 comment

We need a variety of datasets to show where our auto analysis technique works, where it fails, and how completely it handles each case.

Consider data sets with:

  • a variety of delimiters and escape characters
  • headers and no headers
  • rows of varying lengths
  • columns of many different data types
  • datasets with errors (multiple datatypes per column)
  • datasets with null values of a variety of types
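One way to cover the cases above without hunting for real-world files is to generate small synthetic datasets on demand. The following is only a rough sketch using Python's standard `csv` module; the function name `make_dataset`, its parameters, the column names, and the `NULL_TOKENS` list are all hypothetical and not part of the project.

```python
import csv
import io
import random

# Hypothetical null markers; the issue does not say which ones to cover.
NULL_TOKENS = ["", "NULL", "NA", "N/A", "None", "-"]


def make_dataset(delimiter=",", quotechar='"', header=True, ragged=False,
                 mixed_types=False, null_rate=0.0, rows=10, seed=0):
    """Return delimited text exhibiting the requested irregularities."""
    rng = random.Random(seed)  # seeded so every dataset is reproducible
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, quotechar=quotechar,
                        quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    if header:
        writer.writerow(["id", "name", "score", "joined"])
    for i in range(rows):
        row = [
            i,
            f"name{delimiter}{i}",  # embeds the delimiter to force quoting
            round(rng.uniform(0, 100), 2),
            f"2015-01-{(i % 28) + 1:02d}",
        ]
        if mixed_types and i % 3 == 0:
            row[2] = "unknown"  # a string inside an otherwise numeric column
        # Replace values with a randomly chosen null marker at the given rate.
        row = [rng.choice(NULL_TOKENS) if rng.random() < null_rate else value
               for value in row]
        if ragged and i % 4 == 1:
            row = row[:rng.randint(1, 3)]  # truncate to create a short row
        writer.writerow(row)
    return buf.getvalue()
```

For example, `make_dataset(delimiter="|", header=False, ragged=True, null_rate=0.2)` would produce a pipe-delimited file with no header, some short rows, and scattered null markers. Each flag maps to one bullet in the list, so the combinations can be enumerated to build a test matrix.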

I'm noticing that I can't even get Trinket to upload a lot of the best datasets (i.e. the ones in the worst shape). See issue #45.