tsdataclinic/smooshr

Define a schema for the different components of the smooshr data model


As we move to a different storage system and a different way of representing operations on a dataset, we will need a more robust schema. Currently, the very simple schema we have is:

  • Project: Contains multiple datasets
  • Dataset: Represents the full dataset as a set of summary data plus multiple Columns and MetaColumns
  • Column: Represents a column in the original dataset, has a name and a list of unique entries
  • MetaColumn: A simple way of treating two columns as one; this ultimately gets merged into a single column when we run the code output
  • Entry: A unique entry in a column which has a value and the number of times it occurs in that column
  • Mapping: A collection of entries for a specific column that will be mapped to another value
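
As a starting point for discussion, the current schema could be written down roughly as the TypeScript types below. This is only a sketch: every field name (`columnNames`, `entryValues`, `mappedTo`, `summary`, and so on) is an assumption made for illustration, as is the choice to store mappings on the dataset. The helper `mergedEntries` is likewise hypothetical and shows one possible reading of "merging" a MetaColumn, namely summing entry counts across its member columns.

```typescript
// Sketch of the current schema. All field names are hypothetical;
// the issue only names the six components and describes them in prose.

interface Entry {
  value: string; // the unique value as it appears in the column
  count: number; // number of times it occurs in that column
}

interface Column {
  name: string;
  entries: Entry[]; // the unique entries of this column
}

interface MetaColumn {
  name: string;
  columnNames: string[]; // names of the columns treated as one
}

interface Mapping {
  columnName: string;    // the column this mapping applies to
  entryValues: string[]; // the entry values being mapped
  mappedTo: string;      // the value they are mapped to
}

interface Dataset {
  name: string;
  summary: Record<string, number>; // assumed shape of the "summary data"
  columns: Column[];
  metaColumns: MetaColumn[];
  mappings: Mapping[]; // assumption: mappings are stored per dataset
}

interface Project {
  name: string;
  datasets: Dataset[];
}

// Hypothetical helper: combine the entries of every column in a MetaColumn,
// summing the counts of values that appear in more than one column.
function mergedEntries(meta: MetaColumn, dataset: Dataset): Entry[] {
  const counts = new Map<string, number>();
  for (const column of dataset.columns) {
    if (!meta.columnNames.includes(column.name)) continue;
    for (const entry of column.entries) {
      counts.set(entry.value, (counts.get(entry.value) ?? 0) + entry.count);
    }
  }
  const result: Entry[] = [];
  counts.forEach((count, value) => result.push({ value, count }));
  return result;
}

// Made-up example data, purely for illustration.
const exampleDataset: Dataset = {
  name: "example",
  summary: { rowCount: 11 },
  columns: [
    { name: "boro", entries: [{ value: "Bronx", count: 3 }, { value: "BX", count: 1 }] },
    { name: "borough", entries: [{ value: "Bronx", count: 2 }, { value: "Queens", count: 5 }] },
  ],
  metaColumns: [{ name: "borough_merged", columnNames: ["boro", "borough"] }],
  mappings: [{ columnName: "boro", entryValues: ["BX"], mappedTo: "Bronx" }],
};

const merged = mergedEntries(exampleDataset.metaColumns[0], exampleDataset);
```

Writing the types out this way makes a few open questions visible, for example whether a Mapping should reference entries by value or by some stable id, and whether it belongs to a Column or to the Dataset.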

We probably want to rethink this schema to make it robust enough to support the other tasks we want to run in smooshr.