misoproject/dataset

support multi-dimensional csv

Closed this issue · 2 comments

gka commented

Yet another killer feature for the nice-to-have milestone. Consider the following multi-dimensional csv-structure:

          A      A      A      A      B      B      B      B
country   2008   2009   2010   2011   2008   2009   2010   2011
AFG       10.2    9.6    9.5    7.3    9.8    8.4   10.3    9.4
...

I fully agree that this is a very special case and also somewhat of a misuse of the CSV format. But this is what many journalists have to deal with, since it's a common structure for data coming out of official data sources.

What you most probably want to get out would be something like this:

[{
   "country": "AFG",
   "A": {
         "2008": 10.2,
         "2009": 9.6,
         "2010": 9.5,
         "2011": 7.3
    },
   "B": {
         "2008": 9.8,
         "2009": 8.4,
         "2010": 10.3,
         "2011": 9.4
    }
}, { 
   ...
}]
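To make the first shape concrete, here is a rough sketch in plain JavaScript. It is not Miso.Dataset API: `nestByGroup` and the `rows` layout are made up for illustration, and it assumes the CSV has already been split into arrays of strings, with the first row carrying the group labels and the second row the years.

```javascript
// Hypothetical helper, not part of Miso.Dataset.
// Nests a table with two header rows as group -> year -> value.
function nestByGroup(rows) {
  // Assumption: row 0 holds group labels, row 1 holds the id column name and years.
  const [groups, header, ...data] = rows;
  return data.map((row) => {
    // The first column ("country") identifies the record.
    const record = { [header[0]]: row[0] };
    for (let i = 1; i < row.length; i++) {
      const group = groups[i];
      const year = header[i];
      record[group] = record[group] || {};
      record[group][year] = Number(row[i]);
    }
    return record;
  });
}

// Sample data copied from the table above.
const rows = [
  ["", "A", "A", "A", "A", "B", "B", "B", "B"],
  ["country", "2008", "2009", "2010", "2011", "2008", "2009", "2010", "2011"],
  ["AFG", "10.2", "9.6", "9.5", "7.3", "9.8", "8.4", "10.3", "9.4"],
];

console.log(JSON.stringify(nestByGroup(rows), null, 2));
```

A real importer would also have to cope with merged or blank group cells, which this sketch ignores.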

or maybe pivoted the other way round, with the years on the outside:

[{
   "country": "AFG",
   "2008": { "A": 10.2, "B": 9.8 },
   "2009": { "A": 9.6, "B": 8.4 },
   "2010": { "A": 9.5, "B": 10.3 },
   "2011": { "A": 7.3, "B": 9.4 }
}, { 
   ...
}]
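The two shapes hold the same values, so one can be derived from the other. A minimal sketch in plain JavaScript, again not Miso.Dataset API: `pivotByYear` and its `idKey` parameter are hypothetical names, and the input is assumed to be in the group -> year -> value shape.

```javascript
// Hypothetical helper, not part of Miso.Dataset.
// Turns group -> year -> value into year -> group -> value.
function pivotByYear(records, idKey) {
  return records.map((record) => {
    const pivoted = { [idKey]: record[idKey] };
    for (const [group, byYear] of Object.entries(record)) {
      if (group === idKey) continue; // skip the identifying column
      for (const [year, value] of Object.entries(byYear)) {
        pivoted[year] = pivoted[year] || {};
        pivoted[year][group] = value;
      }
    }
    return pivoted;
  });
}

// Sample record in the group-first shape.
const nested = [{
  country: "AFG",
  A: { "2008": 10.2, "2009": 9.6, "2010": 9.5, "2011": 7.3 },
  B: { "2008": 9.8, "2009": 8.4, "2010": 10.3, "2011": 9.4 },
}];

const byYear = pivotByYear(nested, "country");
console.log(byYear[0]["2009"]); // { A: 9.6, B: 8.4 }
```

One caveat: JavaScript enumerates integer-like keys such as "2008" before other keys, so when the pivoted record is serialised, the years come out ahead of "country".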

What do you think, should this strange use case ever be handled by Miso.Dataset?

Two-dimensional data is boring ;-)

Hah! We've talked about this at some length; issue 20 is, I think, a closely related form of the same question. Essentially, implemented correctly, it provides the ability to handle the denormalised form of most relational database structures. Definitely on the Todo!

I think this is probably out of scope for now. Essentially, handling it properly requires an entirely separate API. It's doable, and could leverage the same infrastructure and ds core, but I'm putting it aside for now.