pik-primap/primap2

pyCPA port: tools.conversion.map_data


Port the functionality of pyCPA's tools.conversion.map_data

I'm struggling to understand the use cases for this function. The only usage I found is https://github.com/JGuetschow/pyCPA/blob/master/examples/load_CRF.py#L84 with this specification file: https://github.com/JGuetschow/pyCPA/blob/master/examples/metadata/category_conversion/conversion_IPCC2006-1996_pyCPA_example.csv . It contains a list of identical mappings and then one example which sums up a couple of 1996 categories into a single 2006 category (or the other way around, I'm not sure). The use case I extracted from that would be:

User story:
I want to convert data between classifications. For this, I want to state corresponding categories easily. Later, I want to convert another data set between the same classifications and reuse the conversion I defined earlier.

Conditions:
The mapping between classifications can be done by summing or taking the difference between categories.

This is certainly an important use case, and I have been thinking about it already in the context of climate_categories. If the mapping between classifications is simply a sum of or difference between categories, it should be rather straightforward to implement. The information could be included in climate_categories, for example with a function which produces conversion matrices, and a corresponding function in primap2 would then transform data from one classification into the other while handling the metadata. However, in my (limited) experience, most classifications don't map so nicely, and some amount of e.g. splitting needs to be done (think for example of the splitting of countries, or of finer sub-categories in a newer classification scheme). So it might be worth thinking about the most common use cases and how they could be solved.
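To make the sum/difference case concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the category codes `S1`..`S4` and `T1`..`T3` are made up (not a real IPCC mapping), and `conversion_matrix` and `convert` are hypothetical helpers, not existing functions in climate_categories or primap2.

```python
# Hypothetical conversion rules: each target category is a signed combination
# of source categories (+1 = add, -1 = subtract). Codes are illustrative only.
RULES = {
    "T1": {"S1": 1},            # identical mapping
    "T2": {"S2": 1, "S3": 1},   # sum of two source categories
    "T3": {"S4": 1, "S3": -1},  # difference between two source categories
}


def conversion_matrix(rules, source_cats, target_cats):
    """Return a (targets x sources) matrix of signed weights as nested lists."""
    return [
        [rules.get(target, {}).get(source, 0) for source in source_cats]
        for target in target_cats
    ]


def convert(matrix, source_data, source_cats, target_cats):
    """Apply the matrix to time series given as {category: [values per year]}."""
    n_years = len(next(iter(source_data.values())))
    return {
        target: [
            sum(w * source_data[src][i] for w, src in zip(row, source_cats))
            for i in range(n_years)
        ]
        for target, row in zip(target_cats, matrix)
    }


source_cats = ["S1", "S2", "S3", "S4"]
target_cats = ["T1", "T2", "T3"]
matrix = conversion_matrix(RULES, source_cats, target_cats)

# Two years of made-up data per source category.
source_data = {
    "S1": [10.0, 11.0],
    "S2": [4.0, 5.0],
    "S3": [1.0, 2.0],
    "S4": [6.0, 8.0],
}
target_data = convert(matrix, source_data, source_cats, target_cats)
```

Because the rules are kept separate from the data, the same `RULES` could be applied to a second data set, which is the reuse part of the user story. Splitting a category would need fractional weights or extra data and is not covered by this sketch.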

Or maybe I missed the use cases of map_data completely. The function has a lot of potential parameters - do we have other usage examples which would show other use cases?

map_data can be used both to convert between classifications (but you're correct, usually some downscaling is needed) and to aggregate data within a categorization. Aggregation for full and clean hierarchies can easily be implemented using climate_categories. In reality, data sources often deviate from full and clean categorizations and need more complex mappings. The mapping can also involve several metadata columns at once, e.g. multiple secondary categories, or entity-dependent target categories. That is the reason for the complicated parameters.
An example of a relatively easy mapping is IPCC1996 to IPCC2006 categories. More complicated is FAO to IPCC2006.
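As a rough illustration of entity-dependent target categories, the sketch below maps on two metadata columns at once. It is not how map_data is implemented; the rule format, the codes `S1`..`S3` and `T1`..`T3`, and the helpers `target_category` and `map_records` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rules keyed on (source category, entity).
# An entity of None acts as a fallback that applies to any entity.
RULES = {
    ("S1", None): "T1",
    ("S2", "CO2"): "T2",
    ("S2", "CH4"): "T3",
}


def target_category(category, entity, rules):
    """Prefer an entity-specific rule, then fall back to the wildcard rule."""
    if (category, entity) in rules:
        return rules[(category, entity)]
    return rules.get((category, None))


def map_records(records, rules):
    """Sum values per (target category, entity); collect unmapped records."""
    totals = defaultdict(float)
    unmapped = []
    for rec in records:
        target = target_category(rec["category"], rec["entity"], rules)
        if target is None:
            unmapped.append(rec)
            continue
        totals[(target, rec["entity"])] += rec["value"]
    return dict(totals), unmapped


# Made-up input records with two metadata columns and one value each.
records = [
    {"category": "S1", "entity": "CO2", "value": 10.0},
    {"category": "S2", "entity": "CO2", "value": 4.0},
    {"category": "S2", "entity": "CH4", "value": 1.5},
    {"category": "S3", "entity": "CO2", "value": 2.0},
]
totals, unmapped = map_records(records, RULES)
```

Returning the unmapped records explicitly, rather than dropping them silently, makes it easier to spot where a data source deviates from a clean categorization.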