[EURITO] Factor out dataset-specific processing from NiH
jaklinger opened this issue · 0 comments
jaklinger commented
In line with the newer "dataset <--> project" separation paradigm for data processing, the NiH dataset-specific collection/processing should be factored out of the Health Mosaic pipeline. The Health Mosaic-specific pipelines can be considered again later, if required.
This will make the data immediately usable for addressing EURITO/Pivot#32.
Tasks:
- Collect NiH with refactored pipeline, whilst fixing #51
- Deduplicate NiH data without Elasticsearch
- Select deduplicated records, aggregate relevant fields and add enrichment fields (such as countries), all into one unified table
- Set up data_getter
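
A rough sketch of what the deduplicate/aggregate tasks above could look like without Elasticsearch, for discussion only. It groups records by a hash of their normalised title and abstract, keeps the most recent record of each group, and aggregates the rest into one row. Every field name here (`id`, `title`, `abstract`, `year`, `cost`, `country`) and both function names are hypothetical placeholders rather than the real NiH schema or existing pipeline code, and exact-match hashing is a simpler stand-in for whatever similarity method ends up being used.

```python
import hashlib
import re
from collections import defaultdict


def text_fingerprint(text):
    """Hash the text after lower-casing and collapsing punctuation/whitespace."""
    normalised = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def deduplicate(projects):
    """Collapse near-identical projects into one unified row per group.

    `projects` is a list of dicts with the hypothetical keys
    id, title, abstract, year, cost and country.
    """
    # Group records whose normalised title + abstract are identical.
    groups = defaultdict(list)
    for project in projects:
        key = text_fingerprint(project["title"] + " " + project["abstract"])
        groups[key].append(project)

    unified = []
    for rows in groups.values():
        # Assumption: the most recent record is the one to keep.
        keep = max(rows, key=lambda r: r["year"])
        unified.append({
            "id": keep["id"],
            "title": keep["title"],
            # Ids of the records folded into this row.
            "duplicate_ids": sorted(r["id"] for r in rows if r["id"] != keep["id"]),
            # Aggregated field: summed over every record in the group.
            "total_cost": sum(r["cost"] for r in rows),
            # Enrichment field: distinct countries seen in the group.
            "countries": sorted({r["country"] for r in rows}),
        })
    return unified
```

The output is a plain list of dicts, so it could be written to a single table by whatever storage layer the refactored pipeline uses.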