nestauk/old_nesta_daps

[EURITO] Factor out dataset-specific processing from NiH

jaklinger opened this issue · 0 comments

In line with the newer "dataset <--> project" separation paradigm for data processing, the NiH-specific collection and processing should be factored out of the Health Mosaic pipeline. The Health Mosaic-specific pipelines can be reconsidered later, if required.

This will make the data immediately usable for addressing EURITO/Pivot#32.

Tasks:

  • Collect NiH with refactored pipeline, whilst fixing #51
  • Deduplicate NiH data without elasticsearch
  • Select dedupes, aggregate relevant fields and add enrichment fields (such as countries), all into one unified table
  • Set up data_getter
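
As a rough illustration of the deduplication and aggregation tasks above, here is a minimal sketch that needs no elasticsearch. It assumes exact matching on normalised text is good enough, which may not hold for the real NiH data (fuzzy matching may be needed). The field names (`id`, `title`, `abstract`, `countries`) and the functions `normalise`, `dedupe_key` and `unify` are hypothetical and not taken from the existing pipeline.

```python
import hashlib
import re
from collections import defaultdict


def normalise(text):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def dedupe_key(record):
    """Hash of the normalised title and abstract (hypothetical field names)."""
    joined = normalise(record["title"]) + "|" + normalise(record["abstract"])
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def unify(records):
    """Group exact duplicates, then emit one aggregated row per group."""
    groups = defaultdict(list)
    for record in records:
        groups[dedupe_key(record)].append(record)
    table = []
    for group in groups.values():
        first = group[0]  # keep the first-seen record as the representative
        table.append({
            "title": first["title"],
            "abstract": first["abstract"],
            # aggregate: remember every source row merged into this one
            "source_ids": sorted(r["id"] for r in group),
            # enrichment: union of countries across the duplicates
            "countries": sorted({c for r in group for c in r["countries"]}),
        })
    return table


# Made-up example records, for illustration only
records = [
    {"id": 1, "title": "Gene Therapy Study", "abstract": "A study.", "countries": ["US"]},
    {"id": 2, "title": "gene therapy  study!", "abstract": "a study", "countries": ["GB"]},
    {"id": 3, "title": "Another project", "abstract": "Different.", "countries": ["US"]},
]
unified = unify(records)
```

Here records 1 and 2 collapse into a single row because their normalised title and abstract are identical, while record 3 stays separate. Hashing keeps the grouping in a single pass over the data, at the cost of missing near-duplicates that differ by more than case, punctuation or spacing.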