[DAPS, metaissue] Create integrated pipeline for data collections to ES

Question

jaklinger opened this issue 4 years ago · 1 comments

Answer 1 · 2020-09-02T10:26:12.000Z

Items which in the end will not be addressed in this issue:

NiH data has not yet been added to the general pipeline, because the deduplication strategy currently relies on creating two indexes in Elasticsearch: one holding all documents, including dupes, to which the deduplication logic is applied, and one holding the deduped documents. This doesn't fit within the general pipeline paradigm and would require significant rewiring (of course, it is doable, but it starts to fall out of scope for now). Created issue #317.
eurito-dev is deprecated in favour of general, and so will not be upgraded to ES7.
health-scanner is deprecated in favour of health-mosaic, and will be dealt with when I'm given the green light to work on that.
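
For anyone unfamiliar with the two-index NiH deduplication mentioned above, here is a rough sketch of the idea, using plain dicts in place of Elasticsearch indexes so that it runs anywhere. The function name `dedupe_into_second_index`, the field names, and the exact-match rule on normalised fields are all assumptions made for illustration; the real deduplication logic is not described in this issue and is likely more involved.

```python
from collections import defaultdict


def dedupe_into_second_index(all_docs_index, key_fields):
    """Build a deduped 'index' from one that still contains dupes.

    Both indexes are plain dicts of {doc_id: document}; they only stand in
    for the two Elasticsearch indexes. Hypothetical helper, not DAPS code.
    """
    # Group document ids by a normalised key (assumed duplicate rule).
    groups = defaultdict(list)
    for doc_id, doc in all_docs_index.items():
        key = tuple(str(doc.get(f, "")).strip().lower() for f in key_fields)
        groups[key].append(doc_id)

    # Keep one representative per group (here: the smallest id).
    deduped_index = {}
    for doc_ids in groups.values():
        keep_id = min(doc_ids)
        deduped_index[keep_id] = all_docs_index[keep_id]
    return deduped_index


# First "index": every document, dupes included (made-up sample data).
all_docs = {
    "a1": {"title": "Gene therapy trial", "org": "Example Univ"},
    "a2": {"title": "gene therapy trial ", "org": "Example Univ"},
    "b1": {"title": "Sleep study", "org": "Example Univ"},
}

# Second "index": only the deduped documents.
deduped = dedupe_into_second_index(all_docs, key_fields=("title", "org"))
```

The point of the sketch is that the second index can only be built once the first one is complete, which is the extra step that doesn't map onto a one-collection-to-one-index pipeline.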