nestauk/old_nesta_daps

[DAPS, metaissue] Create integrated pipeline for data collections to ES

jaklinger opened this issue · 1 comment

  • #265 Add versioning for config management
  • #266 Migrate to base ES config, to remove repetition from config
  • #267 Migrate to template ES mappings, to remove repetition from mappings
    - [ ] #268 Create Sql2ES subclass to pick up pipeline dependencies
  • #269 Add new general endpoint (ES7)
  • #270 (general) Create GtR mapping and validate mapping for ES7
  • #271 (general) Request validation on GtR data, then run in production mode
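A minimal sketch of the idea behind #266 and #267, assuming the repetition is removed by layering a small per-dataset override on top of one shared base. The names `deep_merge`, `BASE_MAPPING` and `GTR_OVERRIDE`, and all field names, are hypothetical and are not taken from the repository:

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict with `override` layered on top of `base`.

    Nested dicts are merged recursively; any other value in `override`
    replaces the value in `base`. Neither input is modified.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical shared template: settings and fields common to every dataset.
BASE_MAPPING = {
    "settings": {"number_of_shards": 1},
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "id": {"type": "keyword"},
            "title": {"type": "text"},
        },
    },
}

# Hypothetical dataset-specific part: only what differs from the base.
GTR_OVERRIDE = {
    "mappings": {"properties": {"funding_amount": {"type": "float"}}},
}

gtr_mapping = deep_merge(BASE_MAPPING, GTR_OVERRIDE)
```

With this layout each dataset file only lists the fields it adds or changes, and a change to the shared base reaches every dataset the next time its mapping is built.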

Prerequisite for the following:

  • #240 Migrate arxlive
  • #241 Migrate eurito
  • #199 fix CB collection pipeline, due to new API version

For each of (gtr, arxiv, companies, nih, patstat #313, cordis #315):

  • (general) Create new mapping for ES7 + validate mapping
  • (general) Request validation on data, then run in production mode, add collect
  • Add all general pipelines to weekly schedule
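One part of "create new mapping for ES7" is that ES7 index mappings are typeless by default, whereas ES6 mappings are nested under a single mapping type. A rough sketch of that one step, assuming the existing ES6 mappings each contain exactly one type; the helper name `to_typeless` and the example fields are hypothetical:

```python
def to_typeless(es6_mappings):
    """Strip the single mapping-type level from an ES6-style `mappings` body.

    ES6: {"_doc": {"properties": {...}}}
    ES7: {"properties": {...}}
    """
    if "properties" in es6_mappings:
        # Already looks typeless, so return it unchanged.
        return es6_mappings
    if len(es6_mappings) != 1:
        raise ValueError("expected exactly one mapping type")
    (type_body,) = es6_mappings.values()
    return type_body


# Hypothetical ES6-style mapping with one type called "_doc".
es6 = {"_doc": {"dynamic": "strict", "properties": {"id": {"type": "keyword"}}}}
es7 = to_typeless(es6)
```

This only covers the structural change; validating the mapping against real data remains a separate step, as listed above.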

Regarding endpoint migration from ES6 to ES7:
- [ ] Run AWS's migration from ES6.x to ES7.x on eurito-dev, and validate
- [ ] Run AWS's migration from ES6.x to ES7.x on health-scanner, and validate

  • Run AWS's migration from ES6.x to ES7.x on arxlive, and validate

Finally:

  • Rearrange pipelines into datasets and projects
  • Provide training for Luca & Seb

Items which in the end will not be addressed in this issue:

  • NiH data is not added to the general pipeline yet, since the deduplication strategy currently relies on creating two indexes in Elasticsearch: one holding all documents, including duplicates, to which the deduplication logic is applied, and one holding only the deduplicated documents. This doesn't fit within the general pipeline paradigm and would require significant rewiring (doable, but out of scope for now). Created issue #317
  • eurito-dev is deprecated in favour of general, and so will not be upgraded to ES7
  • health-scanner is deprecated in favour of health-mosaic, and will be dealt with when I'm given the green light to work on that.
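To illustrate why the NiH two-index strategy sits awkwardly in a one-index-per-dataset pipeline, here is a toy sketch of the two stages, using plain dicts in place of the two Elasticsearch indexes. The dedup key (a normalised title) and all names are assumptions made for illustration; the actual NiH deduplication logic is not described in this issue:

```python
def dedupe(all_docs, key_fn):
    """Build the second 'index' from the first.

    Keeps the first document seen for each dedup key and drops the rest.
    """
    deduped = {}
    seen = set()
    for doc_id, doc in all_docs.items():
        key = key_fn(doc)
        if key in seen:
            continue
        seen.add(key)
        deduped[doc_id] = doc
    return deduped


def normalised_title(doc):
    """Hypothetical dedup key: lower-cased title with whitespace collapsed."""
    return " ".join(doc["title"].lower().split())


# Stage 1: an index holding every document, duplicates included.
all_docs_index = {
    "a1": {"title": "Cancer study"},
    "a2": {"title": "cancer  study "},
    "b1": {"title": "Heart study"},
}

# Stage 2: a second index holding only the deduplicated documents.
deduped_index = dedupe(all_docs_index, normalised_title)
```

The second stage can only run once the first index is fully populated, which is the dependency that the general pipeline would need rewiring to express.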