nestauk/old_nesta_daps

[DAPS] Migrate to base ES config, to remove repetition from config

jaklinger opened this issue · 0 comments

  • compactify es config format
  • add get_es_config to orm_utils, to replace get_config for ES
  • change setup_es and increment_version logic
  • find and replace setup_es syntax accordingly
  • automatically retrieve alias, which is tied to the endpoint name
  • Add endpoint field to all ElasticsearchTasks
  • Add endpoint field to all Sql2EsTasks
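
As a rough illustration of the first few items, a compact config could keep shared values in one base section and let each endpoint list only what differs. The sketch below is hypothetical: the section names, keys, hosts, index naming scheme and the `get_es_config` signature are assumptions made for the example, not the actual `orm_utils` implementation. It also assumes the alias is simply the endpoint name.

```python
import configparser

# Hypothetical compact config: shared values live in [DEFAULT],
# and each endpoint section lists only what differs from the base.
EXAMPLE_CONFIG = """
[DEFAULT]
port = 443
version = 0

[arxlive]
host = arxlive.example.com

[health-scanner]
host = health-scanner.example.com
version = 3
"""


def get_es_config(endpoint, dataset, production=False, text=EXAMPLE_CONFIG):
    """Return one flat dict of ES settings for an endpoint/dataset pair."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(endpoint):
        raise ValueError(f"Unknown endpoint: {endpoint}")
    section = parser[endpoint]  # falls back to [DEFAULT] for missing keys
    env = "prod" if production else "dev"
    return {
        "host": section["host"],
        "port": section.getint("port"),
        # Assumed index naming scheme: <dataset>_v<version>_<env>
        "index": f"{dataset}_v{section.getint('version')}_{env}",
        # Assumption: the alias is derived from the endpoint name,
        # so it never needs to be configured separately.
        "alias": endpoint,
    }


config = get_es_config("health-scanner", "nih", production=True)
print(config["index"], config["alias"])
```

With a helper of this shape, a task would only need to carry an `endpoint` field (plus its dataset) to look up everything else.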

The following close #267 but are tracked here:

  • re-align mapping naming syntax with new config:
.
├── datasets              # in the future, these will be for overriding the open ontology (daps2)
│   ├── arxiv_mapping.json
│   ├── companies_mapping.json
│   ├── cordis_mapping.json
│   ├── gtr_mapping.json
│   ├── meetup_mapping.json
│   ├── nih_mapping.json
│   └── patstat_mapping.json
├── defaults              # e.g. for new analyzers
│   ├── index.json
│   └── settings.json
└── endpoints             # project specific stuff
    ├── arxlive
    │   └── arxiv_mapping.json
    ├── eurito
    │   ├── arxiv_mapping.json
    │   ├── companies_mapping.json
    │   └── patstat_mapping.json
    └── health-scanner
        ├── aliases.json     # formerly under "aliases/health-scanner.json"
        ├── config.yaml     # currently just to flag that the aliases should be hard
        └── nulls.json    # formerly under "field_null_mappings/health_scanner.json"
  • verify that all "new" mappings are the same as the "old" ones (show the diff somewhere)
  • add tests:
      • no field may be defined identically by all endpoints (such fields should go under "datasets")
      • all fields must match the ontology, as before
      • all aliases must match the mappings, as before
      • all nulls must match the mappings, as before
  • remove old mappings
  • add documentation for the logic here (also note that in the future, the configuration will drop the version, as this field will be generated automatically from semver+hash)
  • rewire orm_utils and relevant batchables for aliases and null mappings
  • re-run all dev pipelines
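
One possible reading of the layout above is that an endpoint mapping is layered on top of the dataset mapping, and that a field repeated identically by every endpoint is a sign it belongs in "datasets" instead. The sketch below works on in-memory dicts only; `deep_merge`, `fields_shared_by_all` and the field names are hypothetical and not taken from the repository.

```python
def deep_merge(base, override):
    """Recursively merge override into a copy of base; override wins."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def fields_shared_by_all(endpoint_properties):
    """Return the field names that every endpoint defines identically."""
    per_endpoint = list(endpoint_properties.values())
    if len(per_endpoint) < 2:
        return set()  # nothing to compare against
    first, rest = per_endpoint[0], per_endpoint[1:]
    return {
        name
        for name, definition in first.items()
        if all(name in other and other[name] == definition for other in rest)
    }


# Made-up mappings for one dataset and two endpoints
dataset = {"properties": {"title": {"type": "text"}, "date": {"type": "date"}}}
arxlive = {"properties": {"title": {"type": "keyword"}, "novelty": {"type": "float"}}}
eurito = {"properties": {"novelty": {"type": "float"}}}

# Endpoint-specific fields override or extend the dataset-level mapping
merged = deep_merge(dataset, arxlive)

# "novelty" is identical in both endpoints, so a test could flag it
shared = fields_shared_by_all(
    {"arxlive": arxlive["properties"], "eurito": eurito["properties"]}
)
print(sorted(merged["properties"]), shared)
```

A test along these lines would fail whenever `shared` is non-empty, prompting the field to be moved up to the dataset-level mapping.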