OHDSI/Arachne

Metadata format proposal

konstjar opened this issue · 5 comments

The idea is to include additional metadata, both in the analysis archive and in the Strategus JSON file, that will be used to prepopulate the Submission form fields in the ARACHNE datanode:

Dedicated file in .zip archive

File name: analysisMetadata.json

Content:

{
    "analysisName": "Simvastatin",
    "analysisType": "COHORT",
    "runtimeEnvironmentName": "Default Runtime",
    "dockerRuntimeEnvironmentImage": "ohdsi/r-hades:latest",
    "entryPoint": "TestCDMConnector/main.R",
    "studyName": "My study"
}
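
To illustrate how a datanode might consume the dedicated file, here is a minimal Python sketch. It assumes the file sits at the root of the archive under the proposed name analysisMetadata.json. The function names and the decision to silently ignore unknown keys are hypothetical and not part of the proposal.

```python
import io
import json
import zipfile

# Field names come from the proposal above; everything else is hypothetical.
KNOWN_FIELDS = (
    "analysisName",
    "analysisType",
    "runtimeEnvironmentName",
    "dockerRuntimeEnvironmentImage",
    "entryPoint",
    "studyName",
)


def read_analysis_metadata(archive_bytes, file_name="analysisMetadata.json"):
    """Return the known metadata fields, or {} if the file is not in the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        if file_name not in archive.namelist():
            return {}
        raw = json.loads(archive.read(file_name).decode("utf-8"))
    # Keep only fields the Submission form knows about (an assumption).
    return {key: raw[key] for key in KNOWN_FIELDS if key in raw}


def build_example_archive():
    """Build a small in-memory zip so the sketch runs without files on disk."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("TestCDMConnector/main.R", "# analysis code\n")
        archive.writestr("analysisMetadata.json", json.dumps({
            "analysisName": "Simvastatin",
            "analysisType": "COHORT",
            "entryPoint": "TestCDMConnector/main.R",
        }))
    return buffer.getvalue()


form_defaults = read_analysis_metadata(build_example_archive())
print(form_defaults)
```

Fields missing from the file are simply left out of the result, so the form would fall back to manual input for them.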

Strategus format extension

Content:

{
  "metadata": {
      "analysisName": "Simvastatin",
      "analysisType": "COHORT",
      "runtimeEnvironmentName": "Default Runtime",
      "dockerRuntimeEnvironmentImage": "ohdsi/r-hades:latest",
      "studyName": "My study"
  }
}
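
For the Strategus variant, the same fields would be read from a top-level "metadata" key. A rough Python sketch, assuming the key is optional and must hold a JSON object; the function name and the "otherStrategusContent" placeholder key are made up for illustration:

```python
import json


def extract_strategus_metadata(spec_text):
    """Return the top-level "metadata" object of a Strategus JSON document.

    Returns {} when the key is absent (treating it as optional is an assumption).
    """
    spec = json.loads(spec_text)
    metadata = spec.get("metadata", {})
    if not isinstance(metadata, dict):
        raise ValueError('"metadata" must be a JSON object')
    return metadata


# "otherStrategusContent" is a placeholder, not a real Strategus key.
example_spec = json.dumps({
    "otherStrategusContent": [],
    "metadata": {
        "analysisName": "Simvastatin",
        "analysisType": "COHORT",
        "studyName": "My study",
    },
})

strategus_metadata = extract_strategus_metadata(example_spec)
print(strategus_metadata["analysisName"])
```

Note that the Strategus example above carries no entryPoint field, so the sketch does not expect one.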

For Darwin we absolutely need the option to provide this metadata along with the study. We do not want the data partner to change the execution environment in the UI. We essentially want the data partner to upload the study zip file, select the CDM database, and click run. They should not need to provide the entry point file (because how would they know it?) or the execution environment. We want to provide as many details about study execution in the metadata as possible, so the data partner just runs the study and that's it.

I think this should work just fine for Darwin. All Darwin studies (at least right now) will be of the custom type.

{
    "analysisName": "Simvastatin",
    "analysisType": "CUSTOM",
    "runtimeEnvironmentName": "Default Runtime",
    "dockerRuntimeEnvironmentImage": "ohdsi/r-hades:latest",
    "entryPoint": "TestCDMConnector/main.R",
    "studyName": "My study"
}

One question: is there a dependency between the two parameters runtimeEnvironmentName and dockerRuntimeEnvironmentImage?

How do these parameters interact with each other?

Also, for Darwin we have both a study name and a study ID. Not sure if others need both of these as well.

Anyway basically it should work well I think.

In addition to the standardized fields in the metadata, I would like to request the flexibility to add custom fields that will be injected into the runtime environment (Docker image) as environment variables.

Here is an example.

execution-config.json

{
    "analysisName": "Simvastatin",
    "analysisType": "COHORT",
    "runtimeEnvironmentName": "Default Runtime",
    "dockerRuntimeEnvironmentImage": "ohdsi/r-hades:latest",
    "entryPoint": "TestCDMConnector/main.R",
    "studyName": "My study",
    "EnvironmentVariables": {
        "variable_name1": "custom_variable",
        "variable_name2": "5"
    }
}
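
To make the request concrete, the custom fields could be turned into `-e NAME=value` arguments for `docker run`. The Python sketch below only builds the argument list and does not start a container. How ARACHNE actually launches the runtime is not described in this thread, so the function name, the `--rm` flag, and the name validation rule are all assumptions.

```python
import re

# Conservative rule for variable names (an assumption, not an ARACHNE rule).
ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def docker_run_args(config):
    """Build a `docker run` argument list from an execution-config style dict."""
    args = ["docker", "run", "--rm"]
    for name, value in config.get("EnvironmentVariables", {}).items():
        if not ENV_NAME.fullmatch(name):
            raise ValueError(f"invalid environment variable name: {name!r}")
        # Each custom field becomes one -e NAME=value pair.
        args += ["-e", f"{name}={value}"]
    args.append(config["dockerRuntimeEnvironmentImage"])
    return args


config = {
    "dockerRuntimeEnvironmentImage": "ohdsi/r-hades:latest",
    "EnvironmentVariables": {
        "variable_name1": "custom_variable",
        "variable_name2": "5",
    },
}

run_args = docker_run_args(config)
print(" ".join(run_args))
```

Inside the container, an R entry point could then read the values with `Sys.getenv("variable_name1")`.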
  • Additional environment variables: #73
  • Configuration option for metadata file names: #72

Implemented in #75

Supported file names: execution-config.json and metadata.json
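
With two supported names, a lookup has to decide which file wins when both are present. A small Python sketch of one possible rule, assuming the names are tried in the order listed above; that precedence and the function name are guesses, not something confirmed by #75:

```python
# Names from the comment above; the order of precedence is an assumption.
SUPPORTED_NAMES = ("execution-config.json", "metadata.json")


def find_metadata_file(archive_names, supported=SUPPORTED_NAMES):
    """Return the first supported metadata file name found in the archive, else None."""
    present = set(archive_names)
    for candidate in supported:
        if candidate in present:
            return candidate
    return None


print(find_metadata_file(["main.R", "metadata.json"]))
print(find_metadata_file(["metadata.json", "execution-config.json"]))
print(find_metadata_file(["main.R"]))
```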