OHDSI/CommonDataModel

CDM Results Schema

clairblacketer opened this issue · 10 comments

Creation of CDM Results schema


Proposal

Relevant tables: COHORT, COHORT_DEFINITION, COHORT_ATTRIBUTE, ATTRIBUTE_DEFINITION

The four tables listed above need to be editable by the user. As it stands now, most CDMs are in read-only schemas, which basically renders these tables useless. A formal 'results' schema that gives users write access should be created to house them. To take it a step further, the COHORT_ATTRIBUTE and ATTRIBUTE_DEFINITION tables should be removed altogether, as they have no existing use cases, nor are they currently being used by anyone in the community (to our knowledge).

In summation, we propose to:

  1. Move COHORT and COHORT_DEFINITION to a formal CDM 'results' schema
  2. Remove COHORT_ATTRIBUTE and ATTRIBUTE_DEFINITION

Consequences

A separate results DDL would need to be supplied alongside the current CDM DDL with each new release.
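To make the idea of a separate results DDL concrete, here is a minimal sketch that renders and runs one. It uses Python's built-in sqlite3 module as a stand-in database, with an attached database playing the role of the 'results' schema. The schema name `results`, the helper names `results_ddl` and `build_demo`, and the simplified column types are assumptions made for illustration; the column names follow the existing COHORT and COHORT_DEFINITION tables, and an actual release would ship dialect-specific DDL.

```python
import sqlite3

# Column names follow the CDM v5 COHORT and COHORT_DEFINITION tables.
# The types are simplified for SQLite and are an assumption of this sketch.
RESULTS_TABLES = {
    "cohort_definition": [
        ("cohort_definition_id", "INTEGER NOT NULL"),
        ("cohort_definition_name", "TEXT NOT NULL"),
        ("cohort_definition_description", "TEXT"),
        ("definition_type_concept_id", "INTEGER NOT NULL"),
        ("cohort_definition_syntax", "TEXT"),
        ("subject_concept_id", "INTEGER NOT NULL"),
        ("cohort_initiation_date", "TEXT"),
    ],
    "cohort": [
        ("cohort_definition_id", "INTEGER NOT NULL"),
        ("subject_id", "INTEGER NOT NULL"),
        ("cohort_start_date", "TEXT NOT NULL"),
        ("cohort_end_date", "TEXT NOT NULL"),
    ],
}


def results_ddl(schema="results"):
    """Render one CREATE TABLE statement per results table (hypothetical helper)."""
    statements = []
    for table, columns in RESULTS_TABLES.items():
        cols = ",\n  ".join(f"{name} {sqltype}" for name, sqltype in columns)
        statements.append(f"CREATE TABLE {schema}.{table} (\n  {cols}\n);")
    return statements


def build_demo():
    """Create an in-memory database with a separate 'results' schema attached."""
    conn = sqlite3.connect(":memory:")
    # In SQLite an attached database behaves like a named schema.
    conn.execute("ATTACH DATABASE ':memory:' AS results")
    for statement in results_ddl("results"):
        conn.execute(statement)
    return conn
```

Calling `results_ddl("ohdsi_results")` would render the same tables under a different schema name, which is the kind of per-site variation a separate DDL file has to accommodate.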

The question of which tables are read-only and which are writable is hard. With METADATA we have part of that problem as well: a user may need to add a new annotation, yet the table is in the CDM schema.

If the CDM-constructed cohort and cohort_definition tables are moved to the results schema, then there may be a potential risk of conflict with the cohort/cohort_definition tables constructed and managed by WebAPI/Atlas.

In the current design, I think, if an external (non-Atlas) application or a manual query writes to the existing results.cohort or results.cohort_definition tables, it causes Atlas to throw errors.

If we go down the route of using a shared writable cohort table, shared across multiple applications, maybe we need additional fields for 'application lineage', i.e. Atlas would only look for records in the cohort table where the application id matches the application id of Atlas?
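The 'application lineage' idea can be sketched as follows, again using sqlite3 as a stand-in. The `application_id` column, its values, and the helper `cohort_rows_for` are hypothetical; nothing like them exists in the current cohort table, and this is not how Atlas actually reads cohorts.

```python
import sqlite3


def cohort_rows_for(conn, application_id):
    """Return only the cohort rows written by one application (hypothetical helper)."""
    return conn.execute(
        "SELECT cohort_definition_id, subject_id FROM cohort "
        "WHERE application_id = ? "
        "ORDER BY cohort_definition_id, subject_id",
        (application_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE cohort (
         cohort_definition_id INTEGER NOT NULL,
         subject_id           INTEGER NOT NULL,
         cohort_start_date    TEXT NOT NULL,
         cohort_end_date      TEXT NOT NULL,
         application_id       TEXT NOT NULL  -- hypothetical lineage column
       )"""
)
conn.executemany(
    "INSERT INTO cohort VALUES (?, ?, ?, ?, ?)",
    [
        (1, 101, "2018-01-01", "2018-06-30", "atlas"),
        (1, 102, "2018-02-01", "2018-07-31", "atlas"),
        # A manual query reusing cohort_definition_id 1 for unrelated subjects.
        (1, 900, "2018-03-01", "2018-03-31", "manual"),
    ],
)

# Each application sees only its own rows, even though the ids collide.
atlas_rows = cohort_rows_for(conn, "atlas")
manual_rows = cohort_rows_for(conn, "manual")
```

The trade-off is that every reader of the shared table has to apply the filter consistently; a query that omits it would mix rows from different applications under the same cohort_definition_id.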

In general, I do not believe it is a good idea to make any table in the OMOP CDM schema (which is really a type of "data mart") writable. In typical enterprise environments, DWs and DMs cannot be updated by users and can only be updated by incoming ETL processes. That is, the data needs to flow into the CDM via ETL.
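As an illustration of this separation, the sketch below keeps a CDM database read-only while cohort rows are written to a separate results database. It uses SQLite files and the `mode=ro` URI parameter purely as a stand-in for schema-level privileges on a real DBMS; the file names, the one-column `person` table, and the helper `open_analysis_connection` are assumptions made for the example.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_analysis_connection(cdm_path, results_path):
    """Open a writable results database with the CDM attached read-only."""
    conn = sqlite3.connect(results_path, uri=True)
    # mode=ro makes SQLite reject any write to the attached CDM database.
    cdm_uri = Path(cdm_path).resolve().as_uri() + "?mode=ro"
    conn.execute("ATTACH DATABASE ? AS cdm", (cdm_uri,))
    return conn


workdir = tempfile.mkdtemp()
cdm_path = os.path.join(workdir, "cdm.db")
results_path = os.path.join(workdir, "results.db")

# Stand-in for the ETL process, the only writer to the CDM.
etl = sqlite3.connect(cdm_path)
etl.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY)")
etl.execute("INSERT INTO person VALUES (101)")
etl.commit()
etl.close()

conn = open_analysis_connection(cdm_path, results_path)
conn.execute(
    """CREATE TABLE cohort (
         cohort_definition_id INTEGER NOT NULL,
         subject_id           INTEGER NOT NULL,
         cohort_start_date    TEXT NOT NULL,
         cohort_end_date      TEXT NOT NULL
       )"""
)
# Analysis reads from the CDM and writes only to the results database.
conn.execute(
    "INSERT INTO cohort SELECT 1, person_id, '2018-01-01', '2018-12-31' FROM cdm.person"
)
conn.commit()

# A user-level write to the CDM is rejected.
try:
    conn.execute("INSERT INTO cdm.person VALUES (102)")
    cdm_write_blocked = False
except sqlite3.OperationalError:
    cdm_write_blocked = True
```

On a server DBMS the same effect would normally come from granting users read access on the CDM schema and write access on the results schema, rather than from connection flags.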

If I understand correctly, the original idea behind having the COHORT and COHORT_DEFINITION tables in the OMOP CDM was that if the source data DOES HAVE cohort definitions as part of the raw data set, those definitions can be transferred into the OMOP CDM (I have seen these in a few data sets). With that concept in mind, the tables were not meant to be updated by end users, who would instead use ATLAS to define user-specific cohorts; those definitions would live in the OHDSI schema, while the generated results would go into the RESULTS schema. I think it is not a bad idea to have those tables IF we properly describe the use cases.

As far as METADATA and ANNOTATION, that is an interesting question. If we follow the pattern and want to be consistent, METADATA should be populated by ETL and ANNOTATIONS should really move into RESULTS since it is meant to be updated by the end user. Then ATLAS can be extended to add the annotation functionality that would write into ANNOTATION table sitting in results and would leave the CDM schema untouched. This will also work if we decide to move ACHILLES data into METADATA table.

Another pattern that is emerging is that we always have to create two schemas: one for the CDM and one for RESULTS, that is, one for data and one for data analysis. And I think we should always be discussing both in our CDM WG, since RESULTS are always linked to and cannot be detached from the CDM schema.

Annotation table, @gklebanov. Not aware we have such a thing.

Do we have this in a proposal here?

Is there corresponding public documentation of the results schema, along the lines of what is done for the CDM: https://github.com/OHDSI/CommonDataModel/wiki?

We have to add the documentation about schemas. Right now, the documentation completely avoids prescribing implementation recommendations, such as schemas and privileges to them. Will do.

@mgurley the results schema will be new for CDMv6.0 so the documentation will be updated accordingly with the release.

@cgreich the Metadata and Annotation tables do not have a fully fleshed out proposal yet because their structure is still being discussed. Plans are for Ajit to present at the November workgroup meeting.

@gowthamrao I don't believe there will be conflict with the WebAPI if we move these tables to a results schema. We have been using a results schema for a while and it has worked out very well. ATLAS writes all of our cohorts to ohdsi_results.cohort and so far we haven't had any issues (@fdefalco correct me if I'm wrong)

added in v6.0