This repository contains scripts used to parse the ClinVarFullRelease.xml file from ClinVar and organize the data according to a customized data model.
This is not an official ClinVar repository. For authoritative information, please refer to the ClinVar website.
The DESCRIPTION.json file provides a formal description of the project including its version, its maintainer and the original sources of information.
The data model has been formalized using the ReDaMoR R package (available on CRAN). It is available in the model folder.
The Collections subfolder contains JSON files that formally describe tables gathering information about key concepts, such as genes or diseases, which can be used to create cross-references with other resources. Collection mechanisms are developed as part of the TKCat project.
Parsing scripts are located in the scripts folder.
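As a rough illustration of the kind of work such parsing involves, the sketch below streams a ClinVar-style XML document and writes a few fields as tab-separated rows. It is not taken from the scripts in this repository: it is written in Python for illustration only, the element and attribute names (`ClinVarSet`, `ReferenceClinVarAssertion`, `ClinVarAccession`, `Acc`, `Title`) are assumptions about the full release layout, the function names and output columns are hypothetical, and the embedded sample records are made up.

```python
import csv
import io
import xml.etree.ElementTree as ET


def iter_clinvar_sets(source):
    """Yield (set_id, rcv_accession, title) for each ClinVarSet element.

    `source` is a file path or a binary file-like object. Element and
    attribute names are assumptions about the ClinVar XML layout.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "ClinVarSet":
            continue
        accession = elem.find("ReferenceClinVarAssertion/ClinVarAccession")
        yield (
            elem.get("ID"),
            accession.get("Acc") if accession is not None else None,
            elem.findtext("Title"),
        )
        # Drop the processed subtree so large files are not held in memory.
        elem.clear()


def write_flat_file(rows, handle):
    """Write rows as a tab-separated table (hypothetical column names)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["set_id", "rcv_accession", "title"])
    writer.writerows(rows)


# Made-up sample standing in for ClinVarFullRelease.xml.
SAMPLE = b"""<ReleaseSet>
  <ClinVarSet ID="1">
    <Title>Example title A</Title>
    <ReferenceClinVarAssertion>
      <ClinVarAccession Acc="RCV000000001" Version="1"/>
    </ReferenceClinVarAssertion>
  </ClinVarSet>
  <ClinVarSet ID="2">
    <Title>Example title B</Title>
  </ClinVarSet>
</ReleaseSet>"""

rows = list(iter_clinvar_sets(io.BytesIO(SAMPLE)))
out = io.StringIO()
write_flat_file(rows, out)
print(out.getvalue())
```

Streaming with `iterparse` rather than loading the whole tree is a reasonable choice here because the full release file is very large; records missing an accession are kept with an empty field rather than dropped.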
The created flat files are used to feed the database supporting DODO (Dictionary of Disease Ontologies).