
clinShot provides clinical variations and the links between ClinVar IDs and dbSNP IDs in JSON format.



clinShot is a tool that provides the latest ClinVar database in JSON format for the Homo sapiens (human) genome assembly GRCh38. clinShot downloads and analyzes either a VCF file of your choice (if you specify one) or the ClinVar VCF file from the NCBI database. It then extracts the clinical variations and links their ClinVar IDs (ID) with their dbSNP IDs (RS).


clinShot uses several Python packages. Docker makes it possible to manage the environment and dependencies. Please see the instructions to install Docker.


A brief guide on how to use clinShot :

The setup file setup.py cythonizes complementTools.pyx, compiling it to C/C++, and the compiled tools are then used to speed up the run. The main script takes three parameters, as follows :

USAGE: main.py [OPTIONS]
  -url URL                  URL to NCBI Clinvar page. This should not return
                            direct vcf file. [Required] 
  -vcf VCF                  Name of the vcf file available in the NCBI Clinvar page. [Optional]
  -output OUTPUT            Path to desired output folder. Defaults to the
                            same place as the specified output folder. [Optional]
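The options above could be declared with argparse roughly as follows. This is only a minimal sketch: the function name build_parser, the help strings and the None defaults are assumptions made for illustration, and the flag names are taken from the usage text rather than from main.py itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the command-line interface shown above."""
    parser = argparse.ArgumentParser(prog="main.py")
    # Required: the NCBI ClinVar page, not a direct link to a VCF file.
    parser.add_argument("-url", required=True,
                        help="URL to the NCBI ClinVar page.")
    # Optional: name of a VCF file listed on that page.
    parser.add_argument("-vcf", default=None,
                        help="Name of the VCF file available on the ClinVar page.")
    # Optional: output folder. The real default is not known here, so this
    # sketch uses None as a placeholder.
    parser.add_argument("-output", default=None,
                        help="Path to the desired output folder.")
    return parser


# Parse an explicit argument list instead of sys.argv, for illustration.
args = build_parser().parse_args(
    ["-url", "https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/",
     "-vcf", "homo_sapiens"]
)
print(args.url, args.vcf, args.output)
```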

Run from a Docker container :

To simplify environment management, you can run the pipeline in a Docker container and collect the results in the output folder with the following commands:

# Build Docker image
docker build --tag clinshot .
# Run the pipeline with the -url and -vcf parameters; the results are written to the output folder
docker container run -v $PWD/output:/output clinshot:latest -url https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/ -vcf homo_sapiens

Run locally :

You need Anaconda installed to run the pipeline locally. If you don't have Anaconda, please see the instructions to install it, then run:

conda env create -f requirements.yml
conda activate clinshot
python main.py -url https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/ -vcf homo_sapiens -output_directory clinical_output


The tool creates two JSON files: nodes.json (clinical variations) and links.json (links between ClinVar IDs (ID) and dbSNP IDs (RS)).

nodes.json :

"CHROM": "1",
"POS": 1014042,
"ID": "475283",
"REF": "G",
"ALT": "A",
"AF_ESP": 0.00546,
"AF_EXAC": 0.00165,
"AF_TGP": 0.00619,
"ALLELEID": 446939


links.json :
"_from": "475283",
"_to": "143888043"
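
To illustrate how one VCF record could map onto the two outputs, here is a minimal sketch in Python. It is not clinShot's actual code: the function names parse_info and parse_vcf_line are hypothetical, the sample line is rebuilt from the example values above, and the assumption that several RS values may be separated by "," or "|" is unverified.

```python
import re
from typing import Dict, List, Optional, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column ("K1=V1;K2=V2;FLAG") into a dict of key/value pairs."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if sep:  # skip value-less flags
            fields[key] = value
    return fields


def parse_vcf_line(line: str) -> Optional[Tuple[dict, List[dict]]]:
    """Return (node, links) for one VCF data line, or None for header/blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    # The first eight tab-separated VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
    chrom, pos, clinvar_id, ref, alt, _qual, _filter, info = (
        line.rstrip("\n").split("\t")[:8]
    )
    fields = parse_info(info)

    node = {"CHROM": chrom, "POS": int(pos), "ID": clinvar_id, "REF": ref, "ALT": alt}
    for key in ("AF_ESP", "AF_EXAC", "AF_TGP"):
        if key in fields:
            node[key] = float(fields[key])
    if "ALLELEID" in fields:
        node["ALLELEID"] = int(fields["ALLELEID"])

    # Assumption: multiple dbSNP IDs, if present, are separated by "," or "|".
    rs_ids = [rs for rs in re.split(r"[,|]", fields.get("RS", "")) if rs]
    links = [{"_from": clinvar_id, "_to": rs} for rs in rs_ids]
    return node, links


# Sample line reconstructed from the example values shown above.
sample = ("1\t1014042\t475283\tG\tA\t.\t.\t"
          "AF_ESP=0.00546;AF_EXAC=0.00165;AF_TGP=0.00619;ALLELEID=446939;RS=143888043")
node, links = parse_vcf_line(sample)
```

With this sample line, node holds the fields listed under nodes.json and links holds a single entry from "475283" to "143888043". Records without an RS entry produce a node but no links.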


Yasmine Draceni - 2021