JhuangLab/BioInstaller

Share bioinformatics database

Miachol opened this issue · 25 comments

Configuration file: db_main.toml
Description: miRDB is an online database for miRNA target prediction and functional annotations.

[db_mirdb]
source_url = "http://mirdb.org/download/miRDB_v{{version}}_prediction_result.txt.gz"
version_available = ["5.0", "4.0", "3.0", "2.0", "1.0"]
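
Every entry in this thread pairs a `source_url` template with a list of values to substitute for `{{version}}`. The following is a minimal sketch of that expansion, assuming plain string replacement; it is not BioInstaller's own implementation, and the helper name `render_source_url` is hypothetical.

```python
from typing import List


def render_source_url(template: str, version: str) -> str:
    """Replace every {{version}} placeholder in the template with one version."""
    return template.replace("{{version}}", version)


# Values copied from the db_mirdb entry above.
MIRDB_URL = "http://mirdb.org/download/miRDB_v{{version}}_prediction_result.txt.gz"
MIRDB_VERSIONS: List[str] = ["5.0", "4.0", "3.0", "2.0", "1.0"]

# One concrete download URL per listed version.
mirdb_urls = [render_source_url(MIRDB_URL, v) for v in MIRDB_VERSIONS]

for url in mirdb_urls:
    print(url)
```

The same helper applies to entries whose "version" is really a file name (for example db_rbpdb), since the placeholder is treated as an opaque string.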

Configuration file: db_main.toml
Description: As a database, miRTarBase has accumulated more than three hundred and sixty thousand miRNA-target interactions (MTIs), which are collected by manually surveying pertinent literature after NLP of the text systematically to filter research articles related to functional studies of miRNAs.

[db_mirtarbase]
source_url = "http://mirtarbase.mbc.nctu.edu.tw/cache/download/{{version}}/miRTarBase_MTI.xlsx"
version_available = ["7.0"]

Configuration file: db_main.toml
Description: miRNEST is an integrative collection of animal, plant and virus microRNA data.

[db_mirnest]
source_url = "http://rhesus.amu.edu.pl/mirnest/copy/downloads/{{version}}.gz"
version_available = ["mirnest_EST_predictions", "mirnest_targets", "mirnest_deep_predictions",
                                  "mirnest_degradomes", "mirnest_mirtrons", "mirnest_mirna_gene_structure"]

Configuration file: db_main.toml
Description: RBPDB, the database of RNA-binding protein specificities

[db_rbpdb]
source_url = "http://rbpdb.ccbr.utoronto.ca/downloads/{{version}}"
version_available = ["RBPDB_v1.3.1_2012-11-21.sql", "RBPDB_v1.3.1_2012-11-21_TDT.zip",
                                  "RBPDB_v1.3.1_2012-11-21_CSV.zip"]

Configuration file: db_main.toml
Description: APPRIS, annotating principal splice isoforms

[db_appris]
source_url = "http://apprisws.bioinfo.cnio.es/pub/current_release/datafiles/homo_sapiens/{{version}}/appris_data.principal.txt"
version_available = ["GRCh38", "rs108v26", "up201703v26", "a1v26", "GRCh37", "rs105v24", "g12v24"]

Configuration file: db_main.toml
Description: LNCipedia is a public database for long non-coding RNA (lncRNA) sequence and annotation. The current release contains 127,802 transcripts and 56,946 genes.

[db_lncipedia]
source_url = "https://lncipedia.org/downloads/lncipedia_{{version}}"
version_available = ["5_1_hg19.bed", "5_1_hg38.bed", "5_1_hc_hg19.bed", "5_1_hc_hg38.bed",
                                  "5_1.fasta", "5_1_hc.fasta", "5_1_hg19.gff", "5_1_hg38.gff", "5_1_hc_hg19.gff", 
                                  "5_1_hc_hg38.gff", "5_1_hg19.gtf", "5_1_hg38.gtf", "5_1_hc_hg19.gtf", 
                                  "5_1_hc_hg38.gtf"]

Configuration file: db_main.toml
Description: MSigDB (Molecular Signatures Database) is a collection of annotated gene sets for use with GSEA software.

[db_msigdb]
source_url = "http://bioinfo.rjh.com.cn/download/bioinstaller/msigdb/{{version}}"
version_available = ["c1.all.v6.2.entrez.gmt", "c1.all.v6.2.symbols.gmt", "c2.all.v6.2.entrez.gmt", "c2.all.v6.2.symbols.gmt", "c2.cgp.v6.2.entrez.gmt", "c2.cgp.v6.2.symbols.gmt", "c2.cp.biocarta.v6.2.entrez.gmt", "c2.cp.biocarta.v6.2.symbols.gmt", "c2.cp.kegg.v6.2.entrez.gmt", "c2.cp.kegg.v6.2.symbols.gmt", "c2.cp.reactome.v6.2.entrez.gmt", "c2.cp.reactome.v6.2.symbols.gmt", "c2.cp.v6.2.entrez.gmt", "c2.cp.v6.2.symbols.gmt", "c3.all.v6.2.entrez.gmt", "c3.all.v6.2.symbols.gmt", "c3.mir.v6.2.entrez.gmt", "c3.mir.v6.2.symbols.gmt", "c3.tft.v6.2.entrez.gmt", "c3.tft.v6.2.symbols.gmt", "c4.all.v6.2.entrez.gmt", "c4.all.v6.2.symbols.gmt", "c4.cgn.v6.2.entrez.gmt", "c4.cgn.v6.2.symbols.gmt", "c4.cm.v6.2.entrez.gmt", "c4.cm.v6.2.symbols.gmt", "c5.all.v6.2.entrez.gmt", "c5.all.v6.2.symbols.gmt", "c5.bp.v6.2.entrez.gmt", "c5.bp.v6.2.symbols.gmt", "c5.cc.v6.2.entrez.gmt", "c5.cc.v6.2.symbols.gmt", "c5.mf.v6.2.entrez.gmt", "c5.mf.v6.2.symbols.gmt", "c6.all.v6.2.entrez.gmt", "c6.all.v6.2.symbols.gmt", "c7.all.v6.2.entrez.gmt", "c7.all.v6.2.symbols.gmt", "msigdb.v6.2.entrez.gmt", "msigdb.v6.2.symbols.gmt", "msigdb_v3.0.zip", "msigdb_v3.1.zip", "msigdb_v4.0.zip", "msigdb_v5.0.zip", "msigdb_v5.1.zip", "msigdb_v5.1_chip.zip", "msigdb_v5.2.zip", "msigdb_v5.2_chip.zip", "msigdb_v6.0.zip", "msigdb_v6.0_chip.zip", "msigdb_v6.1.zip", "msigdb_v6.1_chip.zip", "msigdb_v6.2.xml", "msigdb_v6.2.zip"]

Configuration file: db_main.toml
Description: miRCancer provides a comprehensive collection of microRNA (miRNA) expression profiles in various human cancers, automatically extracted from the published literature in PubMed. It uses text mining techniques for information collection. Manual revision is applied after auto-extraction to provide 100% precision.

[db_mircancer]
source_url = "http://mircancer.ecu.edu/downloads/{{version}}.txt"
version_available = ["miRCancerOctober2017", "miRCancerMarch2017", "miRCancerDecember2016", "miRCancerSeptember2016", "miRCancerJune2016", "miRCancerMarch2016", "miRCancerDecember2015", "miRCancerSeptember2015", "miRCancerJune2015", "miRCancerMarch2015", "miRCancerDecember2014", "miRCancerSeptember2014", "miRCancerJune2014", "miRCancerMarch2014", "miRCancerDecember2013", "miRCancerSeptember2013", "miRCancerJune2013", "miRCancerMarch2013", "miRCancerNovember2012"]

Configuration file: db_main.toml
Description: DCDB (Drug Combination Database). Accumulating scientific and clinical evidence suggests that drug combinations are a safe and effective approach to treating complicated and refractory diseases. The Drug Combination Database (DCDB) is devoted to the research and development of multi-component drugs. The current version of DCDB contains 1363 drug combinations (330 approved and 1033 investigational, including 237 unsuccessful usages), involving 904 individual drugs and 805 targets.

[db_dcdb]
source_url = "http://www.cls.zju.edu.cn/dcdb/downloadfile/{{version}}.zip"
version_available = ["DCDB2_plaintxt", "DCDB2.sql", "targets", "Drug_combinations",
                                  "components_identifier"]

Configuration file: db_main.toml
Description: OncomiRDB, aiming at annotating the experimentally verified oncogenic and tumor-suppressive miRNAs from literature.

[db_oncomirdb]
source_url = "http://lifeome.net/database/oncomirdb/oncomirdb.v-{{version}}_download.txt"
version_available = "1.1-20131217"
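
The db_oncomirdb entry above gives its version as a bare string, while most entries in this thread use an array. Whether BioInstaller treats both forms identically is not confirmed here; the sketch below only shows one way a consumer of these entries could normalize them, with the hypothetical helper `as_version_list`.

```python
from typing import List, Union


def as_version_list(value: Union[str, List[str]]) -> List[str]:
    """Return the versions as a list, wrapping a bare string in a one-item list."""
    if isinstance(value, str):
        return [value]
    return list(value)


# A bare string (as in db_oncomirdb) and an array (as in db_mirdb).
single = as_version_list("1.1-20131217")
several = as_version_list(["5.0", "4.0"])

print(single)
print(several)
```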

Configuration file: db_main.toml
Description: IslandViewer. This web site was developed so that researchers could easily view and download genomic islands for all published sequenced bacterial and archaeal genomes that have been predicted using the currently most accurate GI prediction methods.

[db_islandviewer]
source_url = "http://www.pathogenomics.sfu.ca/islandviewer/download/datasets/all_gis_{{version}}.txt.tar.gz"
version_available = ["islandviewer_iv4", "islandpick_iv4", "islandpath_dimob_iv4",
                                  "sigi_hmm_iv4", "islander_iv4"]

Configuration file: db_main.toml
Description: hPDI (Human Protein-DNA Interactome), The hPDI database holds experimental protein-DNA interaction data for humans identified by protein microarray assays. The current release of hPDI contains 17,718 protein-DNA interactions for 1013 human DNA-binding proteins. These DNA-binding proteins include 493 human transcription factors (TFs) and 520 unconventional DNA binding proteins (uDBPs).

[db_hpdi]
source_url = "http://bioinfo.wilmer.jhu.edu/PDI/{{version}}"
version_available = ["protein_chip_full_seq.csv", "protein_annotation.txt",
                                  "pro2motif.txt", "DNA_motifs.txt", "motif2protein.txt", 
                                  "all_pwm.zip", "all_gpr_files.zip", "supplemental.pdf"]

Configuration file: db_main.toml
Description: dbSNO, Protein S-nitrosylation (SNO) is a reversible post-translational modification (PTM) and involves the covalent attachment of nitric oxide (NO) to the thiol group of cysteine (Cys) residues. Given the increasing number of proteins reported to be regulated by this modification, S-nitrosylation is considered to act, in a manner analogous to phosphorylation, as a pleiotropic regulator that elicits dual effects to regulate diverse pathophysiological processes by altering protein function, stability, and conformation change in various cancers and human disorders.

[db_dbsno]
source_url = "http://140.138.144.145/~dbSNO/download/dbSNO{{version}}_all_data.txt.gz"
version_available = "v2"

Configuration file: db_main.toml
Description: PhosphoNetworks, a combined bioinformatics and protein microarray-based strategy to construct a high-resolution map of the human phosphorylation networks.

[db_phosphonetworks]
source_url = "http://www.phosphonetworks.org/download/{{version}}"
version_available = ["rawKSI.csv", "refKSI.csv", "comKSI.csv", "motifSite.csv", 
                                  "motifMatrix.csv",  "motifLogo.tar", "highResolutionNetwork.csv", 
                                  "supplemental.pdf"]

Configuration file: db_main.toml
Description: ConsensusPathDB, ConsensusPathDB-human integrates interaction networks in Homo sapiens including binary and complex protein-protein, genetic, metabolic, signaling, gene regulatory and drug-target interactions, as well as biochemical pathways.

[db_consensuspathdb]
source_url = "http://cpdb.molgen.mpg.de/download/ConsensusPathDB_{{version}}.gz"
version_available = ["human_PPI", "human_PPI.psi25"]

Configuration file: db_main.toml
Description: INstruct, a database of high-quality protein interactome networks annotated to 3D structural resolution. We currently catalogue 6585 human, 644 A. thaliana, 120 C. elegans, 166 D. melanogaster, 119 M. musculus, 1273 S. cerevisiae, and 37 S. pombe structurally resolved interactions. The interactions shown on this site have been curated from some of the most popular interaction databases and filtered to reflect only binary interactions that meet our strict quality criteria.

[db_instruct]
source_url = "http://instruct.yulab.org/download/{{version}}.sin"
version_available = ["sapiens", "thaliana", "elegans", "melanogaster", "musculus", "cerevisiae", "pombe"]

Configuration file: db_main.toml
Description: RedoxDB, a manually curated database of experimentally verified protein oxidative modifications. RedoxDB mainly consists of two types of data: dataset (A) includes redox proteins for which the modified Cys have been experimentally verified, and dataset (B) includes redox proteins for which the modified Cys have not yet been determined. When searching or blasting RedoxDB, users can decide whether to include dataset (B).

[db_redoxdb]
source_url = "https://biocomputer.bio.cuhk.edu.hk/RedoxDB/download/{{version}}"
version_available = ["redoxdb.A.txt", "redoxdb.B.txt", "redoxdb.A.fa", "redoxdb.B.fa"]

Configuration file: db_main.toml
Description: SM2miR, a manually curated database that collects and incorporates the experimentally validated effects of small molecules on miRNA expression in 20 species from published papers. Each entry contains detailed information about small molecules, miRNAs and their relationships, including species, small molecule name, DrugBank Accession number, PubChem CID, whether it is approved by the FDA, miRNA name, miRBase Accession number, expression pattern of the miRNA, experimental detection method, tissues or conditions for detection, evidence in the reference, PubMed ID and the publication year of the reference.

[db_sm2mir]
source_url = "http://210.46.85.180:8080/sm2mir/files/{{version}}.xls"
version_available = ["SM2miR3", "SM2miR2n", "SM2miR"]

Configuration file: db_main.toml
Description: HMDB is an online database of small molecule metabolites found in the human body, which facilitates human metabolomics research including the identification and characterization of human metabolites using NMR and MS.

[db_hmdb]
source_url = "http://www.hmdb.ca/system/downloads/current/{{version}}.zip"
version_available = ["hmdb_proteins", "hmdb_metabolites", "structures"]
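
Because these entries are written by hand, a misspelled key or placeholder is easy to miss. The sketch below is one possible pre-submission check, not part of BioInstaller: it assumes an entry has already been parsed into a dictionary and that only the two keys used in this thread (`source_url`, `version_available`) are expected. The helper `check_entry` and the example entries are hypothetical.

```python
import re
from typing import Dict, List

# Assumption: only the two keys used throughout this thread are expected.
EXPECTED_KEYS = ("source_url", "version_available")
# Matches template placeholders such as {{version}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def check_entry(name: str, entry: Dict[str, object]) -> List[str]:
    """Return a list of human-readable problems found in one entry."""
    problems = []
    for key in EXPECTED_KEYS:
        if key not in entry:
            problems.append(name + ": missing key '" + key + "'")
    for key in entry:
        if key not in EXPECTED_KEYS:
            problems.append(name + ": unexpected key '" + key + "'")
    # Only {{version}} is a known placeholder in these templates.
    for placeholder in PLACEHOLDER.findall(str(entry.get("source_url", ""))):
        if placeholder != "version":
            problems.append(name + ": unknown placeholder '{{" + placeholder + "}}'")
    return problems


# Hypothetical entries, written as already-parsed tables.
good_entry = {
    "source_url": "http://example.org/{{version}}.zip",
    "version_available": ["a", "b"],
}
bad_key_entry = {
    "sorce_url": "http://example.org/{{version}}.zip",
    "version_available": ["a"],
}
bad_placeholder_entry = {
    "source_url": "http://example.org/{{verson}}.zip",
    "version_available": ["a"],
}

for entry_name, entry in [
    ("db_good", good_entry),
    ("db_bad_key", bad_key_entry),
    ("db_bad_placeholder", bad_placeholder_entry),
]:
    print(entry_name, check_entry(entry_name, entry))
```

Real BioInstaller entries may carry additional keys, so the expected-key list would need extending before using a check like this on the full db_main.toml.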

Done in commit fc7f22d.

configuration file: db_main.toml

title: AWESOME, a database of SNPs that affect protein post-translational modifications

description: Protein post-translational modifications (PTMs), including phosphorylation, ubiquitination, methylation, acetylation, glycosylation, etc., are very important biological processes. PTM changes in some critical genes, which may be induced by base-pair substitution, are shown to affect the risk of diseases. Recently, large-scale exome-wide association studies found that missense single nucleotide polymorphisms (SNPs) play an important role in the susceptibility for complex diseases or traits. One of the functional mechanisms of missense SNPs is that they may affect PTMs and lead to protein dysfunction and disorder of its downstream signaling pathway. Here, we constructed a database named AWESOME (A Website Exhibits SNP On Modification Event, http://www.awesome-hust.com), which is an interactive web-based analysis tool that systematically evaluates the role of SNPs on nearly all kinds of PTMs based on 20 available tools. We also provided a well-designed scoring system to compare the performance of different PTM prediction tools and help users to get a better interpretation of results. Users can search SNPs, genes or position of interest, filter with specific modifications or prediction methods, to get a comprehensive PTM change induced by SNPs. In summary, our database provides a convenient way to detect PTM-related SNPs, which may potentially be pathogenic factors or therapeutic targets.

publication: AWESOME: a database of SNPs that affect protein post-translational modifications. Nucleic Acids Res. 2018 Sep 12. doi: 10.1093/nar/gky821.

[db_awesome]
source_url = "http://www.awesome-hust.com/downloads/{{version}}.zip"
version_available = ["awesomeAll"]

configuration file: db_main.toml

title: CellMarker: a manually curated resource of cell markers in human and mouse.

description: One of the most fundamental questions in biology is what types of cells form different tissues and organs in a functionally coordinated fashion. Larger-scale single-cell sequencing and biology experiment studies are now rapidly opening up new ways to track this question by revealing substantial cell markers for distinguishing different cell types in tissues. Here, we developed the CellMarker database (http://biocc.hrbmu.edu.cn/CellMarker/ or http://bio-bigdata.hrbmu.edu.cn/CellMarker/), aiming to provide a comprehensive and accurate resource of cell markers for various cell types in tissues of human and mouse. By manually curating over 100 000 published papers, 4124 entries including the cell marker information, tissue type, cell type, cancer information and source, were recorded. At last, 13 605 cell markers of 467 cell types in 158 human tissues/sub-tissues and 9148 cell markers of 389 cell types in 81 mouse tissues/sub-tissues were collected and deposited in CellMarker. CellMarker provides a user-friendly interface for browsing, searching and downloading markers of diverse cell types of different tissues. Furthermore, a summarized marker prevalence in each cell type is graphically and intuitively presented through a vivid statistical graph. We believe that CellMarker is a comprehensive and valuable resource for cell researches in precisely identifying and characterizing cells, especially at the single-cell level.

publication: CellMarker: a manually curated resource of cell markers in human and mouse. Nucleic Acids Res. 2018 Oct 5. doi: 10.1093/nar/gky900.

[db_cellmarker]
source_url = "http://biocc.hrbmu.edu.cn/CellMarker/download/{{version}}_cell_markers.txt"
version_available = ["all", "Human", "Mouse", "Single"]

configuration file: db_main.toml

title: LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases.

description: Mounting evidence suggested that dysfunction of long non-coding RNAs (lncRNAs) is involved in a wide variety of diseases. A knowledgebase with systematic collection and curation of lncRNA-disease associations is critically important for further examining their underlying molecular mechanisms. In 2013, we presented the first release of LncRNADisease, representing a database for collection of experimental supported lncRNA-disease associations. Here, we describe an update of the database. The new developments in LncRNADisease 2.0 include (i) an over 40-fold lncRNA-disease association enhancement compared with the previous version; (ii) providing the transcriptional regulatory relationships among lncRNA, mRNA and miRNA; (iii) providing a confidence score for each lncRNA-disease association; (iv) integrating experimentally supported circular RNA disease associations. LncRNADisease 2.0 documents more than 200 000 lncRNA-disease associations. We expect that this database will continue to serve as a valuable source for potential clinical application related to lncRNAs. LncRNADisease 2.0 is freely available at http://www.rnanut.net/lncrnadisease/.

publication: LncRNADisease 2.0: an updated database of long non-coding RNA-associated diseases. Nucleic Acids Res. 2018 Oct 4. doi: 10.1093/nar/gky905.

[db_lncrnadisease]
source_url = "http://www.rnanut.net/lncrnadisease/download/{{version}}.xlsx"
version_available = ["experimental%20circRNA-disease%20information", "experimental%20lncRNA-disease%20information", "predicted%20lncRNA-disease%20information", "all%20ncRNA-disease%20information"]
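
The db_lncrnadisease file names contain spaces, which is why the values above are written with `%20`. A small sketch of how such values can be derived from the readable names with Python's standard `urllib.parse.quote`; the readable names here are simply the decoded form of the values in the entry.

```python
from urllib.parse import quote, unquote

# Decoded (human-readable) file names for the db_lncrnadisease entry.
readable_names = [
    "experimental circRNA-disease information",
    "experimental lncRNA-disease information",
    "predicted lncRNA-disease information",
    "all ncRNA-disease information",
]

# Percent-encode each name so it can be substituted into the URL template.
encoded_names = [quote(name) for name in readable_names]

for name in encoded_names:
    print(name)

# Decoding restores the original readable names.
decoded_again = [unquote(name) for name in encoded_names]
```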

configuration file: db_main.toml

title: EWASdb: epigenome-wide association study database.

description: DNA methylation, the most intensively studied epigenetic modification, plays an important role in understanding the molecular basis of diseases. Furthermore, epigenome-wide association study (EWAS) provides a systematic approach to identify epigenetic variants underlying common diseases/phenotypes. However, there is no comprehensive database to archive the results of EWASs. To fill this gap, we developed the EWASdb, which is a part of 'The EWAS Project', to store the epigenetic association results of DNA methylation from EWASs. In its current version (v 1.0, up to July 2018), the EWASdb has curated 1319 EWASs associated with 302 diseases/phenotypes. There are three types of EWAS results curated in this database: (i) EWAS for single marker; (ii) EWAS for KEGG pathway and (iii) EWAS for GO (Gene Ontology) category. As the first comprehensive EWAS database, EWASdb has been searched or downloaded by researchers from 43 countries to date. We believe that EWASdb will become a valuable resource and significantly contribute to the epigenetic research of diseases/phenotypes and have potential clinical applications. EWASdb is freely available at http://www.ewas.org.cn/ewasdb or http://www.bioapp.org/ewasdb.

publication: EWASdb: epigenome-wide association study database. Nucleic Acids Res. 2018 Oct 13. doi: 10.1093/nar/gky942.

[db_ewasdb]
source_url = "http://www.bioapp.org/ewasdb/Public/file/{{version}}.rar"
version_available = ["ewas_singlemarker", "GO_Category", "KEGG_Pathway"]

configuration file: db_main.toml

title: CancerSplicingQTL: a database for genome-wide identification of splicing QTLs in human cancer.

description: Alternative splicing (AS) is a widespread process that increases structural transcript variation and proteome diversity. Aberrant splicing patterns are frequently observed in cancer initiation, progress, prognosis and therapy. Increasing evidence has demonstrated that AS events could undergo modulation by genetic variants. The identification of splicing quantitative trait loci (sQTLs), genetic variants that affect AS events, might represent an important step toward fully understanding the contribution of genetic variants in disease development. However, no database has yet been developed to systematically analyze sQTLs across multiple cancer types. Using genotype data from The Cancer Genome Atlas and corresponding AS values calculated by TCGASpliceSeq, we developed a computational pipeline to identify sQTLs from 9 026 tumor samples in 33 cancer types. We totally identified 4 599 598 sQTLs across all cancer types. We further performed survival analyses and identified 17 072 sQTLs associated with patient overall survival times. Furthermore, using genome-wide association study (GWAS) catalog data, we identified 1 180 132 sQTLs overlapping with known GWAS linkage disequilibrium regions. Finally, we constructed a user-friendly database, CancerSplicingQTL (http://www.cancersplicingqtl-hust.com/) for users to conveniently browse, search and download data of interest. This database provides an informative sQTL resource for further characterizing the potential functional roles of SNPs that control transcript isoforms in human cancer.

publication: CancerSplicingQTL: a database for genome-wide identification of splicing QTLs in human cancer. Nucleic Acids Res. 2018 Oct 17. doi: 10.1093/nar/gky954.

[db_cancersplicingqtl]
source_url = "http://www.cancersplicingqtl-hust.com/downloads/{{version}}.xlsx"
version_available = ["ACC_sQTLs", "BLCA_sQTLs", "BRCA_sQTLs", "CESC_sQTLs", "CHOL_sQTLs", "COAD_sQTLs", "DLBC_sQTLs", "ESCA_sQTLs", "GBM_sQTLs", "HNSC_sQTLs", "KICH_sQTLs", "KIRC_sQTLs", "KIRP_sQTLs", "LAML_sQTLs", "LGG_sQTLs", "LIHC_sQTLs", "LUAD_sQTLs", "LUSC_sQTLs", "MESO_sQTLs", "OV_sQTLs", "PAAD_sQTLs", "PCPG_sQTLs", "PRAD_sQTLs", "READ_sQTLs", "SARC_sQTLs", "SKCM_sQTLs", "STAD_sQTLs", "TGCT_sQTLs", "THCA_sQTLs", "THYM_sQTLs", "UCEC_sQTLs", "UCS_sQTLs", "UVM_sQTLs", "ACC_Survival_sQTLs", "BLCA_Survival_sQTLs", "BRCA_Survival_sQTLs", "CESC_Survival_sQTLs", "CHOL_Survival_sQTLs", "COAD_Survival_sQTLs", "DLBC_Survival_sQTLs", "ESCA_Survival_sQTLs", "GBM_Survival_sQTLs", "HNSC_Survival_sQTLs", "KICH_Survival_sQTLs", "KIRC_Survival_sQTLs", "KIRP_Survival_sQTLs", "LAML_Survival_sQTLs", "LGG_Survival_sQTLs", "LIHC_Survival_sQTLs", "LUAD_Survival_sQTLs", "LUSC_Survival_sQTLs", "MESO_Survival_sQTLs", "OV_Survival_sQTLs", "PAAD_Survival_sQTLs", "PCPG_Survival_sQTLs", "PRAD_Survival_sQTLs", "READ_Survival_sQTLs", "SARC_Survival_sQTLs", "SKCM_Survival_sQTLs", "STAD_Survival_sQTLs", "TGCT_Survival_sQTLs", "THCA_Survival_sQTLs", "THYM_Survival_sQTLs", "UCEC_Survival_sQTLs", "UCS_Survival_sQTLs", "UVM_Survival_sQTLs", "ACC_GWAS_sQTLs", "BLCA_GWAS_sQTLs", "BRCA_GWAS_sQTLs", "CESC_GWAS_sQTLs", "CHOL_GWAS_sQTLs", "COAD_GWAS_sQTLs", "DLBC_GWAS_sQTLs", "ESCA_GWAS_sQTLs", "GBM_GWAS_sQTLs", "HNSC_GWAS_sQTLs", "KICH_GWAS_sQTLs", "KIRC_GWAS_sQTLs", "KIRP_GWAS_sQTLs", "LAML_GWAS_sQTLs", "LGG_GWAS_sQTLs", "LIHC_GWAS_sQTLs", "LUAD_GWAS_sQTLs", "LUSC_GWAS_sQTLs", "MESO_GWAS_sQTLs", "OV_GWAS_sQTLs", "PAAD_GWAS_sQTLs", "PCPG_GWAS_sQTLs", "PRAD_GWAS_sQTLs", "READ_GWAS_sQTLs", "SARC_GWAS_sQTLs", "SKCM_GWAS_sQTLs", "STAD_GWAS_sQTLs", "TGCT_GWAS_sQTLs", "THCA_GWAS_sQTLs", "THYM_GWAS_sQTLs", "UCEC_GWAS_sQTLs", "UCS_GWAS_sQTLs", "UVM_GWAS_sQTLs"]
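
The db_cancersplicingqtl list above follows a regular pattern: 33 TCGA cancer codes combined with three suffixes. As a sketch of how such a list can be generated rather than typed by hand (the variable names are hypothetical, and the ordering mirrors the entry above):

```python
# The 33 cancer type codes that appear in the entry above.
CANCER_CODES = [
    "ACC", "BLCA", "BRCA", "CESC", "CHOL", "COAD", "DLBC", "ESCA", "GBM",
    "HNSC", "KICH", "KIRC", "KIRP", "LAML", "LGG", "LIHC", "LUAD", "LUSC",
    "MESO", "OV", "PAAD", "PCPG", "PRAD", "READ", "SARC", "SKCM", "STAD",
    "TGCT", "THCA", "THYM", "UCEC", "UCS", "UVM",
]

# File name suffixes, in the order used by the entry above.
SUFFIXES = ["sQTLs", "Survival_sQTLs", "GWAS_sQTLs"]

# All codes for the first suffix, then all codes for the next, and so on.
sqtl_versions = [code + "_" + suffix for suffix in SUFFIXES for code in CANCER_CODES]

# Render the list as a single TOML array line.
toml_line = "version_available = [" + ", ".join('"' + v + '"' for v in sqtl_versions) + "]"
print(len(sqtl_versions))
print(toml_line)
```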

configuration file: db_main.toml

title: The cancer precision medicine knowledge base for structured clinical-grade mutations and interpretations.

description:
OBJECTIVE:
This paper describes the Precision Medicine Knowledge Base (PMKB; https://pmkb.weill.cornell.edu ), an interactive online application for collaborative editing, maintenance, and sharing of structured clinical-grade cancer mutation interpretations.

MATERIALS AND METHODS:
PMKB was built using the Ruby on Rails Web application framework. Leveraging existing standards such as the Human Genome Variation Society variant description format, we implemented a data model that links variants to tumor-specific and tissue-specific interpretations. Key features of PMKB include support for all major variant types, standardized authentication, distinct user roles including high-level approvers, and detailed activity history. A REpresentational State Transfer (REST) application-programming interface (API) was implemented to query the PMKB programmatically.

RESULTS:
At the time of writing, PMKB contains 457 variant descriptions with 281 clinical-grade interpretations. The EGFR, BRAF, KRAS, and KIT genes are associated with the largest numbers of interpretable variants. PMKB's interpretations have been used in over 1500 AmpliSeq tests and 750 whole-exome sequencing tests. The interpretations are accessed either directly via the Web interface or programmatically via the existing API.

DISCUSSION:
An accurate and up-to-date knowledge base of genomic alterations of clinical significance is critical to the success of precision medicine programs. The open-access, programmatically accessible PMKB represents an important attempt at creating such a resource in the field of oncology.

CONCLUSION:
The PMKB was designed to help collect and maintain clinical-grade mutation interpretations and facilitate reporting for clinical cancer genomic testing. The PMKB was also designed to enable the creation of clinical cancer genomics automated reporting pipelines via an API.

publication: The cancer precision medicine knowledge base for structured clinical-grade mutations and interpretations. J Am Med Inform Assoc. 2017 May 1;24(3):513-519. doi: 10.1093/jamia/ocw148 (IF: 4.27).

[db_pmkb]
source_url = "https://pmkb.weill.cornell.edu/therapies/download.xlsx"
version_available = "latest"