MatchEngine
The matchengine matches patient clinical and genomic information to trials.
Built with
All required Python libraries can be installed by running pip install -r requirements.txt
User Guide
Step 1: Set up MongoDB
The matchengine was initially developed using MongoDB version 3.2. For MongoDB installation instructions for Linux, Mac OS X, and Windows please visit their installation page.
Step 2: Load data
Patient data
The matchengine expects patient data to be stored in two separate MongoDB collections:
- clinical: Contains clinical attributes like cancer diagnosis and age (see examples/clinical.example.bson for an example)
MRN | SAMPLE_ID | ONCOTREE_PRIMARY_DIAGNOSIS_NAME | BIRTH_DATE | VITAL_STATUS | GENDER |
---|---|---|---|---|---|
01 | SAMPLE-01 | Breast Invasive Ductal Carcinoma | 1900-01-01 | alive | female |
- genomic: Contains all genomic variants sequenced from each patient (see examples/genomic.example.csv for an example)
SAMPLE_ID | TRUE_HUGO_SYMBOL | TRUE_PROTEIN_CHANGE | TRUE_VARIANT_CLASSIFICATION | VARIANT_CATEGORY | CNV_CALL | TRUE_TRANSCRIPT_EXON | WILDTYPE |
---|---|---|---|---|---|---|---|
SAMPLE-01 | PIK3CA | p.H1047R | Missense_Mutation | MUTATION | | 8 | false |
Clinical and genomic files can be imported to MongoDB using the matchengine in CSV, PKL, and JSON format. MongoDB stores these collections in JSON format and can export them again in BSON, JSON, and CSV format. For more information, see mongodump and mongoexport.
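As a rough illustration of how the two collections relate, the sketch below writes one clinical and one genomic record as plain Python dictionaries, using the column names from the tables above, and groups the genomic variants by SAMPLE_ID. The helper name variants_by_sample is hypothetical and not part of the matchengine, and the empty CNV_CALL value is an assumption for a mutation record.

```python
from collections import defaultdict

# One document per patient sample, mirroring the clinical table above.
clinical = [
    {
        "MRN": "01",
        "SAMPLE_ID": "SAMPLE-01",
        "ONCOTREE_PRIMARY_DIAGNOSIS_NAME": "Breast Invasive Ductal Carcinoma",
        "BIRTH_DATE": "1900-01-01",
        "VITAL_STATUS": "alive",
        "GENDER": "female",
    }
]

# One document per variant, mirroring the genomic table above.
genomic = [
    {
        "SAMPLE_ID": "SAMPLE-01",
        "TRUE_HUGO_SYMBOL": "PIK3CA",
        "TRUE_PROTEIN_CHANGE": "p.H1047R",
        "TRUE_VARIANT_CLASSIFICATION": "Missense_Mutation",
        "VARIANT_CATEGORY": "MUTATION",
        "CNV_CALL": None,  # assumption: left empty for a mutation
        "TRUE_TRANSCRIPT_EXON": 8,
        "WILDTYPE": False,
    }
]


def variants_by_sample(genomic_docs):
    """Group genomic documents by SAMPLE_ID (hypothetical helper)."""
    grouped = defaultdict(list)
    for doc in genomic_docs:
        grouped[doc["SAMPLE_ID"]].append(doc)
    return dict(grouped)


grouped = variants_by_sample(genomic)
for patient in clinical:
    variants = grouped.get(patient["SAMPLE_ID"], [])
    print(patient["SAMPLE_ID"], [v["TRUE_HUGO_SYMBOL"] for v in variants])
```

SAMPLE_ID is the only field the two tables share, which is why it serves as the join key here.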
Trial data
The matchengine expects trial data to also be stored in a separate MongoDB collection. Matching information is stored in a nested structure under the root field name "treatment_list". Trials can be imported to MongoDB using the matchengine in YML or JSON format. In YML format, an example of the trial structure would be:
protocol_no: 00-000
nct_id: NCT000
treatment_list:
  step:
  - arm:
    - arm_code: A
      arm_description: 'Example Arm A'
      arm_internal_id: 1
      arm_suspended: N
      dose_level: []
      match:
      - and:
        - clinical:
            oncotree_primary_diagnosis: Breast
            age_numerical: '>=18'
        - or:
          - genomic:
              hugo_symbol: PIK3CA
              variant_category: Mutation
              protein_change: p.H1047R
          - genomic:
              hugo_symbol: TP53
              variant_category: Mutation
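To make the nested and/or logic concrete, here is a minimal sketch that evaluates the match block of the example trial against one patient. It is not the matchengine's implementation: the function names are hypothetical, the diagnosis check is a plain substring test rather than an OncoTree lookup, values are compared case-insensitively, and only the three genomic fields used in the example are handled.

```python
import operator
from datetime import date

# The "match" block from the example trial, written as plain Python data.
match_tree = {"and": [
    {"clinical": {"oncotree_primary_diagnosis": "Breast",
                  "age_numerical": ">=18"}},
    {"or": [
        {"genomic": {"hugo_symbol": "PIK3CA",
                     "variant_category": "Mutation",
                     "protein_change": "p.H1047R"}},
        {"genomic": {"hugo_symbol": "TP53",
                     "variant_category": "Mutation"}},
    ]},
]}

# Only the genomic fields used in this example are mapped here.
GENOMIC_FIELDS = {
    "hugo_symbol": "TRUE_HUGO_SYMBOL",
    "variant_category": "VARIANT_CATEGORY",
    "protein_change": "TRUE_PROTEIN_CHANGE",
}

# Two-character symbols come first so ">=" is not read as ">".
OPERATORS = [(">=", operator.ge), ("<=", operator.le),
             (">", operator.gt), ("<", operator.lt)]


def age_matches(expression, birth_date, today):
    """Compare the age in whole years against an expression like '>=18'."""
    birth = date.fromisoformat(birth_date)
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday else 1)
    for symbol, compare in OPERATORS:
        if expression.startswith(symbol):
            return compare(age, float(expression[len(symbol):]))
    return age == float(expression)


def clinical_matches(criteria, patient, today):
    for key, wanted in criteria.items():
        if key == "oncotree_primary_diagnosis":
            # Assumption: substring test stands in for an OncoTree lookup.
            name = patient["ONCOTREE_PRIMARY_DIAGNOSIS_NAME"]
            if wanted.lower() not in name.lower():
                return False
        elif key == "age_numerical":
            if not age_matches(wanted, patient["BIRTH_DATE"], today):
                return False
        else:
            raise ValueError(f"unsupported clinical field: {key}")
    return True


def genomic_matches(criteria, variants):
    """True if any single variant satisfies every field in the criteria."""
    def same(a, b):
        return str(a).lower() == str(b).lower()
    return any(
        all(same(v.get(GENOMIC_FIELDS[k]), want) for k, want in criteria.items())
        for v in variants
    )


def evaluate(node, patient, variants, today):
    """Recursively evaluate an and/or/clinical/genomic node."""
    if "and" in node:
        return all(evaluate(c, patient, variants, today) for c in node["and"])
    if "or" in node:
        return any(evaluate(c, patient, variants, today) for c in node["or"])
    if "clinical" in node:
        return clinical_matches(node["clinical"], patient, today)
    if "genomic" in node:
        return genomic_matches(node["genomic"], variants)
    raise ValueError(f"unknown node: {node}")


patient = {"ONCOTREE_PRIMARY_DIAGNOSIS_NAME": "Breast Invasive Ductal Carcinoma",
           "BIRTH_DATE": "1900-01-01"}
variants = [{"TRUE_HUGO_SYMBOL": "PIK3CA", "VARIANT_CATEGORY": "MUTATION",
             "TRUE_PROTEIN_CHANGE": "p.H1047R"}]
print(evaluate(match_tree, patient, variants, date(2020, 1, 1)))
```

The recursion mirrors the YAML nesting: an and node requires all of its children to match, an or node requires at least one, and the clinical and genomic leaves are checked against the patient's records.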
Several kinds of genomic variants can be curated in this way. The tables below show how the trial field names correspond to the patient data field names:
trial field name | genomic field name | example |
---|---|---|
hugo_symbol | TRUE_HUGO_SYMBOL | ERBB2 |
protein_change | TRUE_PROTEIN_CHANGE | p.T790M |
wildcard_protein_change | TRUE_PROTEIN_CHANGE | p.G719 |
variant_classification | TRUE_VARIANT_CLASSIFICATION | In_Frame_Del |
variant_category | VARIANT_CATEGORY | Mutation |
exon | TRUE_TRANSCRIPT_EXON | 10 |
cnv_call | CNV_CALL | Heterozygous deletion |
wildtype | WILDTYPE | True or False |
trial field name | clinical field name | example |
---|---|---|
oncotree_primary_diagnosis | ONCOTREE_PRIMARY_DIAGNOSIS_NAME | Breast Invasive Ductal Carcinoma |
age_numerical | BIRTH_DATE | 1900-01-01 |
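One way to read the genomic mapping table is as a translation from trial criteria to a MongoDB-style filter on the genomic collection. The sketch below is an illustration under assumptions, not the matchengine's own query builder: the function name to_genomic_filter is hypothetical, and treating wildcard_protein_change as a prefix match on TRUE_PROTEIN_CHANGE is a guess based on the p.G719 example.

```python
import re

# Trial field name -> genomic field name, copied from the table above.
GENOMIC_FIELD_MAP = {
    "hugo_symbol": "TRUE_HUGO_SYMBOL",
    "protein_change": "TRUE_PROTEIN_CHANGE",
    "wildcard_protein_change": "TRUE_PROTEIN_CHANGE",
    "variant_classification": "TRUE_VARIANT_CLASSIFICATION",
    "variant_category": "VARIANT_CATEGORY",
    "exon": "TRUE_TRANSCRIPT_EXON",
    "cnv_call": "CNV_CALL",
    "wildtype": "WILDTYPE",
}


def to_genomic_filter(criteria):
    """Build a MongoDB-style filter dict from trial genomic criteria."""
    query = {}
    for trial_key, value in criteria.items():
        field = GENOMIC_FIELD_MAP[trial_key]
        if trial_key == "wildcard_protein_change":
            # Assumption: the wildcard is a prefix, so p.G719 would
            # match p.G719A, p.G719S, and so on.
            query[field] = {"$regex": "^" + re.escape(value)}
        else:
            query[field] = value
    return query


print(to_genomic_filter({"hugo_symbol": "ERBB2", "exon": 10}))
print(to_genomic_filter({"hugo_symbol": "EGFR",
                         "wildcard_protein_change": "p.G719"}))
```

The resulting dictionaries have the shape that a MongoDB find call accepts, although the values stored in a real database may need case normalization first.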
variant_classification options:
- Missense_Mutation
- In_Frame_Del
- Nonsense_Mutation
- Splice_Region
- Frame_Shift_Del
- Splice_Site
- In_Frame_Ins
variant_category options:
- Mutation
- Copy Number Variation
- Structural Variation
- Signature
cnv_call options (for variant_category: Copy Number Variation only):
- Heterozygous deletion
- Homozygous deletion
- Gain
- High level amplification
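The option lists above lend themselves to a quick sanity check before a trial is loaded. The following sketch is a hypothetical helper, not something the matchengine provides: it flags values outside the listed options and a cnv_call that is used without variant_category: Copy Number Variation. It compares values case-sensitively, which is an assumption.

```python
VARIANT_CLASSIFICATIONS = {
    "Missense_Mutation", "In_Frame_Del", "Nonsense_Mutation",
    "Splice_Region", "Frame_Shift_Del", "Splice_Site", "In_Frame_Ins",
}
VARIANT_CATEGORIES = {
    "Mutation", "Copy Number Variation", "Structural Variation", "Signature",
}
CNV_CALLS = {
    "Heterozygous deletion", "Homozygous deletion",
    "Gain", "High level amplification",
}

ALLOWED = {
    "variant_classification": VARIANT_CLASSIFICATIONS,
    "variant_category": VARIANT_CATEGORIES,
    "cnv_call": CNV_CALLS,
}


def check_genomic_criteria(criteria):
    """Return a list of problems found in one genomic criteria block."""
    problems = []
    for field, options in ALLOWED.items():
        value = criteria.get(field)
        if value is not None and value not in options:
            problems.append(f"{field}: unexpected value {value!r}")
    # cnv_call only applies to copy number variants.
    if "cnv_call" in criteria and \
            criteria.get("variant_category") != "Copy Number Variation":
        problems.append(
            "cnv_call requires variant_category: Copy Number Variation")
    return problems


print(check_genomic_criteria({"hugo_symbol": "ERBB2",
                              "variant_category": "Copy Number Variation",
                              "cnv_call": "Gain"}))
print(check_genomic_criteria({"hugo_symbol": "TP53",
                              "variant_category": "Mutation",
                              "cnv_call": "Gain"}))
```

An empty list means the block uses only the documented options.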
Our example
To import example data run:
python matchengine.py load -t examples/trial.example.yml -c examples/clinical.example.csv -g examples/genomic.example.csv --mongo-uri ${your_mongo_uri}
- By default, load inserts the data into a database named matchminer.
- For more information on linking your Mongo URI please see these docs. For default mongo shell configurations this will likely be mongodb://localhost:27017
- Default trial file format is YML. To change this specify --trial-format {yml,json,bson}
- Default clinical file format is CSV. To change this specify --patient-format {csv,pkl,bson}
Step 3: Matching
Once your MongoDB is set up you can perform matching by running:
python matchengine.py match --mongo-uri ${your_mongo_uri}
Default output will be a CSV file called "results.csv" in your current working directory.
You can specify the output path and filename of the results by setting the -o flag.
NOTE: If using -o, please specify both the output directory and the filename.
You can change the file format of the output to JSON by setting the --json flag.
Unit testing
The matchengine uses nose for unit testing. To run all tests from the repository's root directory:
nosetests tests
Authors
- Zachary Zwiesler
- Priti Kumari
- James Lindsay