Authors: | Komal S. Rathi, Adam Kraya, Run Jin |
---|---|
Contact: | RATHIK@chop.edu, KRAYAA@chop.edu, JINR@chop.edu |
Organization: | D3B, CHOP |
Status: | This is "work in progress" |
Date: | 2024-09-18 |
This repo is obsolete. Please refer to https://github.com/d3b-center/d3b-patient-report-analysis for up-to-date reporting code.
- Clone the OMPARE repository.
- Install R packages:
# install packages
cd /path-to/OMPARE
Rscript code/utils/install_pkgs.R
# NOTE: ggnetwork v0.5.1 is required
- Download reference data:
# get reference data from s3
aws s3 --profile Mgmt-Console-Dev-chopd3bprod@684194535433 sync s3://d3b-bix-dev-data-bucket/PNOC008/data /path-to/OMPARE/data/
# the ChEMBL database v29 must be downloaded separately
# download it from https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_29_sqlite.tar.gz, save it under the data/chembl directory, and untar it. The database will then be at data/chembl/chembl_29_sqlite/chembl_29.db.
- Download patient-specific files from data delivery project:
- Copy Number:
{uuid}.controlfreec.info.txt (for purity and ploidy)
{uuid}.gainloss.txt
{uuid}.diagram.pdf
- Expression:
{uuid}.genes.results
- Fusions:
{uuid}.arriba.fusions.tsv
{uuid}.STAR.fusion_predictions.abridged.coding_effect.tsv
- Somatic Variants:
{uuid}.{lancet, mutect2, strelka2, vardict}_somatic.norm.annot.protected.maf
{uuid}.consensus_somatic.protected.maf
- Germline Variants:
{uuid}.gatk.PASS.vcf.gz.hg38_multianno.txt.gz
- Download the following clinical information files and add data manually to data/manifest/manifest.xlsx
- Files from Kids First DRC
- PNOC008 Clinical Manifest (needed to map Research ID to ADAPTcohort_participant_id)
- Currently, we use the data assembly histology file that is made available with each new patient: https://cavatica.sbgenomics.com/u/kfdrc-harmonization/sd-8y99qzjj-data-assembly/
Note: None of these files contain short_histology or broad_histology, so these values are currently hard-coded as HGAT and Diffuse astrocytic and oligodendroglial tumor, respectively.
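Before running the pipeline, it can help to confirm that every patient-specific file listed above was actually downloaded. The following is a minimal sketch and not part of OMPARE: `list_missing_inputs` is a hypothetical helper, and it assumes that all files for one patient sit directly in a single download directory (for example ~/Downloads/p40). The expression pattern is deliberately loose because this README shows the file both as {uuid}.genes.results and as {uuid}.rsem.genes.results.gz.

```shell
# Hypothetical helper (not part of OMPARE): print one line per expected
# input file pattern that has no match in the given download directory.
list_missing_inputs() {
  local dir="$1" pattern
  for pattern in \
    '*.controlfreec.info.txt' \
    '*.gainloss.txt' \
    '*.diagram.pdf' \
    '*.genes.results*' \
    '*.arriba.fusions.tsv' \
    '*.STAR.fusion_predictions.abridged.coding_effect.tsv' \
    '*.lancet_somatic.norm.annot.protected.maf' \
    '*.mutect2_somatic.norm.annot.protected.maf' \
    '*.strelka2_somatic.norm.annot.protected.maf' \
    '*.vardict_somatic.norm.annot.protected.maf' \
    '*.consensus_somatic.protected.maf' \
    '*.gatk.PASS.vcf.gz.hg38_multianno.txt.gz'
  do
    # find prints nothing when no regular file matches the pattern
    if [ -z "$(find "$dir" -maxdepth 1 -type f -name "$pattern")" ]; then
      echo "$pattern"
    fi
  done
}
```

Calling `list_missing_inputs ~/Downloads/p40` prints nothing when every pattern has at least one match, and otherwise prints the unmatched patterns, one per line.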
run_OMPARE.R: Master script that runs the following scripts:
- code/create_project_dir.R: create project directory and organize files.
- code/create_clinfile.R: create clinical file for patient of interest.
- code/update_pnoc008_matrices.R: update PNOC008 data matrices (cnv, mutations, fusions, expression) with each new patient.
- OMPARE.Rmd: generate the HTML reports.
- Sync back updated data folder to s3:
aws s3 --profile Mgmt-Console-Dev-chopd3bprod@684194535433 sync data s3://d3b-bix-dev-data-bucket/PNOC008/data/ --exclude 'chembl/*'
- upload_reports.R: upload reports and output folders to PNOC008 data delivery project on cavatica.
Options:
--patient=PATIENT
Patient identifier (PNOC008-22, C3342894...)
--source_dir=SOURCE_DIR
Source directory with all files
--clin_file=CLIN_FILE
Manifest file (.xlsx)
--sync_data=SYNC_DATA
Sync reference data to s3 (TRUE or FALSE)
--upload_reports=UPLOAD_REPORTS
Upload reports to cavatica (TRUE or FALSE)
--study=STUDY
Study ID (PNOC008 or CBTN)
# Example for patient PNOC008-40
Rscript run_OMPARE.R \
--patient PNOC008-40 \
--source_dir ~/Downloads/p40 \
--clin_file data/manifest/pnoc008_manifest.xlsx \
--sync_data TRUE \
--upload_reports FALSE \
--study PNOC008
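Because a full run is long, a wrapper may want to reject obviously invalid option values before calling the master script. The sketch below is an illustration only: `validate_ompare_args` is a hypothetical helper, and the accepted values are taken from the Options listing above (PNOC008 or CBTN for the study, TRUE or FALSE for the two switches). Whether run_OMPARE.R itself tolerates other spellings is not documented here.

```shell
# Hypothetical pre-flight check (not part of OMPARE).
# Arguments: study, sync_data, upload_reports.
validate_ompare_args() {
  local study="$1" sync_data="$2" upload_reports="$3" flag
  case "$study" in
    PNOC008|CBTN) ;;
    *) echo "--study must be PNOC008 or CBTN, got: $study" >&2; return 1 ;;
  esac
  # both switches are documented as TRUE or FALSE
  for flag in "$sync_data" "$upload_reports"; do
    case "$flag" in
      TRUE|FALSE) ;;
      *) echo "expected TRUE or FALSE, got: $flag" >&2; return 1 ;;
    esac
  done
  return 0
}
```

A wrapper could then run `validate_ompare_args PNOC008 TRUE FALSE` and only invoke `Rscript run_OMPARE.R` with the same values when the check succeeds.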
code/create_project_dir.R: this script creates the patient directory under results and organizes the input files within it. It also creates an output folder for the plots and tables shown in the report and a reports folder for the HTML output.
Rscript code/create_project_dir.R --help
Options:
--sourcedir=SOURCEDIR
Source directory with all files
--destdir=DESTDIR
Destination directory.
# Example for patient PNOC008-40
Rscript code/create_project_dir.R \
--sourcedir ~/Downloads/p40 \
--destdir /path-to/OMPARE/results/PNOC008-40
code/create_clinfile.R: this script creates the clinical file for the patient of interest and stores it under results/PNOC008-XX/clinical/.
Rscript code/create_clinfile.R --help
Options:
--sheet=SHEET
PNOC008 Manifest file (.xlsx)
--dir=DIR
Path to PNOC008 patient folder.
--patient=PATIENT
Patient identifier for PNOC008. e.g. PNOC008-1, PNOC008-10 etc
# Example for patient PNOC008-40
Rscript code/create_clinfile.R \
--sheet /path-to/OMPARE/data/manifest/pnoc008_manifest.xlsx \
--patient PNOC008-40 \
--dir /path-to/OMPARE/results/PNOC008-40
NOTE: The above steps will create a directory structure for the patient of interest:
# Example for PNOC008-40
.
results/PNOC008-40
├── clinical
│ └── patient_report.txt
├── copy-number-variations
│ ├── {uuid}.controlfreec.info.txt
│ ├── {uuid}.diagram.pdf
│ └── {uuid}.gainloss.txt
├── gene-expressions
│ └── {uuid}.rsem.genes.results.gz
├── gene-fusions
│ ├── {uuid}.STAR.fusion_predictions.abridged.coding_effect.tsv
│ └── {uuid}.arriba.fusions.tsv
├── output
├── reports
└── simple-variants
  ├── {uuid}.lancet_somatic.norm.annot.protected.maf
  ├── {uuid}.mutect2_somatic.norm.annot.protected.maf
  ├── {uuid}.strelka2_somatic.norm.annot.protected.maf
  ├── {uuid}.vardict_somatic.norm.annot.protected.maf
  ├── {uuid}.consensus_somatic.protected.maf
  └── {uuid}.gatk.PASS.vcf.gz.hg38_multianno.txt.gz
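The layout shown above can be verified with a short check before the later steps are run. This is a sketch under assumptions: `check_patient_layout` is a hypothetical helper, and it tests only for the seven top-level subdirectories in the tree, not for the individual files inside them.

```shell
# Hypothetical helper (not part of OMPARE): confirm that the patient
# directory contains the subdirectories shown in the example tree.
check_patient_layout() {
  local patient_dir="$1" sub status=0
  for sub in clinical copy-number-variations gene-expressions gene-fusions \
             output reports simple-variants; do
    if [ ! -d "$patient_dir/$sub" ]; then
      echo "missing: $patient_dir/$sub" >&2
      status=1
    fi
  done
  # 0 when every subdirectory exists, 1 otherwise
  return "$status"
}
```

For example, `check_patient_layout /path-to/OMPARE/results/PNOC008-40` reports each absent subdirectory on stderr and returns a non-zero status if any is missing.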
code/update_pnoc008_matrices.R: this script updates the PNOC008 patient matrices (cnv, mutations, fusions, expression) by adding the current patient of interest.
Rscript code/update_pnoc008_matrices.R
# Running the script will update the following files:
data/pnoc008
├── pnoc008_clinical.rds
├── pnoc008_cnv_filtered.rds
├── pnoc008_consensus_mutation_filtered.rds
├── pnoc008_counts_matrix.rds
├── pnoc008_fpkm_matrix.rds
├── pnoc008_fusions_filtered.rds
├── pnoc008_tmb_scores.rds
├── pnoc008_tpm_matrix.rds
└── pnoc008_vs_gtex_brain_degs.rds
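One way to confirm that the update touched every file is to compare modification times against a marker file created just before the script runs. The following is a rough sketch: `list_stale_matrices` and the marker file are hypothetical, and it assumes the script rewrites all nine files on every run, which this README implies but does not state outright.

```shell
# Hypothetical helper (not part of OMPARE): print every expected matrix
# file that is missing or not newer than the marker file.
list_stale_matrices() {
  local data_dir="$1" marker="$2" name path
  for name in clinical cnv_filtered consensus_mutation_filtered counts_matrix \
              fpkm_matrix fusions_filtered tmb_scores tpm_matrix vs_gtex_brain_degs; do
    path="$data_dir/pnoc008_${name}.rds"
    # -nt is true only when $path has a later modification time than $marker
    if [ ! -e "$path" ] || [ ! "$path" -nt "$marker" ]; then
      echo "$path"
    fi
  done
}
```

A typical sequence would be to `touch` a marker file, run `Rscript code/update_pnoc008_matrices.R`, and then call `list_stale_matrices data/pnoc008` with the marker path. Empty output means all nine files were rewritten after the marker was created.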
Generate the HTML report from the R Markdown file:
# patient_dir is the project directory of current patient
# set_title is the title for the report. (Optional)
# snv_caller is one of six values for simple variants: lancet, mutect2, strelka2, vardict, consensus, all (all four callers together)
Rscript -e "rmarkdown::render(input = 'OMPARE.Rmd',
params = list(patient_dir = patient_dir,
set_title = set_title,
snv_caller = snv_caller),
output_dir = output_dir,
intermediates_dir = output_dir,
output_file = output_file, clean = TRUE)"
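The render call above refers to R variables (patient_dir, snv_caller, output_dir, output_file) that must hold concrete values when the expression is evaluated. As a minimal sketch, the hypothetical helper `build_render_expr` below fills them in from shell arguments and rejects an unknown caller name. It omits set_title because that parameter is described as optional, and it assumes the paths contain no single quotes.

```shell
# Hypothetical helper (not part of OMPARE): print an rmarkdown::render()
# expression with literal values substituted for the R variables.
# Arguments: patient_dir, snv_caller, output_dir, output_file.
build_render_expr() {
  local patient_dir="$1" snv_caller="$2" output_dir="$3" output_file="$4"
  case "$snv_caller" in
    lancet|mutect2|strelka2|vardict|consensus|all) ;;
    *) echo "unknown snv_caller: $snv_caller" >&2; return 1 ;;
  esac
  # output_dir is used twice: for the report and for intermediate files
  printf "rmarkdown::render(input = 'OMPARE.Rmd', params = list(patient_dir = '%s', snv_caller = '%s'), output_dir = '%s', intermediates_dir = '%s', output_file = '%s', clean = TRUE)\n" \
    "$patient_dir" "$snv_caller" "$output_dir" "$output_dir" "$output_file"
}
```

The printed expression can be passed to R with `Rscript -e "$(build_render_expr ...)"`, where the four arguments are the patient directory, the caller, the output directory, and an output file name of your choosing.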
After running the reports, the project folder will have all output files with plots and tables under output and all html reports under reports:
.
├── drug_recommendations
│ ├── CEMITools
│ │ ├── beta_r2.pdf
│ │ ├── clustered_samples.rds
│ │ ├── diagnostics.html
│ │ ├── enrichment_es.tsv
│ │ ├── enrichment_nes.tsv
│ │ ├── enrichment_padj.tsv
│ │ ├── expected_counts_corrected.rds
│ │ ├── gsea.pdf
│ │ ├── hist.pdf
│ │ ├── hubs.rds
│ │ ├── interaction.pdf
│ │ ├── interactions.tsv
│ │ ├── mean_k.pdf
│ │ ├── mean_var.pdf
│ │ ├── module.tsv
│ │ ├── modules_genes.gmt
│ │ ├── ora.pdf
│ │ ├── ora.tsv
│ │ ├── parameters.tsv
│ │ ├── profile.pdf
│ │ ├── qq.pdf
│ │ ├── report.html
│ │ ├── sample_tree.pdf
│ │ ├── selected_genes.txt
│ │ ├── summary.rds
│ │ ├── summary_eigengene.tsv
│ │ ├── summary_mean.tsv
│ │ ├── summary_median.tsv
│ │ ├── umap_output.rds
│ │ └── umap_top_20_neighbors_output.rds
│ ├── GTExBrain_dsea_go_mf_output.html
│ ├── GTExBrain_dsea_go_mf_output.pdf
│ ├── GTExBrain_dsea_go_mf_output.txt
│ ├── GTExBrain_dsea_go_mf_output_files
│ ├── GTExBrain_qSig_output.txt
│ ├── GTExBrain_tsea_reactome_output.txt
│ ├── PBTA_ALL_dsea_go_mf_output.html
│ ├── PBTA_ALL_dsea_go_mf_output.pdf
│ ├── PBTA_ALL_dsea_go_mf_output.txt
│ ├── PBTA_ALL_dsea_go_mf_output_files
│ ├── PBTA_ALL_qSig_output.txt
│ ├── PBTA_ALL_tsea_reactome_output.txt
│ ├── PBTA_HGG_dsea_go_mf_output.html
│ ├── PBTA_HGG_dsea_go_mf_output.pdf
│ ├── PBTA_HGG_dsea_go_mf_output.txt
│ ├── PBTA_HGG_dsea_go_mf_output_files
│ ├── PBTA_HGG_qSig_output.txt
│ ├── PBTA_HGG_tsea_reactome_output.txt
│ ├── {patient_id}_CHEMBL_drug-gene.tsv
│ ├── drug_dge_density_plots
│ │ ├── {gene}_drug_dge_density_plots.png
│ │ └── top_drug_dge_density_plots.pdf
│ ├── drug_pathways_barplot.pdf
│ ├── ora_plots.pdf
│ └── transcriptome_drug_rec.rds
├── drug_synergy
│ ├── combined_qSig_synergy_score.tsv
│ ├── combined_qSig_synergy_score_top10.pdf
│ ├── gtex_qSig_subnetwork_drug_gene_map.tsv
│ ├── gtex_qSig_synergy_score.tsv
│ ├── pbta_hgg_qSig_subnetwork_drug_gene_map.tsv
│ ├── pbta_hgg_qSig_synergy_score.tsv
│ ├── pbta_qSig_subnetwork_drug_gene_map.tsv
│ ├── pbta_qSig_synergy_score.tsv
│ ├── subnetwork_gene_drug_map.tsv
│ └── subnetwork_genes.tsv
├── filtered_germline_vars.rds
├── genomic_landscape_plots
│ └── circos_plot.png
├── immune_analysis
│ ├── immune_scores_adult.pdf
│ ├── immune_scores_adult.rds
│ ├── immune_scores_pediatric.pdf
│ ├── immune_scores_pediatric.rds
│ ├── immune_scores_topcor_pediatric.pdf
│ ├── immune_scores_topcor_pediatric.rds
│ ├── tis_scores.pdf
│ └── tis_scores.rds
├── oncogrid_analysis
│ └── complexheatmap_oncogrid.pdf
├── oncokb_analysis
│ ├── oncokb_cnv.txt
│ ├── oncokb_cnv_annotated.txt
│ ├── oncokb_fusion.txt
│ ├── oncokb_fusion_annotated.txt
│ ├── oncokb_{snv_caller}_annotated.txt
│ ├── oncokb_merged_{snv_caller}_annotated.txt
│ └── oncokb_merged_{snv_caller}_annotated_actgenes.txt
├── rnaseq_analysis
│ ├── {patient_id}_summary_DE_Genes_Down.txt
│ ├── {patient_id}_summary_DE_Genes_Up.txt
│ ├── {patient_id}_summary_Pathways_Down.txt
│ ├── {patient_id}_summary_Pathways_Up.txt
│ ├── diffexpr_genes_barplot_output.rds
│ ├── diffreg_pathways_barplot_output.rds
│ └── rnaseq_analysis_output.rds
├── survival_analysis
│ ├── kaplan_meier_adult.pdf
│ └── kaplan_meier_pediatric.pdf
├── tmb_analysis
│ ├── consensus_mpf_output.txt
│ ├── tmb_profile_output.rds
│ └── tumor_signature_output.rds
└── transcriptomically_similar_analysis
  ├── dim_reduction_plot_adult.rds
  ├── dim_reduction_plot_pediatric.rds
  ├── lollipop_recurrent_adult.pdf
  ├── lollipop_recurrent_pediatric.pdf
  ├── lollipop_shared_adult.pdf
  ├── lollipop_shared_pediatric.pdf
  ├── mutational_analysis_adult.rds
  ├── mutational_analysis_pediatric.rds
  ├── mutational_cnv_recurrent_adult.pdf
  ├── mutational_cnv_recurrent_pediatric.pdf
  ├── mutational_cnv_shared_adult.pdf
  ├── mutational_cnv_shared_pediatric.pdf
  ├── mutational_recurrent_adult.pdf
  ├── mutational_recurrent_pediatric.pdf
  ├── mutational_shared_adult.pdf
  ├── mutational_shared_pediatric.pdf
  ├── pathway_analysis_adult.pdf
  ├── pathway_analysis_adult.rds
  ├── pathway_analysis_pediatric.pdf
  ├── pathway_analysis_pediatric.rds
  ├── pbta_hgat_pnoc008_nn_table.rds
  ├── pbta_hgat_pnoc008_umap_output.rds
  ├── pbta_pnoc008_nn_table.rds
  ├── pbta_pnoc008_umap_output.rds
  ├── ssgsea_scores_pediatric.pdf
  ├── ssgsea_scores_pediatric.rds
  ├── tcga_gbm_pnoc008_nn_table.rds
  ├── tcga_pnoc008_umap_output.rds
  ├── transciptomically_similar_adult.rds
  └── transciptomically_similar_pediatric.rds
upload_reports.R: this script uploads the files under the reports and output folders to the data delivery project folder on cavatica.
Rscript upload_reports.R --help
Options:
--patient=PATIENT
Patient Identifier (PNOC008-22, etc...)
--study=STUDY
PNOC008 or CBTN
# Example run for PNOC008-40
Rscript upload_reports.R \
--patient PNOC008-40 \
--study 'PNOC008'
These hgg-dmg files are 20201202-data version dependent:
hgg-dmg-integration
└── 20201202-data
  ├── CC_based_heatmap_Distance_euclidean_finalLinkage_average_clusterAlg_KM_expct_counts_VST_cluster_and_annotation.tsv
  ├── pbta-hgat-dx-prog-pm-gene-counts-rsem-expected_count-uncorrected.rds
  └── pbta-histologies.tsv