- python 2.7.x (numpy, scipy, seaborn, pyBigWig, etc.)
- IEDB stand-alone 20130222 2.15.5 (2.22.1 is not fully tested)
- bedtools 2.29.0
- STAR 2.5.3: required for IRIS RNA-seq processing
- samtools 1.3: required for IRIS RNA-seq processing
- rMATS-turbo: required for IRIS RNA-seq processing
- Cufflinks 2.2.1: required for IRIS RNA-seq processing
- seq2HLA: required for HLA typing; requires bowtie
- MS-GF+ (v2018.07.17): required for MS search; requires Java
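Before installing, it can help to confirm which of the external tools above are already on your PATH. The following is a minimal sketch, not part of IRIS; it is a standalone Python 3 helper, and the executable names in `ASSUMED_EXECUTABLES` are assumptions that may differ on your system.

```python
import shutil

# Assumed executable names for the tools listed above; adjust as needed.
ASSUMED_EXECUTABLES = ["bedtools", "STAR", "samtools", "cufflinks", "bowtie", "java"]


def find_missing(executables, which=shutil.which):
    """Return the names in `executables` that cannot be located on PATH."""
    return [name for name in executables if which(name) is None]


def report(executables=ASSUMED_EXECUTABLES):
    """Print one line per tool saying whether it was found."""
    missing = set(find_missing(executables))
    for name in executables:
        status = "MISSING" if name in missing else "found"
        print("{:<10} {}".format(name, status))
```

Calling `report()` prints a found/MISSING line for each tool. This checks presence only and does not verify the versions listed above.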
Two steps to set up IRIS:
The IRIS program can be downloaded directly from the repository, as shown below:
git clone https://github.com/Xinglab/IRIS.git
cd IRIS
For full functionality, IRIS requires the SGE (Sun Grid Engine) system. Users who want to use functions involving SGE (see Usage for details) should check IRIS/config.py and make sure the qsub parameters are correct before moving to the next step.
IRIS loads a big-data reference database of splicing events and other genomic annotations.
These data are included in IRIS_data.tgz (a Google Drive link; size ~10 GB). Users need to move this file to the IRIS folder for streamlined installation.
Download IEDB_MHC_I-X.XX.X.tar.gz from the IEDB website (see Dependencies). Create a folder named 'IEDB' in the IRIS folder, then move the downloaded tar.gz file into the 'IEDB' folder.
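The IEDB staging step above can be sketched in Python as follows. This is an illustrative helper, not part of the IRIS installer; the function name `stage_iedb_archive` and the glob pattern used to match the downloaded archive are assumptions.

```python
import shutil
from pathlib import Path


def stage_iedb_archive(iris_dir, download_dir):
    """Create <iris_dir>/IEDB and move the downloaded IEDB archive into it.

    Assumes exactly one file matching IEDB_MHC_I-*.tar.gz exists in
    download_dir (hypothetical pattern based on the file name above).
    """
    iedb_dir = Path(iris_dir) / "IEDB"
    iedb_dir.mkdir(exist_ok=True)

    archives = sorted(Path(download_dir).glob("IEDB_MHC_I-*.tar.gz"))
    if len(archives) != 1:
        raise FileNotFoundError(
            "expected one IEDB_MHC_I-*.tar.gz in {}, found {}".format(
                download_dir, len(archives)))

    target = iedb_dir / archives[0].name
    shutil.move(str(archives[0]), str(target))
    return target
```

For example, `stage_iedb_archive("IRIS", "Downloads")` would leave the archive under `IRIS/IEDB/`, ready for the install script.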
Under the IRIS folder, do:
./install core
Follow instructions to finish the installation of conda, python and its dependencies, bedtools, the downloaded IEDB package, and the IRIS data and packages. To install optional dependencies not needed for the most common IRIS usage:
./install all
- For streamlined AS-derived target discovery, please follow the major modules and run the corresponding toy example.
- For customized pipeline development, please check all modules of IRIS.
IRIS provides individual modules/steps, allowing users to build pipelines for their customized needs.
For a description of each module/step, including RNA-seq preprocessing, HLA typing, proteo-transcriptomic MS searching, visualization, etc., please see the documentation of all IRIS modules.
usage: IRIS [-h] [--version]
            {formatting,screening,prediction,epitope_post,process_rnaseq,makeqsub_rmats,exp_matrix,indexing,translation,pep2epitope,screening_plot,seq2hla,parse_hla,ms_makedb,ms_search,ms_parse}
            ...

IRIS -- IRIS

positional arguments:
  {formatting,screening,prediction,epitope_post,process_rnaseq,makeqsub_rmats,exp_matrix,indexing,translation,pep2epitope,screening_plot,seq2hla,parse_hla,ms_makedb,ms_search,ms_parse}
    formatting      Formats AS matrices from rMATS, followed by indexing for IRIS
    screening       Screens AS-derived tumor antigens using big-data reference
    prediction      Predicts and annotates AS-derived TCR (pre-prediction) and CAR-T targets
    epitope_post    Post-prediction step to summarize predicted TCR targets
    process_rnaseq  Processes RNA-Seq FASTQ files to quantify gene expression and AS
    makeqsub_rmats  Makes qsub files for running rMATS-turbo 'prep' step
    exp_matrix      Makes a merged gene expression matrix from multiple cufflinks results
    indexing        Indexes AS matrices for IRIS
    translation     Translates AS junctions into junction peptides
    pep2epitope     Wrapper to run IEDB for peptide-HLA binding prediction
    screening_plot  Makes stacked/individual violin plots for list of AS events
    seq2hla         Wrapper to run seq2HLA for HLA typing using RNA-Seq
    parse_hla       Summarizes seq2HLA results of all input samples into matrices for IRIS use
    ms_makedb       Generates proteo-transcriptomic database for MS search
    ms_search       Wrapper to run MSGF+ for MS search
    ms_parse        Parses MS search results to generate tables of identified peptides

optional arguments:
  -h, --help        show this help message and exit
  --version         show program's version number and exit
For command line options of each sub-command, type: IRIS COMMAND -h
A typical IRIS immunotherapy target discovery run comprises three major steps. For a quick test, see Example, which provides a shell script for a streamlined example run:
- Step 1. IRIS formatting (& indexing)
usage: IRIS formatting [-h] -t {SE,RI,A3,A5} -n DATA_NAME -s {1,2}
[-c COV_CUTOFF] [-e] [-d IRIS_DB_PATH]
rmats_mat_path_manifest rmats_sample_order
- Step 2. IRIS screening (& translation). See the description of the parameter file and the example parameter file.
usage: IRIS screening [-h] [-o OUTDIR] [-t] parameter_fin
- Step 3. IRIS prediction (predicts both extracellular targets and epitopes; requires SGE system)
usage: IRIS prediction [-h] [-p PARAMETER_FIN] [--iedb-local IEDB_LOCAL]
[-c DELTAPSI_COLUMN] [-d DELTAPSI_CUT_OFF] -m MHC_LIST
[--extracellular-anno-by-junction]
IRIS_screening_result_path
usage: IRIS epitope_post [-h] -p PARAMETER_FIN -o OUTDIR -m MHC_BY_SAMPLE
[-e GENE_EXP_MATRIX] [--ic50-cut-off IC50_CUT_OFF]
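When scripting the steps above, it can be convenient to assemble the command lines programmatically. The sketch below builds argument lists using only the flags shown in the usage strings; the helper names are hypothetical, and the meaning of each flag should be confirmed with `IRIS COMMAND -h`.

```python
SPLICING_TYPES = {"SE", "RI", "A3", "A5"}


def formatting_cmd(splicing_type, data_name, s_option, manifest, sample_order):
    """Build the argv list for 'IRIS formatting' (Step 1)."""
    if splicing_type not in SPLICING_TYPES:
        raise ValueError("-t must be one of {}".format(sorted(SPLICING_TYPES)))
    if s_option not in (1, 2):
        raise ValueError("-s must be 1 or 2")
    return ["IRIS", "formatting", "-t", splicing_type, "-n", data_name,
            "-s", str(s_option), manifest, sample_order]


def screening_cmd(parameter_file, outdir=None, use_t_flag=False):
    """Build the argv list for 'IRIS screening' (Step 2)."""
    cmd = ["IRIS", "screening"]
    if outdir is not None:
        cmd += ["-o", outdir]
    if use_t_flag:
        cmd.append("-t")  # bare -t switch from the usage string
    cmd.append(parameter_file)
    return cmd


def prediction_cmd(screening_result_path, mhc_list, parameter_file=None):
    """Build the argv list for 'IRIS prediction' (Step 3)."""
    cmd = ["IRIS", "prediction"]
    if parameter_file is not None:
        cmd += ["-p", parameter_file]
    cmd += ["-m", mhc_list, screening_result_path]
    return cmd
```

Each list can be passed to `subprocess.run(cmd, check=True)` once IRIS is installed. Keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.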
We provide a wrapper (run_example) that runs the streamlined major modules described above on the example files included in the IRIS package. For customized pipeline development, we recommend using this script and run_iris as references. Under the IRIS folder, do:
./run_example
As mentioned in Usage, this example run submits a job array to the SGE system. The formatting and screening steps take less than 5 min, and the prediction step (SGE job arrays) usually takes less than 15 min.
A successful test run will generate the following result files in ./results/example/Glioma_test/screening (the number of rows is shown before each file name):
0 _example_Glioma_test.notest.txt
13 _example_Glioma_test.primary.txt
3 _example_Glioma_test.primary.txt.ExtraCellularAS.txt
11 _example_Glioma_test.prioritized.txt
3 _example_Glioma_test.prioritized.txt.ExtraCellularAS.txt
13 _example_Glioma_test.test.all.txt
13 primary/epitope_summary.junction-based.txt
74 primary/epitope_summary.peptide-based.txt
148 primary/pred_filtered.score500.txt
11 prioritized/epitope_summary.junction-based.txt
45 prioritized/epitope_summary.peptide-based.txt
84 prioritized/pred_filtered.score500.txt
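The listing above can be checked automatically after a test run. The following is a minimal sketch, assuming the reported row numbers are plain line counts of each file; the helper names are hypothetical and not part of IRIS.

```python
from pathlib import Path

# Expected line counts, copied from the listing above.
EXPECTED_ROWS = {
    "_example_Glioma_test.notest.txt": 0,
    "_example_Glioma_test.primary.txt": 13,
    "_example_Glioma_test.primary.txt.ExtraCellularAS.txt": 3,
    "_example_Glioma_test.prioritized.txt": 11,
    "_example_Glioma_test.prioritized.txt.ExtraCellularAS.txt": 3,
    "_example_Glioma_test.test.all.txt": 13,
    "primary/epitope_summary.junction-based.txt": 13,
    "primary/epitope_summary.peptide-based.txt": 74,
    "primary/pred_filtered.score500.txt": 148,
    "prioritized/epitope_summary.junction-based.txt": 11,
    "prioritized/epitope_summary.peptide-based.txt": 45,
    "prioritized/pred_filtered.score500.txt": 84,
}


def count_rows(path):
    """Count lines in a file (assumption: 'rows' means lines)."""
    with open(str(path), "rb") as handle:
        return sum(1 for _ in handle)


def check_results(result_dir, expected=EXPECTED_ROWS):
    """Return {relative_path: (expected, found)} for every mismatch.

    `found` is None when the file does not exist.
    """
    mismatches = {}
    for rel_path, want in expected.items():
        path = Path(result_dir) / rel_path
        found = count_rows(path) if path.is_file() else None
        if found != want:
            mismatches[rel_path] = (want, found)
    return mismatches
```

An empty dictionary from `check_results("./results/example/Glioma_test/screening")` indicates that every file is present with the listed number of lines.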
Users can refer to the relative paths in the parameter file Test.para, the file manifest matrice.txt, and the file samples.txt. These relative paths were written for the example run, so users will need to change the paths for their own analyses. The run_iris script takes as input a simplified parameter file and a .tar.gz of the SJ_matrices, both of which are preprocessed before the IRIS modules are called. The preprocessing adds absolute paths based on the input relative paths.
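The idea of turning relative paths into absolute ones can be illustrated with a short sketch. This is not the run_iris implementation; it assumes a manifest with one path per line, and the function name is hypothetical.

```python
import os


def absolutize_lines(lines, base_dir):
    """Resolve each non-empty path in `lines` against `base_dir`.

    Assumption: one path per line; paths that are already absolute
    are kept unchanged.
    """
    resolved = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if os.path.isabs(entry):
            resolved.append(entry)
        else:
            resolved.append(os.path.normpath(os.path.join(base_dir, entry)))
    return resolved
```

Passing `os.getcwd()` as `base_dir` resolves entries against the directory from which the analysis is launched.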
Final reports are shown in bold font.
[TASK/DATA_NAME].test.all.txt: All AS events tested by IRIS screening
[TASK/DATA_NAME].notest.txt: AS events skipped during screening due to no variance or no available comparisons
[TASK/DATA_NAME].primary.txt: Tumor AS events after comparison to tissue-matched normal panel ('primary' events)
[TASK/DATA_NAME].prioritized.txt: Tumor AS events after comparison to tissue-matched normal panel, tumor panel, and normal tissue panel ('prioritized' AS events)
[TASK/DATA_NAME].primary.txt.ExtraCellularAS.txt: Tumor AS events in 'primary' set that are associated with protein extracellular annotation and may be used for CAR-T targets
[TASK/DATA_NAME].prioritized.txt.ExtraCellularAS.txt: Tumor AS events in 'prioritized' set that are associated with protein extracellular annotation and may be used for CAR-T targets
primary/pred_filtered.score500.txt: IEDB prediction outputs for AS junction peptides from 'primary' set with HLA-peptide binding IC50 values passing user-defined cut-off
primary/epitope_summary.peptide-based.txt: AS-derived epitopes from 'primary' set that are predicted to bind user-defined HLA type
primary/epitope_summary.junction-based.txt: Epitope-producing AS junctions from 'primary' set that are predicted to bind user-defined HLA type
prioritized/pred_filtered.score500.txt: IEDB prediction outputs for AS junction peptides from 'prioritized' set with HLA-peptide binding IC50 values passing user-defined cut-off
prioritized/epitope_summary.peptide-based.txt: AS-derived epitopes from 'prioritized' set that are predicted to bind user-defined HLA type
prioritized/epitope_summary.junction-based.txt: Epitope-producing AS junctions from 'prioritized' set that are predicted to bind user-defined HLA type
Yang Pan panyang@ucla.edu
Yi Xing yxing@ucla.edu
Manuscript in submission