Web server: A web server is now available. It supports lightweight analysis and several interactive visualizations (scatter plot, hive plot, etc.).
The PEGASAS package (including a toy example and data) can be downloaded and installed as shown below:
git clone https://github.com/Xinglab/PEGASAS.git
cd PEGASAS
python setup.py install
Note that the installation process automatically checks and installs only the Python package dependencies. Any missing R packages required by PEGASAS must be installed manually. See the next section for the required packages.
python version 2.7 (numpy, scipy, matplotlib)
R version 3.4.0 (LSD, data.table, ggplot2)
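Before running the analysis, it can help to confirm that the Python dependencies listed above are importable. The snippet below is a minimal sketch, not part of PEGASAS: the helper name missing_modules is hypothetical, and it checks Python modules only. The R packages (LSD, data.table, ggplot2) still have to be verified from within R.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported.

    Hypothetical helper for illustration; not part of the PEGASAS package.
    """
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Python dependencies named in the requirements list above.
PEGASAS_PYTHON_DEPS = ["numpy", "scipy", "matplotlib"]
```

Calling missing_modules(PEGASAS_PYTHON_DEPS) returns an empty list when all three packages are installed in the current interpreter.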
After installing PEGASAS and its dependencies, the user can follow the two steps below to perform the analysis and to generate plots for correlation and Gene Ontology (GO) analysis. (A toy example is provided for a test run. Corresponding commands are provided in the next section.)
The PEGASAS analysis consists of two steps, each run as a sub-command. Typing PEGASAS -h on the command line lists them:
usage: PEGASAS [-h] [--version] {pathway,correlation} ...
PEGASAS -- PEGASAS
positional arguments:
{pathway,correlation}
pathway Calculates signaling pathway activity derived from
geneset enrichment metric based on RNA-Seq gene
expression
correlation Computes pathway-correlated alternative splicing
events
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
For command line options of each sub-command, type: PEGASAS COMMAND -h
PEGASAS can calculate the signaling pathway activity based on predefined gene signatures and gene expression. For details of this step, see below:
PEGASAS pathway -h
usage: PEGASAS pathway [-h] [-o OUT_DIR] [-n NUM_INTERVAL] [--plotting]
geneExpbySample geneSignatureList groupInfo
required arguments:
geneExpbySample TSV format matrix of gene expression values (FPKM,
TPM, etc.), where each row is one sample and each
column is one gene
geneSignatureList One or multiple gene signature sets from pathway of
interest, in the 'gmt' format (see MSigDB webset)
groupInfo TSV format file, providing patient ID and
phenotype/disease stage in each row
optional arguments:
-h, --help show this help message and exit
-o OUT_DIR, --out-dir OUT_DIR
Name of folder for analysis output
-n NUM_INTERVAL, --num-interval NUM_INTERVAL
Number of parallel processes for KS enrichment calculation
--plotting Makes plots to inspect K-S enrichment scores
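The help text above refers to K-S enrichment scores. As a rough illustration of the general idea only, the sketch below computes an unweighted Kolmogorov-Smirnov-style running-sum statistic for one sample. It is an assumption about the kind of metric involved, not PEGASAS's actual implementation; the function name and arguments are hypothetical.

```python
def ks_enrichment_score(ranked_genes, gene_set):
    """Unweighted K-S style enrichment score for one ranked gene list.

    ranked_genes: gene names ordered from highest to lowest expression.
    gene_set: gene names belonging to the signature of interest.
    Returns the running-sum value with the largest absolute deviation from zero.
    Illustrative only; not the PEGASAS implementation.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        # Step up for signature genes, step down for all other genes.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best
```

A signature whose genes cluster at the top of the ranking yields a score near 1, and one whose genes cluster at the bottom yields a score near -1.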
Note: About the input geneSignatureList format: PEGASAS can take multiple gene signatures in one file, which requires a minor modification of the gmt format (the full name of the signature after '>' is removed). See the example file under the data folder for details.
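For orientation, the standard gmt layout stores one gene set per line as tab-separated fields: the set name, a description, and then the member genes. The sketch below parses that standard layout. It is an illustration with a hypothetical function name, and it does not account for the PEGASAS-specific modification mentioned in the note above, so the example file under the data folder remains the reference for the exact format.

```python
def parse_gmt_lines(lines):
    """Parse standard gmt lines into a dict of {set_name: [genes]}.

    Each line is expected to be: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    Lines with fewer than three fields are skipped.
    Illustrative only; the PEGASAS-modified gmt may differ.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue
        name = fields[0]
        # Field 1 is the description; genes start at field 2.
        genes = [gene for gene in fields[2:] if gene]
        gene_sets[name] = genes
    return gene_sets
```

To read a file, pass an open file handle, for example parse_gmt_lines(open(path)), where path is a gmt file of your choice.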
PEGASAS can perform correlation analysis to identify pathway-associated events from the pathway activity measurements generated in Step 1 and alternative splicing (or editing, etc.) events. For details of this step, see below:
PEGASAS correlation -h
usage: PEGASAS correlation [-h] [-o OUT_DIR] [--GO] [--GO-only]
[-b GO_BACKGROUND_GENE_LIST]
[--GOdb-versions GODB_VERSIONS]
signatureScorebySample PSIbySample groupNameOrder
required arguments:
signatureScorebySample
A TSV format list of gene signature score where each
column is one sample and the corresponding score.
PSIbySample A TSV format matrix of PSI values where each column is
one sample and each row is one splicing event.
groupNameOrder A file contains a comma-separated string of group name
orders. The group name should match group info list in
the pathway score calculation step. This is useful for
the heatmap visualization.
optional arguments:
-h, --help show this help message and exit
-o OUT_DIR, --out-dir OUT_DIR
Output folder name of the analysis.
--GO Perform GO analysis.
--GO-only Only perform GO analysis. Needs to provide background
gene list for p-value calculation.
-b GO_BACKGROUND_GENE_LIST, --GO-background-gene-list GO_BACKGROUND_GENE_LIST
Provides background gene list for GO analysis bias
correction. This background list should contain genes
involved in the splicing analysis. Required under GO-
only mode.
--GOdb-versions GODB_VERSIONS
Specify EnrichR GO db versions. Default is the 2017
version used in our PNAS paper PMID: 32086391
Note: About the input PSIbySample TSV format: the first 8 columns define each splicing event using gene name, AC, chromosome, strand, and key coordinates, following the rMATS output format. See the toy example files for a concrete example.
Here are the commands for a test run using the toy example files provided in the example folder of the package.
Go to PEGASAS folder:
cd PEGASAS
Use hallmarks50-2.gmt.txt as the signature file. This file only contains two gene signatures:
PEGASAS pathway -o test example/geneExpbySample_example.txt PEGASAS/data/hallmarks50-2.gmt.txt example/groupInfo_example.txt
Use the HALLMARK_MYC_TARGETS_V2 signature activity generated in the last step to perform the correlation analysis:
PEGASAS correlation -o test --GO test/HALLMARK_MYC_TARGETS_V2/HALLMARK_MYC_TARGETS_V2.scores.txt example/PSIbySample_example.txt example/groupNameOrder_example.txt
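The two commands above can also be driven from a script. The sketch below assembles the same argument lists and runs them in order. The function names are hypothetical, and the scores path for any out_dir other than 'test' is an assumption made by analogy with the command shown above. It must be run from the PEGASAS folder with PEGASAS installed.

```python
import subprocess


def build_toy_commands(out_dir="test", signature="HALLMARK_MYC_TARGETS_V2"):
    """Return the pathway and correlation commands for the toy example."""
    # Score file location as used in the correlation command above
    # (assumed to follow the same pattern for other output folders).
    scores = "{0}/{1}/{1}.scores.txt".format(out_dir, signature)
    pathway_cmd = [
        "PEGASAS", "pathway", "-o", out_dir,
        "example/geneExpbySample_example.txt",
        "PEGASAS/data/hallmarks50-2.gmt.txt",
        "example/groupInfo_example.txt",
    ]
    correlation_cmd = [
        "PEGASAS", "correlation", "-o", out_dir, "--GO", scores,
        "example/PSIbySample_example.txt",
        "example/groupNameOrder_example.txt",
    ]
    return [pathway_cmd, correlation_cmd]


def run_toy_example(out_dir="test"):
    """Run both steps in order; raises CalledProcessError if a step fails."""
    for cmd in build_toy_commands(out_dir):
        subprocess.check_call(cmd)
```

Calling run_toy_example() stops at the first failing step, so the correlation step is never started with a missing score file.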
Results can be found under the 'test' folder:
4.0K GO/
40K HALLMARK_MYC_TARGETS_V2_background_list.txt
924K HALLMARK_MYC_TARGETS_V2_global_cor_matrix.txt
3.1K HALLMARK_MYC_TARGETS_V2_high_cor_matrix.txt
56K HALLMARK_MYC_TARGETS_V2_high_cor_scatterplots.pdf
4.0K HALLMARK_MYC_TARGETS_V2.pathway/
1.3K HALLMARK_MYC_TARGETS_V2.scores.txt
241 HALLMARK_MYC_TARGETS_V2_sig_list.txt
579 HALLMARK_MYC_TARGETS_V2.sorted.txt
2.4M refinedBySample.PSIbySample_example.HALLMARK_MYC_TARGETS_V2.sorted.txt
HALLMARK_MYC_TARGETS_V2_high_cor_matrix.txt: Pathway-associated events with Pearson's r and permutation p-value.
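To make the two statistics named above concrete, the sketch below computes Pearson's r between a vector of pathway scores and a vector of PSI values, and estimates a two-sided permutation p-value by shuffling one vector. It is a generic illustration under stated assumptions (hypothetical function names, a fixed default of 1000 permutations, an add-one correction) and is not taken from the PEGASAS source.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = float(len(x))
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for Pearson's r.

    Shuffles y n_perm times and counts how often |r| is at least as large
    as the observed |r|. The add-one correction keeps the estimate above zero.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1.0) / (n_perm + 1.0)
```

Here x would hold the pathway scores across samples and y the PSI values of one splicing event across the same samples, in the same sample order.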
Yang Pan panyang@ucla.edu
Yi Xing XINGYI@email.chop.edu
Phillips J.W.*, Pan Y.*, Tsai B.L., Xie Z., Demirdjian L., Xiao W., Yang H.T., Zhang Y., Lin C.H., Cheng D., Hu Q., Liu S., Black D.L., Witte O.N.+, Xing Y.+ Pathway-guided analysis reveals Myc-dependent alternative pre-mRNA splicing in aggressive prostate cancers. Proc. Natl. Acad. Sci. U.S.A., (2020) In Press (+ joint corresponding authors; * joint first authors)