PIGNON

PIGNON is a protein-protein interaction (PPI)-guided functional enrichment analysis for quantitative proteomics. This algorithm measures the clustering of proteins with a shared Gene Ontology (GO) annotation within the provided PPI network weighted with quantitative proteomics data. The significance of this clustering measure is then estimated from a normal distribution approximated from a Monte Carlo Sampling Distribution. To correct for multiple hypothesis testing, we assess the false discovery rate at various thresholds against a null model. We tested PIGNON using a breast cancer dataset generated by Tyanova et al.
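The procedure described above can be sketched roughly as follows. This is a minimal illustration, not PIGNON's actual implementation: the class name `ClusteringSketch`, its method names, and the choice of the total pairwise distance as the clustering measure are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch; names and the exact clustering measure are assumptions, not PIGNON's code. */
class ClusteringSketch {

    /** Sum of pairwise distances among the given proteins (smaller means more tightly clustered). */
    static double totalPairwiseDistance(double[][] dist, int[] proteins) {
        double total = 0.0;
        for (int i = 0; i < proteins.length; i++) {
            for (int j = i + 1; j < proteins.length; j++) {
                total += dist[proteins[i]][proteins[j]];
            }
        }
        return total;
    }

    /** Draws `size` distinct indices from 0..n-1 using a partial Fisher-Yates shuffle. */
    static int[] sampleWithoutReplacement(int n, int size, Random rng) {
        int[] pool = new int[n];
        for (int i = 0; i < n; i++) pool[i] = i;
        for (int i = 0; i < size; i++) {
            int j = i + rng.nextInt(n - i);
            int tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        return Arrays.copyOf(pool, size);
    }

    /** Returns {mean, standard deviation} of the measure over random protein sets of the given size. */
    static double[] monteCarloNormalParams(double[][] dist, int size, int iterations, Random rng) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (int it = 0; it < iterations; it++) {
            int[] randomSet = sampleWithoutReplacement(dist.length, size, rng);
            double d = totalPairwiseDistance(dist, randomSet);
            sum += d;
            sumSq += d * d;
        }
        double mean = sum / iterations;
        double variance = Math.max(0.0, sumSq / iterations - mean * mean);
        return new double[] {mean, Math.sqrt(variance)};
    }

    /** z-score of an observed measure under the fitted normal; negative means tighter than random. */
    static double zScore(double observed, double mean, double sd) {
        return (observed - mean) / sd;
    }
}
```

Under these assumptions, the significance of a GO term would be the lower-tail probability of the standard normal distribution at the returned z-score, computed once per annotation size.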

PIGNON is a Java application that can be run from the command line. You will need to download the PIGNON.jar file.

Note: We recommend running a first instance of PIGNON on your chosen PPI network with your quantitative data and running a second instance without the quantitative data in order to eliminate results that are significant only due to the innate network topology.
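The comparison recommended in this note amounts to a set difference between the GO terms called significant in each run. A minimal sketch, assuming you have already collected the significant GO term IDs from both runs into sets (the class and method names here are hypothetical and not part of PIGNON):

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper, not part of PIGNON. */
class TopologyFilter {

    /** GO terms significant in the weighted run but not in the topology-only (unweighted) run. */
    static Set<String> weightedOnly(Set<String> significantWeighted, Set<String> significantUnweighted) {
        Set<String> result = new LinkedHashSet<>(significantWeighted);
        result.removeAll(significantUnweighted);
        return result;
    }
}
```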

Dependencies

File Descriptions

Required input files

Example files can be found under: input_files

1. Protein-protein interaction network (unzipped, in either BioGRID or STRING PPI network format)

Example BioGRID repository : BIOGRID-ORGANISM-Homo_sapiens-3.4.161.tab2.txt

PIGNON is currently set up to run on either the BioGRID or STRING networks. In the params file, specify the network type, either BioGRID (0) or STRING (1), and the taxonomy ID of the species, e.g. human (9606).

For an alternative PPI network, format your network as a tab-delimited file where each row is an interaction formatted as specified below. In the params file, specify the network type as BioGRID (0) and leave the taxonomy ID blank.

Required format (note: only the columns listed below are required; the other columns can be blank)

Column    Content          Example
2         EntrezID 1       6416
3         EntrezID 2       2318
8         HGNC symbol 1    MAP2K4
9         HGNC symbol 2    FLNC
16        SpeciesID 1      9606
17        SpeciesID 2      9606
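As an illustration of the required columns listed above, the following sketch reads one interaction row. The class `PpiRowParser` and its field names are hypothetical and are not taken from PIGNON's source; the sketch only assumes a tab-delimited row with at least 17 columns.

```java
/** Hypothetical reader for one row of a custom PPI network file; not PIGNON's own parser. */
class PpiRowParser {

    /** The six required values of an interaction row. */
    static final class Interaction {
        final String entrez1, entrez2, symbol1, symbol2, species1, species2;

        Interaction(String entrez1, String entrez2, String symbol1,
                    String symbol2, String species1, String species2) {
            this.entrez1 = entrez1;
            this.entrez2 = entrez2;
            this.symbol1 = symbol1;
            this.symbol2 = symbol2;
            this.species1 = species1;
            this.species2 = species2;
        }
    }

    /** Reads columns 2, 3, 8, 9, 16 and 17 (1-based) of a tab-delimited row. */
    static Interaction parse(String line) {
        // A limit of -1 keeps trailing empty columns, which matter when optional columns are blank.
        String[] f = line.split("\t", -1);
        if (f.length < 17) {
            throw new IllegalArgumentException(
                "expected at least 17 tab-separated columns, got " + f.length);
        }
        return new Interaction(f[1], f[2], f[7], f[8], f[15], f[16]);
    }
}
```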

2. STRING ID to Entrez ID mapping file (required to run with a STRING network)

Example mapping file : mapStringProteins_9606.v11.tsv

This tab-delimited text file was generated by combining the STRING accessory files 9606.protein.info.v11.0.gz and human.entrez_2_string.2018.tsv.gz. The mapping file is formatted as follows:

hgnc_symbol protein_external_id entrezID
ARF5 9606.ENSP00000000233 381
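To show how such a mapping file can be used, the sketch below builds a lookup from STRING protein ID to Entrez ID. The class name `StringToEntrezMap` is hypothetical, and the sketch assumes the three-column, tab-delimited layout with a header row shown above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical loader for the STRING-to-Entrez mapping file; not PIGNON's own code. */
class StringToEntrezMap {

    /** Maps protein_external_id (column 2) to entrezID (column 3). */
    static Map<String, String> fromLines(List<String> lines) {
        Map<String, String> map = new HashMap<>();
        for (String line : lines) {
            String[] f = line.split("\t", -1);
            if (f.length < 3) continue;                  // skip blank or malformed rows
            if (f[0].equals("hgnc_symbol")) continue;    // skip the header row
            map.put(f[1], f[2]);
        }
        return map;
    }
}
```

The lines can be read with `java.nio.file.Files.readAllLines` before being passed to `fromLines`.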

3. Propagated Gene Ontology terms

Example functional annotation file : GO_annotations-9606-inferred-allev-2.tsv

PIGNON is currently set up to run using only this type of annotation file.

Alternatively, you can format your annotations as a tab-delimited file where every row is a new annotation. Required information:

  • column 1: Annotation ID
  • column 2: Annotation Name (can be blank)
  • column 3: Annotation descriptor (can be left blank)
  • column 7: List of EntrezGene IDs, where elements are separated by a pipe (|)
  • column 8: List of HGNC symbols, where elements are separated by a pipe (|)

Column    Content                  Example
1         AnnotationID             GO:0000015
2         Annotation Name          phosphopyruvate hydratase complex
3         Annotation descriptor    cellular_component
7         EntrezGene IDs           2023|2026|2027|387712
8         hgnc_symbols             ENO1|ENO2|ENO3|ENO4
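The annotation layout above can be read as in the following sketch. The class `AnnotationRowParser` and its field names are hypothetical; the sketch assumes a tab-delimited row with at least 8 columns and pipe-separated lists in columns 7 and 8.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical reader for one annotation row; not PIGNON's own parser. */
class AnnotationRowParser {

    static final class Annotation {
        final String id, name, descriptor;
        final List<String> entrezIds, symbols;

        Annotation(String id, String name, String descriptor,
                   List<String> entrezIds, List<String> symbols) {
            this.id = id;
            this.name = name;
            this.descriptor = descriptor;
            this.entrezIds = entrezIds;
            this.symbols = symbols;
        }
    }

    /** Reads columns 1, 2, 3, 7 and 8 (1-based) of a tab-delimited annotation row. */
    static Annotation parse(String line) {
        String[] f = line.split("\t", -1);
        if (f.length < 8) {
            throw new IllegalArgumentException(
                "expected at least 8 tab-separated columns, got " + f.length);
        }
        return new Annotation(f[0], f[1], f[2], splitPipe(f[6]), splitPipe(f[7]));
    }

    /** Splits a pipe-separated list; the pipe is escaped because split() takes a regex. */
    static List<String> splitPipe(String field) {
        if (field.isEmpty()) return Collections.emptyList();
        return Arrays.asList(field.split("\\|"));
    }
}
```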

4. Quantitative proteomics dataset

OPTION 1: Protein quantification across multiple samples

Example quantitative proteomics dataset : formatted-BreastCancerProteinExpression.txt

This is a tab-delimited text file where each row represents the quantitative information for a given gene/protein in 2 or more conditions. Required information:

  • column 1: HGNC_symbol
  • columns 2-n: quantified values in the 2 studied conditions. The column labels for each condition must correspond to the labels specified in the params file (i.e. if condition1 = Her2 in the params file, all columns for condition1 in this file should be labelled Her2.n). The order of the columns is not important.

Any missing values should be represented by NA.

HGNC_symbol Condition1.1 Condition1.2 Condition2.1 Condition2.2 ConditionX.n
STARD13 2.5 1.8 0.7 NA ...
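The labelling convention and the NA placeholder can be illustrated with a short sketch. The class `QuantTableSketch` and its methods are hypothetical, and representing NA as `Double.NaN` is a choice made for this example rather than a statement about how PIGNON treats missing values internally.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helpers for the multi-sample quantification file; not PIGNON's own code. */
class QuantTableSketch {

    /** Indices of the header columns labelled "<condition>.<n>", in any column order. */
    static int[] columnsFor(String headerLine, String condition) {
        String[] labels = headerLine.split("\t", -1);
        List<Integer> hits = new ArrayList<>();
        for (int i = 1; i < labels.length; i++) {        // column 0 holds the HGNC symbol
            if (labels[i].startsWith(condition + ".")) hits.add(i);
        }
        int[] cols = new int[hits.size()];
        for (int i = 0; i < cols.length; i++) cols[i] = hits.get(i);
        return cols;
    }

    /** Values of one row in the given columns; the placeholder NA becomes Double.NaN. */
    static double[] valuesFor(String rowLine, int[] columns) {
        String[] f = rowLine.split("\t", -1);
        double[] values = new double[columns.length];
        for (int i = 0; i < columns.length; i++) {
            String raw = f[columns[i]].trim();
            values[i] = raw.equals("NA") ? Double.NaN : Double.parseDouble(raw);
        }
        return values;
    }
}
```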

OPTION 2: Protein quantification provided as a fold-change between 2 samples

This is a tab-delimited text file where each row represents the quantitative information for a given gene/protein across 2 conditions. Required information:

  • column 1: HGNC_symbol
  • column 2: fold-change of the protein across the 2 conditions of interest
HGNC_symbol Fold-change
STARD13 0.432

5. Params file

Example params file : params.txt

A template of this text file must be used to run PIGNON. It is passed to the program as a command-line argument.

It is important to specify the working directory, file paths and the proper parameters. A detailed explanation of these parameters can be found here.

Intermediate files PIGNON will generate

Note: These files will be generated in a sub-directory of your working directory labelled IO_files, which the program creates automatically.

  • Initial distance matrix
  • Final distance matrix of fully connected component
  • Monte Carlo distribution (written to a sub-directory mcDistribution, which the program creates automatically)
  • Normal distribution parameters calculated from the Monte Carlo distribution file
  • Shuffled Gene Ontology set

Files PIGNON will output

Note: These files will be generated in a sub-directory of your working directory labelled output_files, which the program creates automatically.

  • Mapping of false discovery rates to significance thresholds (.tsv): this file should be used to identify an FDR cut-off
  • Stats summary of tested GO terms (.tsv)
  • Detailed results for every GO annotation (.tsv): this file contains the biological results of interest to users
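One way to pick a cut-off from the FDR mapping is to take the most permissive significance threshold whose estimated FDR stays at or below a target. The sketch below assumes the thresholds and their FDR values have already been read into two parallel arrays; the class `FdrCutoff`, its method, and this selection rule are illustrative assumptions, not PIGNON's documented behaviour.

```java
/** Hypothetical helper for choosing a cut-off; not part of PIGNON. */
class FdrCutoff {

    /** Largest threshold whose FDR is at or below targetFdr, or NaN if none qualifies. */
    static double largestThresholdWithin(double[] thresholds, double[] fdr, double targetFdr) {
        if (thresholds.length != fdr.length) {
            throw new IllegalArgumentException("thresholds and fdr must have the same length");
        }
        double best = Double.NaN;
        for (int i = 0; i < thresholds.length; i++) {
            boolean qualifies = fdr[i] <= targetFdr;
            if (qualifies && (Double.isNaN(best) || thresholds[i] > best)) {
                best = thresholds[i];
            }
        }
        return best;
    }
}
```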

To run PIGNON

Note: We recommend running PIGNON on a computer with a minimum of 8 GB of RAM. The program can run for up to 24 hours.

  1. Download the PIGNON.jar file.

  2. Prepare your input files as specified above.

  3. Open a terminal (macOS/Linux) or command prompt (Windows) and navigate to where PIGNON.jar is stored.

  4. Enter command:

    java -Xmx8g -jar PIGNON.jar file/path/params.txt