PINTS is available on PyPI and bioconda, so you can install it with:
pip install pyPINTS
or
conda install bioconda::pypints
Alternatively, you can clone this repo to a local directory, then run the following command in that directory:
python setup.py install
PINTS can call peaks from either bigWig or BAM files. If you have signals for the forward and reverse strands in two separate bigWig files (path_to_pl.bw and path_to_mn.bw), you can use a command like the following to call peaks:
pints_caller --save-to output_dir --file-prefix output_prefix --bw-pl path_to_pl.bw --bw-mn path_to_mn.bw --thread 16
To call peaks from BAM files, you need to give PINTS the path to the BAM file and the type of experiment it came from.
If the data is from a standard protocol, like PROcap, you can set --exp-type PROcap.
Other supported experiments include GROcap, CoPRO, csRNAseq, NETCAGE, CAGE, RAMPAGE, and STRIPEseq. For a comprehensive list of directly supported assays, run pints_caller --help.
If the data was generated by other methods, you need to tell PINTS where it can find the ends of the RNAs you are interested in.
For example, --exp-type R_5 tells the tool that:
- this alignment is from a single-end library;
- the tool should look at the 5' end of reads.
Other supported values are R_3, R1_5, R1_3, R2_5, and R2_3.
If reads represent the reverse complement of the original RNAs, as in PROseq, you also need to use --reverse-complement (not necessary for standard protocols).
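The naming scheme of the custom --exp-type values can be decoded mechanically. The following is a minimal sketch of that reading, based only on the values listed above; describe_exp_type is a hypothetical helper written for illustration and is not part of PINTS.

```python
import re

# Custom --exp-type values listed above: R_5, R_3, R1_5, R1_3, R2_5, R2_3.
_CUSTOM_EXP_TYPE = re.compile(r"R([12]?)_([53])")


def describe_exp_type(value):
    """Explain a custom --exp-type value (hypothetical helper, not a PINTS API)."""
    match = _CUSTOM_EXP_TYPE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a custom exp-type value: {value!r}")
    mate, end = match.groups()
    return {
        # No read number means a single-end library.
        "library": "paired-end" if mate else "single-end",
        "read": int(mate) if mate else None,
        # Which end of the read marks the RNA end of interest.
        "end": f"{end}'",
    }


print(describe_exp_type("R_5"))
print(describe_exp_type("R2_3"))
```

Protocol names such as PROcap are not covered by this sketch; they are handled by PINTS itself.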
An example of calling peaks from a BAM file:
pints_caller --bam-file input.bam --save-to output_dir --file-prefix output_prefix --thread 16 --exp-type PROcap
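If you drive PINTS from a script, the same command can be assembled as an argument list. This is a sketch only: build_pints_command is a hypothetical wrapper, and it uses just the flags shown in this README.

```python
import shlex


def build_pints_command(bam_file, save_to, file_prefix, exp_type,
                        threads=16, reverse_complement=False):
    """Assemble a pints_caller argument list (hypothetical helper, not a PINTS API)."""
    cmd = [
        "pints_caller",
        "--bam-file", bam_file,
        "--save-to", save_to,
        "--file-prefix", file_prefix,
        "--thread", str(threads),
        "--exp-type", exp_type,
    ]
    if reverse_complement:
        # Only needed for custom exp-type values whose reads are reverse complements.
        cmd.append("--reverse-complement")
    return cmd


cmd = build_pints_command("input.bam", "output_dir", "output_prefix", "PROcap")
print(shlex.join(cmd))

# To actually run it (requires PINTS to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems when paths contain spaces.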
We have prepared several case studies demonstrating steps from processing the raw fastq files to calling peaks/TREs for your reference.
PINTS writes the following peak files, where prefix is the value given to --file-prefix:
- prefix+_{SID}_divergent_peaks.bed: Divergent TREs;
- prefix+_{SID}_bidirectional_peaks.bed: Bidirectional TREs (divergent + convergent);
- prefix+_{SID}_unidirectional_peaks.bed: Unidirectional TREs, possibly lncRNAs transcribed from enhancers (e-lncRNAs) as suggested here.
{SID} will be replaced with the number (index) of the sample that the peaks are called from. If you provide PINTS with only one sample, {SID} will be replaced with 1. If you run PINTS with three replicates (--bam-file A.bam B.bam C.bam), then {SID} for peaks identified from A.bam will be replaced with 1.
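To locate the outputs programmatically, the naming pattern can be expanded as in the sketch below. It rests on two assumptions that the README does not spell out: the prefix is concatenated directly with _{SID}_..., and samples after the first are numbered 2, 3, ... in the order they were given. expected_outputs is a hypothetical helper.

```python
def expected_outputs(prefix, n_samples):
    """List the expected peak files per sample ID (hypothetical helper).

    Assumes SIDs run 1..n_samples in input order; the README only
    confirms that the first sample gets SID 1.
    """
    kinds = ("divergent", "bidirectional", "unidirectional")
    return {
        sid: [f"{prefix}_{sid}_{kind}_peaks.bed" for kind in kinds]
        for sid in range(1, n_samples + 1)
    }


for sid, files in expected_outputs("output_prefix", 3).items():
    print(sid, files)
```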
For divergent or bidirectional TREs, there will be 6 columns in the outputs:
- Chromosome
- Start site: 0-based
- End site: 0-based
- Confidence about the peak pair. Can be:
  - Stringent(qval), which means the two peaks on both forward and reverse strands are significant based on their q-values;
  - Stringent(pval), which means one peak is significant according to q-value while the other one is significant according to p-value;
  - Relaxed, which means only one peak is significant in the pair.
  - A combination of the three types above, because of overlap for nearby elements.
  - If epigenomic annotation is enabled by --epig-annotation <biosample>, then peaks that are less significant (--relaxed-fdr-target, default is 2*fdr_target), but overlap with epigenomic annotations from the PINTS web server, will be listed with the confidence level Marginal.
- Major TSSs on the forward strand; if there are multiple major TSSs, they will be separated by commas (,)
- Major TSSs on the reverse strand; if there are multiple major TSSs, they will be separated by commas (,)
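The six columns above can be read into a small record as sketched below. The sketch assumes tab-separated fields, as is usual for BED files, and keeps the confidence label and TSS entries as strings because their exact format is not specified here. parse_paired_tre and the example line are made up for illustration and are not real PINTS output.

```python
from typing import List, NamedTuple, Optional


class PairedTRE(NamedTuple):
    chrom: str
    start: int
    end: int
    confidence: str          # e.g. Stringent(qval), Relaxed, or a combination
    forward_tss: List[str]   # major TSSs on the forward strand
    reverse_tss: List[str]   # major TSSs on the reverse strand
    annotation: Optional[str]  # extra column, present only with --epig-annotation


def parse_paired_tre(line):
    """Parse one divergent/bidirectional line (hypothetical helper, assumes tabs)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    chrom, start, end, confidence, fwd, rev = fields[:6]
    return PairedTRE(
        chrom, int(start), int(end), confidence,
        [t for t in fwd.split(",") if t],
        [t for t in rev.split(",") if t],
        fields[6] if len(fields) > 6 else None,
    )


# Made-up line for illustration only.
example = "chr1\t1000\t1300\tStringent(qval)\t1200,1210\t1050"
print(parse_paired_tre(example))
```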
For unidirectional TREs, there will be 9 columns in the output:
- Chromosome
- Start
- End
- Peak ID
- Q-value
- Strand
- Read counts
- Position of the summit TSS
- Height of the summit
For all three types of TREs, if a valid biosample name for --epig-annotation is provided, then an additional column with epigenomic annotation for each TRE will show up in the final output.
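A unidirectional record can be read in the same spirit. The sketch below assumes tab-separated fields and uses column names invented for this example. Only start and end are converted to integers, since the formatting of the other values is not specified here. The example line is made up and is not real PINTS output.

```python
# Column names chosen for this sketch; the file itself has no header.
UNIDIRECTIONAL_COLUMNS = (
    "chrom", "start", "end", "peak_id", "q_value",
    "strand", "read_counts", "summit_tss", "summit_height",
)


def parse_unidirectional_tre(line):
    """Parse one unidirectional line into a dict (hypothetical helper, assumes tabs)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(UNIDIRECTIONAL_COLUMNS):
        raise ValueError(f"expected at least 9 columns, got {len(fields)}")
    record = dict(zip(UNIDIRECTIONAL_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # A tenth column appears only when --epig-annotation was used.
    record["annotation"] = fields[9] if len(fields) > 9 else None
    return record


# Made-up line for illustration only.
example = "chr2\t500\t650\tpeak_1\t0.01\t+\t42\t560\t12"
print(parse_unidirectional_tre(example))
```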
- If you want to use BAM files as inputs:
  - --bam-file: input BAM file(s);
  - --exp-type: Type of experiment. If the experiment is not listed as a choice, or you know the position of RNA ends on the reads and you want to override the defaults, you can specify:
    - R_5 (5' of the read for single-end lib),
    - R_3 (3' of the read for single-end lib),
    - R1_5 (5' of the read1 for paired-end lib),
    - R1_3 (3' of the read1 for paired-end lib),
    - R2_5 (5' of the read2 for paired-end lib),
    - or R2_3 (3' of the read2 for paired-end lib)
  - --reverse-complement: Set this switch if 1) exp-type is Rx_x and 2) reads in this library represent the reverse complement of RNAs, like PROseq;
  - --ct-bam: BAM file for input/control (optional);
- If you want to use bigWig files as inputs:
  - --bw-pl: bigWig for signals on the forward strand;
  - --bw-mn: bigWig for signals on the reverse strand;
  - --ct-bw-pl: bigWig for input/control signals on the forward strand (optional);
  - --ct-bw-mn: bigWig for input/control signals on the reverse strand (optional);
- --save-to: save peaks to this path (a folder), by default, the current folder
- --file-prefix: prefix to all outputs
- --epig-annotation <biosample>: Use this option together with the name of the biosample that the library was derived from, for example K562; then epigenomic annotations will be downloaded from the PINTS web server and used for annotating and augmenting TREs identified by PINTS (for hg38 only);
- --relaxed-fdr-target <relaxed fdr>: In the presence of --epig-annotation, peaks that do not pass the original FDR cutoff but pass this relaxed cutoff and have support from DNase-seq and H3K27ac ChIP-seq will also be included in final outputs. By default, 2*fdr;
- --mapq-threshold <min mapq>: Minimum mapping quality, by default: 30 or None;
- --close-threshold <close distance>: Distance threshold for two peaks (on opposite strands) to be merged, by default: 300;
- --fdr-target <fdr>: FDR target for multiple testing, by default: 0.1;
- --chromosome-start-with <chromosome prefix>: Only keep reads mapped to chromosomes with this prefix; if it's set to None, then all reads will be analyzed;
- --thread <n thread>: Max number of threads the tool can create;
- --borrow-info-reps: Borrow information from reps to refine calling of divergent elements;
- --output-diagnostic-plot: Save diagnostic plots (independent filtering and pval dist) to local folder
More parameters can be seen by running pints_caller -h.
PINTS also provides the following tools:
- pints_visualizer: Generate bigWig files for the inputs.
- pints_counter: Generate a count matrix for downstream use (e.g. differential expression analysis).
- pints_boundary_extender: Extend peaks from summits.
- pints_normalizer: Normalize inputs.
Please submit an issue with any questions or if you experience any issues/bugs. If you use PINTS in your work, please cite: https://www.nature.com/articles/s41587-022-01211-7.