Parascopy is designed for robust and accurate estimation of paralog-specific copy number for duplicated genes using whole-genome sequencing.
Created by Timofey Prodanov timofey.prodanov[at]gmail.com
and Vikas Bansal vibansal[at]health.ucsd.edu
at the University of California San Diego.
- Citing Parascopy
- Installation
- General usage
- Output files
- Precomputed data
- Known issues
- Issues
- See also
The paper is currently in progress; please check back later.
You can install Parascopy using conda:
conda config --add channels bioconda
conda config --add channels conda-forge
conda install -c bioconda parascopy
Alternatively, you can install it manually using the following commands:
git clone https://github.com/tprodanov/parascopy.git
cd parascopy
python3 setup.py install
To skip dependency installation, you can run
python3 setup.py develop --no-deps
Additionally, you can specify the installation path using --prefix <path>.
Some parascopy commands require additional external tools to be installed.
You do not need to install these tools if you installed parascopy through conda.
The main focus of this tool is a homology table: a database of duplications in the genome.
To construct a homology table, run:
parascopy pretable -f genome.fa -o pretable.bed.gz
parascopy table -i pretable.bed.gz -f genome.fa -o table.bed.gz
Note that the reference genome should be indexed with both samtools faidx and bwa index.
Alternatively, you can download a precomputed homology table.
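If you build the table yourself, a quick check of the index requirement can save a failed run. The sketch below lists index files that appear to be missing next to a reference FASTA. It assumes the default output names of samtools faidx (genome.fa.fai) and bwa index (genome.fa.amb, .ann, .bwt, .pac, .sa). The function name missing_index_files is hypothetical and is not part of Parascopy.

```shell
# Print the index files that are missing for a reference FASTA.
# Assumption: default output names of `samtools faidx` (<fasta>.fai)
# and `bwa index` (<fasta>.amb, .ann, .bwt, .pac, .sa).
# missing_index_files is an illustrative helper, not a Parascopy command.
missing_index_files() {
    local fasta="$1" ext
    for ext in fai amb ann bwt pac sa; do
        # Report each expected index file that does not exist.
        [ -e "${fasta}.${ext}" ] || echo "${fasta}.${ext}"
    done
}
```

Running missing_index_files genome.fa prints nothing when every expected file is present. Otherwise, rerun samtools faidx genome.fa or bwa index genome.fa as needed.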
To find aggregate and paralog-specific copy number (agCN and psCN) across multiple samples, run:
# Calculate background read depth.
parascopy depth -I input.list -g hg38 -f genome.fa -o depth
# Estimate agCN and psCN for multiple samples.
parascopy cn -I input.list -t table.bed.gz -f genome.fa -R regions.bed -d depth -o out1
# Estimate agCN and psCN using model parameters from a previous run.
parascopy depth -I input2.list -g hg38 -f genome.fa -o depth2
parascopy cn-using out1/model -I input2.list -t table.bed.gz -f genome.fa -d depth2 -o out2
See parascopy help or parascopy <command> --help for more information.
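The input.list file passed with -I can be generated rather than written by hand. The sketch below assumes that the list holds one BAM or CRAM path per line. That format is an assumption here, so check parascopy depth --help before relying on it. The function name make_input_list and the directory layout are illustrative only.

```shell
# Write every *.bam and *.cram file found in a directory to a list file,
# one path per line.
# Assumption: the file given to -I contains one alignment file per line.
# make_input_list is an illustrative helper, not a Parascopy command.
make_input_list() {
    local dir="$1" out="$2" f
    : > "$out"                      # create or truncate the list file
    for f in "$dir"/*.bam "$dir"/*.cram; do
        [ -e "$f" ] || continue     # an unmatched glob stays literal; skip it
        printf '%s\n' "$f" >> "$out"
    done
}
```

For example, make_input_list /path/to/alignments input.list writes the list used by the depth and cn commands above.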
See output file format here.
You can use the following precomputed data:
- Precomputed homology tables: hg19 (25 Mb) and hg38 (40 Mb).
- Precomputed model parameters for five superpopulations: hg38 (11 Mb).
If the aggregate copy number jumps significantly within a short region (especially for disease-associated genes, such as SMN1),
it is possible that the alignment file is missing reads for some duplicated loci.
You can try to map the unaligned reads, or remap all reads using a different alignment method.
To extract unaligned reads, use samtools view input.bam "*"
(this does not extract unmapped reads with a mapped mate).
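To see how many reads fall into each group, you can count them from SAM text. The sketch below reads SAM records on standard input and separates unmapped reads (FLAG bit 0x4 set) into those without a reference name (RNAME is "*") and those stored at the position of a mapped mate. The function name count_unmapped is hypothetical, and the sketch assumes the aligner follows the usual SAM convention of giving such a read its mate's RNAME and POS.

```shell
# Count unmapped reads in SAM text read from standard input.
# unplaced:         FLAG bit 0x4 set and RNAME is "*"
# placed_with_mate: FLAG bit 0x4 set and RNAME is a reference name
# count_unmapped is an illustrative helper, not a Parascopy or samtools command.
count_unmapped() {
    awk -F'\t' '
        /^@/ { next }                      # skip header lines
        int($2 / 4) % 2 == 1 {             # test FLAG bit 0x4 (read unmapped)
            if ($3 == "*") unplaced++; else placed++
        }
        END { printf "unplaced=%d placed_with_mate=%d\n", unplaced, placed }
    '
}
```

A typical call is samtools view input.bam | count_unmapped. If the second count is large, samtools view -f 4 input.bam selects every read with the unmapped flag, including those with a mapped mate.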
Please submit issues here or send them to timofey.prodanov[at]gmail.com.
Additionally, you may be interested in these tools: