/physlr

Construct a Physical Map from Linked Reads

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Published in DNA

Physlr: Next-generation Physical Maps

Physlr physical-map constructs a de novo physical map using linked reads from 10X Genomics or MGI stLFR. This physical map can then be used for various genomics analyses, including scaffolding. Physlr scaffolds uses the physical map generated in the first stage to scaffold an existing genome assembly to yield chromosome-level contiguity.

Installation

You can install Physlr either via Conda or by compiling from source. We recommend the Conda package manager (Linux, macOS), which handles compilation and dependencies automatically.

Install Physlr using Conda

In an active conda environment:

conda install -c bioconda physlr
physlr help

Physlr can generate complementary reports (included in the pipeline by default). You can install the dependencies for these optional features using conda:

conda install -c r r-rmarkdown
conda install -c r r-essentials
conda install -c conda-forge r-ggplot2

We recommend pypy3 over regular python3 for speed, and pypy3 is the default Python executable for Physlr. To use a different executable, set the python_executable argument:

physlr [OPTION]... python_executable=python3

You can install pypy3 using conda:

conda install -c conda-forge pypy3.8 # Change specified version based on your conda environment's python version (3.6 to 3.9 are supported)
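
The choice of interpreter described above can be made automatically. The following is a minimal sketch, not part of Physlr: the helper name pick_python_executable is hypothetical, and it only assumes that pypy3 and python3 are the two candidates named in this section.

```shell
# Hypothetical helper (not shipped with Physlr): choose the value to pass
# as python_executable. Prefers pypy3 and falls back to python3 when
# pypy3 is not found on PATH.
pick_python_executable() {
    if command -v pypy3 >/dev/null 2>&1; then
        echo pypy3
    else
        echo python3
    fi
}

# Show the argument that would be appended to a physlr command line.
echo "python_executable=$(pick_python_executable)"
```

The printed argument can be appended to any physlr command in place of a hard-coded python_executable=python3.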

Compile Physlr from source

Compiling

Compile Physlr using the following commands:

pip3 install --user git+https://github.com/bcgsc/physlr
git clone https://github.com/bcgsc/physlr
cd physlr/src && make install

Or, to install Physlr in a specified directory (such as /opt/physlr):

pip3 install --user git+https://github.com/bcgsc/physlr
git clone https://github.com/bcgsc/physlr
cd physlr/src && make install PREFIX=/opt/physlr

After compiling, Physlr commands are available through:

bin/physlr-make
bin/physlr-make help

Dependencies

Optional dependencies

  • pigz for parallel gzip
  • zsh for reporting time and memory usage

Developer dependencies

There are additional functions in Physlr (especially the Python version) for developers to generate more granular reports. The dependencies of these functions are listed below:

Running Physlr

Generate a physical map

To construct a physical map de novo, you need linked reads (from 10X Genomics or MGI stLFR).

In this example, the linked reads dataset is called linkedreads.fq.gz. The linked reads are from stLFR, so we specify protocol=stlfr to use the default values for stLFR reads.

cd experiment # Change to working directory 
physlr physical-map lr=linkedreads protocol=stlfr                # Constructs the physical map
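
In the example above, the file linkedreads.fq.gz is passed as lr=linkedreads, that is, without its .fq.gz suffix. A small sketch of deriving that value from a file name follows. The helper lr_name is hypothetical, and it assumes (based only on this example) that the reads file ends in .fq.gz and sits in the working directory.

```shell
# Hypothetical helper: derive the lr= value from a linked-reads file name
# by dropping any leading directory and the .fq.gz suffix,
# mirroring the example (linkedreads.fq.gz -> lr=linkedreads).
lr_name() {
    basename "$1" .fq.gz
}

reads=linkedreads.fq.gz
# Print the command rather than running it, so it can be inspected first.
echo "physlr physical-map lr=$(lr_name "$reads") protocol=stlfr"
```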

You also have the option to provide a reference genome (with ref) for Physlr to evaluate the physical map. Assuming the reference is called reference.fa, you can run the following command for the previous example:

cd experiment
physlr physical-map lr=linkedreads ref=reference protocol=stlfr  # Constructs the physical map and evaluates it against the reference

If you provide a reference genome, Physlr first constructs a physical map and then maps it to the input reference. In this case, Physlr automatically outputs a *.map-quality.tsv file reporting assembly-like quality metrics for the physical map. In addition, Physlr visualizes the correctness and contiguity of the physical map.

You can also independently run the physical map construction and evaluation steps:

cd experiment
physlr physical-map lr=linkedreads protocol=stlfr
physlr map-quality lr=linkedreads ref=reference
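
To read the resulting *.map-quality.tsv file in a terminal, a generic tab-separated viewer is enough. The sketch below makes no assumption about which columns Physlr writes: it takes the first line as the header and prints every later row as "column: value" pairs. The function name show_tsv is hypothetical.

```shell
# Hypothetical helper: print each data row of a tab-separated file as
# "column: value" lines, using the first line as the header.
# A blank line is printed after each row.
show_tsv() {
    awk -F '\t' '
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
        { for (i = 1; i <= NF; i++) printf "%s: %s\n", name[i], $i; print "" }
    ' "$1"
}

# Show every map-quality report found in the current directory, if any.
for f in *.map-quality.tsv; do
    if [ -e "$f" ]; then
        show_tsv "$f"
    fi
done
```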

Scaffold an assembly

To scaffold a draft assembly, you need linked reads (from 10X Genomics or MGI stLFR) and an existing assembly. In this example, the linked reads and draft assembly are called linkedreads.fq.gz and draft.fa, respectively. The linked reads are from 10X Genomics, so we specify protocol=10x to use the default values for 10X Genomics reads.

cd experiment
bin/physlr-make scaffolds lr=linkedreads draft=draft protocol=10x

You can also include a reference genome (reference.fa in this example) for Physlr to calculate QUAST summary metrics for the Physlr scaffolded assembly:

cd experiment
bin/physlr-make scaffolds lr=linkedreads ref=reference draft=draft protocol=10x
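
Because the scaffolding step needs both the linked reads and the draft assembly, it can help to confirm that the input files exist before starting. This is a minimal sketch under the file names used in this example (linkedreads.fq.gz and draft.fa); the helper require_inputs is hypothetical and not part of Physlr.

```shell
# Hypothetical helper: report every listed file that does not exist and
# return a non-zero status if at least one is missing.
require_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Run the scaffolding command only when both inputs are present.
if require_inputs linkedreads.fq.gz draft.fa; then
    bin/physlr-make scaffolds lr=linkedreads draft=draft protocol=10x
fi
```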

See the help page for further information:

bin/physlr-make help

Output files

  • lr.physlr.physical-map.path: Paths of barcodes (backbones).
  • lr.physlr.physical-map.ref.n10.paf.gz.*.pdf: Various graphs showing the contiguity and correctness of the backbones with respect to the reference.
  • draft.physlr.fa: Physlr scaffolded assembly using the physical map.
  • draft.physlr.quast.tsv: QUAST metrics comparing the Physlr scaffolded assembly against the reference.
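
After a run, a quick check of which of these files were produced can be sketched as follows. This is an illustration only: the helper check_outputs is hypothetical, and it assumes that the "lr" and "draft" prefixes in the names above stand for the values given to lr= and draft=, which this document does not state explicitly.

```shell
# Hypothetical helper: report which expected output files are present in
# the current directory.
# ASSUMPTION: output names start with the values passed as lr= and draft=.
check_outputs() {
    lr=$1
    draft=$2
    for f in "$lr.physlr.physical-map.path" \
             "$draft.physlr.fa" \
             "$draft.physlr.quast.tsv"; do
        if [ -e "$f" ]; then
            echo "found: $f"
        else
            echo "missing: $f"
        fi
    done
}

check_outputs linkedreads draft
```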

Citation

If you use Physlr in your research, please cite:

Afshinfard A, Jackman SD, Wong J, Coombe L, Chu J, Nikolic V, Dilek G, Malkoç Y, Warren RL, Birol I. Physlr: Next-Generation Physical Maps. DNA. 2022 Jun 10;2(2):116-30. doi: https://doi.org/10.3390/dna2020009

Support

Create a new issue on GitHub.

Acknowledgements

This project uses:

  • btl_bloomfilter: BTL C/C++ common Bloom filters for bioinformatics projects, implemented by Justin Chu
  • nthash: rolling hash implementation by Hamid Mohamadi
  • readfq: fast multi-line FASTA/Q reader API, implemented by Heng Li
  • robin-map: C++ implementation of a fast hash map and hash set using robin hood hashing, by Thibaut G.