
Create a BAM file from non-sensitive fragment data (e.g. a FinaleDB frag.tsv.bgz file), including sequences extracted from a reference genome.

Primary language: Shell. License: GNU General Public License v3.0 (GPL-3.0).

Fragmentstein

Version 2023.04. Creates a BAM file from non-sensitive fragment data (a FinaleDB frag.tsv.bgz file, or fragment-coordinate BED or BEDPE files) using sequences extracted from a reference genome.

Make sure all the dependencies listed below are installed before you run the program.

Dependencies

  • samtools version 1.7 or higher;
  • bedtools version 2.30.0 or higher;
  • awk version 20200816 or higher;
  • gunzip (gzip) version 1.6 or higher;
  • Python version 3.10 or higher (only needed if you install fragmentstein as a Python package).
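
As a quick sanity check, the following sketch reports which of the command-line tools listed above can be found on your PATH. The function name check_deps is hypothetical and not part of fragmentstein, and the sketch only checks for presence, not for the minimum versions.

```shell
#!/bin/sh
# Report which of the given commands are available on PATH.
# check_deps is a hypothetical helper, not part of fragmentstein.
check_deps() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Return non-zero if at least one tool was not found.
    return "$missing"
}

# Tools named in the dependency list.
check_deps samtools bedtools awk gunzip \
    || echo "Install the missing tools before running fragmentstein."
```

Version numbers still need to be compared by hand, since each tool reports its version in a different format.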

Installation

To install fragmentstein from the Python Package Index (PyPI):

pip install fragmentstein

Optional: you can install it in a dedicated Python environment:

conda create -n fragmentstein python=3.10 samtools bedtools -c bioconda
conda activate fragmentstein
pip install fragmentstein

You can also use the Mamba package manager:

mamba create -n fragmentstein python=3.10 samtools bedtools -c bioconda
mamba activate fragmentstein
pip install fragmentstein

Afterwards, you can use fragmentstein directly from your shell:

fragmentstein -h

Alternatively, you can install it from source by cloning the repository. (This does not install the dependencies.)

git clone https://github.com/uzh-dqbm-cmi/fragmentstein
cd fragmentstein

Add the './scripts' directory, which contains fragmentstein.sh, to your PATH by appending an export line to your ~/.bashrc or ~/.zshrc. Run the following command from the root of the cloned repository:

echo "export PATH=\"$(pwd)/scripts:\$PATH\"" >> ~/.bashrc

After you open a new shell (or run source ~/.bashrc), the fragmentstein.sh script should be available:

fragmentstein.sh -h

Test usage

The following example shows how to do a test run from the root of the cloned repository:

mkdir results
fragmentstein.sh -i tests/data/test_sample1.tsv.bgz -o results/test_sample1.bam \
    -g tests/data/resources/test_ref_hg38.fna -c tests/data/resources/test_ref.chrom.sizes
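
After the test run you may want to confirm that the output is a readable BAM file. The sketch below uses two standard samtools subcommands, quickcheck and flagstat; the function name check_bam is hypothetical, and the path assumes the test command above was run unchanged.

```shell
#!/bin/sh
# Sanity-check a BAM file (check_bam is a hypothetical helper).
check_bam() {
    bam="$1"
    # The file must exist and be non-empty.
    if [ ! -s "$bam" ]; then
        echo "no output at $bam" >&2
        return 1
    fi
    # quickcheck verifies the header and the end-of-file marker.
    samtools quickcheck "$bam" || return 1
    # flagstat prints summary counts of the alignments in the file.
    samtools flagstat "$bam"
}

check_bam results/test_sample1.bam || echo "BAM check failed" >&2
```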

You can also install the Python wrapper from source. First, install Poetry, the Python dependency management and packaging tool:

curl -sSL https://install.python-poetry.org | python3 -

Then install the fragmentstein Python wrapper from the root of the cloned repository:

poetry install

To run the tests, use the following command:

poetry run pytest

Arguments

Required arguments

-i or --input Path to a FinaleDB frag.tsv.bgz file, or to a .bed or .bedpe file. BED input is expected to have 6 columns and paired-end BEDPE input 10 columns.
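
To check the column count of an uncompressed, tab-separated BED or BEDPE file before passing it to -i, a small awk sketch like the following may help. The function name guess_layout and its output labels are hypothetical, and the sample line is made up in standard BED6 column order purely for illustration.

```shell
#!/bin/sh
# Guess the input layout from the number of tab-separated columns
# in the first line. guess_layout and its labels are hypothetical.
guess_layout() {
    awk -F '\t' 'NR == 1 {
        if (NF == 6) print "bed6"
        else if (NF == 10) print "bedpe10"
        else print "other:" NF
        exit
    }' "$1"
}

# Made-up example line: chrom, start, end, name, score, strand.
example=$(mktemp)
printf 'chr1\t100\t267\tfrag1\t60\t+\n' > "$example"
guess_layout "$example"   # prints: bed6
rm -f "$example"
```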

-g or --genome Path to the reference genome fasta file.

-c or --chrom_sizes Chromosome sizes file.

Optional arguments

-o or --output Path and name of the output BAM file. Default: the input path with the .tsv.gz part of the extension replaced by .bam.

-r or --read_length Both reverse and forward reads of a fragment will have this length unless the fragment is shorter than the read length. Default: 101.
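
The rule for -r can be illustrated with a little arithmetic. This is a sketch of the described behavior, not the tool's actual code: it assumes that a read from a fragment shorter than the read length spans the whole fragment. The function name effective_read_length is hypothetical.

```shell
#!/bin/sh
# Read length actually used for one fragment: the requested read length,
# capped at the fragment length (assumed behavior for short fragments).
effective_read_length() {
    read_length="$1"
    fragment_length="$2"
    if [ "$fragment_length" -lt "$read_length" ]; then
        echo "$fragment_length"
    else
        echo "$read_length"
    fi
}

effective_read_length 101 167   # prints: 101
effective_read_length 101 80    # prints: 80
```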

-qf or --map_quality_filter Minimum mapping quality. Setting it to '0' accepts all fragments. Default: 30.

-qd or --map_quality_default Mapping quality to assign to fragments, for example when it is missing from the input file or when you want to override it for downstream analyses. Default: 60.

-bq or --base_quality ASCII character encoding the Phred-scaled base quality (quality + 33). Default: F (quality: 37).
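
The relation between the character and the quality value is the usual Phred+33 encoding: the quality is the character's ASCII code minus 33. A minimal sketch, with the hypothetical helper name phred_from_char:

```shell
#!/bin/sh
# Convert a Phred+33 quality character to its numeric quality score.
# phred_from_char is a hypothetical helper, not part of fragmentstein.
phred_from_char() {
    # A leading single quote makes printf output the character's code.
    code=$(printf '%d' "'$1")
    echo $((code - 33))
}

phred_from_char F     # prints: 37 (ASCII 70 - 33)
phred_from_char '!'   # prints: 0  (ASCII 33 - 33)
```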

-N or --replace_incomplete_nucleotides Replace all incompletely specified nucleotides with N.

-s or --sort Sort the output BAM file by coordinate. This is a flag and takes no value: just pass -s.

-t or --threads Number of parallel threads to be used when possible. Default: 1.

--temp Folder in which to store intermediate temporary files. Default: same folder as the output file.
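
As an illustration of how the options combine, the sketch below assembles a single call using only the flags documented above together with the test data paths from the test run. The option values (read length 76, quality filter 20, 4 threads, /tmp as temporary folder) and the output file name are arbitrary choices for illustration, and the run wrapper is a hypothetical helper that prints the command and only executes it when RUN=1 is set.

```shell
#!/bin/sh
# Print a command; execute it only when RUN=1 (hypothetical dry-run helper).
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Example values are arbitrary; only flags documented above are used.
run fragmentstein.sh \
    -i tests/data/test_sample1.tsv.bgz \
    -g tests/data/resources/test_ref_hg38.fna \
    -c tests/data/resources/test_ref.chrom.sizes \
    -o results/test_sample1.sorted.bam \
    -r 76 -qf 20 -s -t 4 --temp /tmp
```

Run the script as is to preview the command, or with RUN=1 in the environment to execute it.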

Credits

Fragmentstein is developed and maintained by Zsolt Balázs and Todor Gitchev. To reference the tool, please cite our paper.