Biocrutch

License: MIT

Various bioinformatics scripts.

Content:

  • SRAtoolkit. Parses a link from the Sequence Read Archive (SRA) and downloads the reads in SRA format. It also checks the integrity of the downloaded reads by parsing the required metrics of the source files.
  • Pseudoautosomal region. A script for determining the coordinates of the pseudoautosomal region on a sex chromosome. The output is a BED file with the coordinates of the pseudoautosomal region.
  • RepeatMasking scripts. Scripts for converting TRF, RepeatMasker and WindowMasker output to GFF format.
  • EMA to FASTQ. Combines EMA output bin files into forward FASTQ, reverse FASTQ and barcode-only files.
  • QuastCore. An alternative to the publicly available QUAST program. Its main differences include:
    1. it adds only the necessary cutoffs.
    2. it counts missing N values.
    3. its output is a pandas DataFrame that can be used for further analysis.
    4. it writes results to a convenient CSV file.
  • Coverage statistics. A script for calculating median, average, maximum and minimum coverage. It works with the output of bedtools genomecov and mosdepth, and can:
    1. calculate stats for the whole genome.
    2. calculate stats for each scaffold.
    3. calculate stats for stacking windows.
  • PSMC data combine. The script combines data from several PSMC outputs to draw multiple demographic population histories on a single graph.

and others...
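
To make the Coverage statistics item more concrete, here is a minimal sketch of the three modes (whole genome, per scaffold, windows). It is not the Biocrutch implementation. It assumes per-base input in the three-column, tab-separated layout produced by `bedtools genomecov -d` (scaffold, position, depth), and it interprets "stacking windows" as consecutive non-overlapping windows. The function names and the example data are hypothetical.

```python
import statistics
from collections import defaultdict


def read_per_base_depth(lines):
    """Parse tab-separated lines of the form: scaffold, position, depth."""
    depth_by_scaffold = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        scaffold, _position, depth = line.split("\t")
        depth_by_scaffold[scaffold].append(int(depth))
    return depth_by_scaffold


def summarize(depths):
    """Return median, average, maximum and minimum of a list of depths."""
    return {
        "median": statistics.median(depths),
        "average": statistics.mean(depths),
        "max": max(depths),
        "min": min(depths),
    }


def window_stats(depths, window_size):
    """Summarize consecutive non-overlapping windows (assumed meaning of
    "stacking windows"); a trailing partial window is skipped."""
    return [
        summarize(depths[start:start + window_size])
        for start in range(0, len(depths) - window_size + 1, window_size)
    ]


# Hypothetical example input, not real data.
example = [
    "scaffold_1\t1\t10",
    "scaffold_1\t2\t12",
    "scaffold_1\t3\t14",
    "scaffold_1\t4\t20",
    "scaffold_2\t1\t0",
    "scaffold_2\t2\t4",
]

depth_by_scaffold = read_per_base_depth(example)

# 1. Whole genome: pool the depths of all scaffolds.
whole_genome = summarize(
    [d for depths in depth_by_scaffold.values() for d in depths]
)

# 2. Each scaffold separately.
per_scaffold = {
    name: summarize(depths) for name, depths in depth_by_scaffold.items()
}

# 3. Windows of two positions along one scaffold.
windows = window_stats(depth_by_scaffold["scaffold_1"], window_size=2)
```

Holding every per-base depth in memory is acceptable for a small example only; for a real genome a streaming or histogram-based approach would be a more realistic choice.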

To use the Biocrutch package, you need to add the package path to PYTHONPATH.
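
If changing the environment is inconvenient, a similar effect can be obtained for a single Python process by extending `sys.path` at runtime. This is a sketch of that alternative, not a step documented by the package; the checkout location below is a hypothetical placeholder.

```python
import os
import sys

# Hypothetical location of the package path; replace it with your own.
biocrutch_path = os.path.expanduser("~/tools/Biocrutch")

# Directories listed in PYTHONPATH are added to sys.path when the
# interpreter starts, so inserting the path here affects imports in the
# current process only.
if biocrutch_path not in sys.path:
    sys.path.insert(0, biocrutch_path)
```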

Copyright (c) 2020 Andrey Tomarovsky