

Primary language: R · License: GNU General Public License v3.0 (GPL-3.0)

Bionano_fixdups

Bionano_fixdups is a script for removing artificial duplications introduced by the Bionano Solve scaffolding pipeline. Starting from the negative gaps annotated in the AGP file, the script aligns the contigs flanking each negative gap on its 5' and 3' sides, trims the overlapping sequence, and writes a trimmed FASTA file. The script is experimental, and its development was discontinued after the release of more refined tools such as BiSCoT.
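The trimming idea can be illustrated with a minimal sketch. It assumes the two flanking sequences are already in scaffold orientation, and it swaps the alignment-based steps (Minimap2, Samtools, samextractclip) for a plain exact suffix-prefix match, so it is a simplified stand-in rather than what Bionano_fixdups.R does. `find_overlap`, `join_trimmed` and the `min_len` threshold are hypothetical names.

```python
def find_overlap(left: str, right: str, min_len: int = 20) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Exact string matching stands in for the alignment step of the real
    script (an assumption for illustration); returns 0 when no overlap of
    at least `min_len` bases exists.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # Try the longest possible overlap first, then shrink it.
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def join_trimmed(left: str, right: str, min_len: int = 20) -> str:
    """Join the 5' and 3' flanking contigs, keeping one copy of the overlap.

    Assumes both sequences are given in scaffold orientation.
    """
    k = find_overlap(left, right, min_len)
    return left + right[k:]
```

Because the match is exact and case-sensitive, a single sequencing error or soft-masked base inside the overlap makes `find_overlap` return 0; tolerating such differences is the reason to use an aligner instead.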

Prerequisites

  • Minimap2
  • Samtools
  • Jvarkit samextractclip
  • R with the Biostrings package

Usage

Bionano_fixdups.R

Rscript ./Bionano_fixdups.R <scaffolds.fasta> <file.agp> <contigs.fasta>

Note: before running the script, set the paths to the Minimap2, Samtools and samextractclip executables inside it.

Inputs:

  • <scaffolds.fasta>: FASTA file with the scaffolds produced by the Bionano hybrid scaffolding pipeline
  • <file.agp>: AGP file describing which contigs have been included in each scaffold
  • <contigs.fasta>: FASTA file with the contigs as cut by the Bionano hybrid scaffolding pipeline
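The AGP input is a tab-separated table: component rows (type W) carry the contig name in column 6, while gap rows (types N or U) carry the gap length in that same column. A minimal Python sketch of the first step, pairing each selected gap with the contigs immediately 5' and 3' of it, could look like this. How Bionano Solve marks a gap as negative is not described here, so the selection rule is an assumption passed in as a predicate; `flanked_gaps`, `Gap` and `is_negative` are hypothetical names, not taken from Bionano_fixdups.R.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple

GAP_TYPES = ("N", "U")  # AGP component types that describe gaps


class Gap(NamedTuple):
    scaffold: str
    gap_length: int
    left_contig: str   # component immediately 5' of the gap
    right_contig: str  # component immediately 3' of the gap


def flanked_gaps(agp_lines: Iterable[str],
                 is_negative: Callable[[int], bool]) -> List[Gap]:
    """Return the gaps selected by `is_negative`, with their flanking contigs.

    `is_negative` encodes how negative gaps are recognised; this rule is an
    assumption to adapt to the AGP file at hand.
    """
    rows_by_scaffold: Dict[str, List[List[str]]] = {}
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and AGP header/comment lines
        fields = line.rstrip("\n").split("\t")
        rows_by_scaffold.setdefault(fields[0], []).append(fields)

    gaps: List[Gap] = []
    for scaffold, rows in rows_by_scaffold.items():
        rows.sort(key=lambda f: int(f[3]))  # column 4: part_number
        for i in range(1, len(rows) - 1):
            prev, row, nxt = rows[i - 1], rows[i], rows[i + 1]
            if row[4] not in GAP_TYPES:  # column 5: component_type
                continue
            if prev[4] in GAP_TYPES or nxt[4] in GAP_TYPES:
                continue  # need a contig on both sides of the gap
            length = int(row[5])  # column 6 of a gap row: gap_length
            if is_negative(length):
                # column 6 of a component row: component_id
                gaps.append(Gap(scaffold, length, prev[5], nxt[5]))
    return gaps
```

For example, `flanked_gaps(open("file.agp"), lambda n: n < 50)` would select every gap shorter than 50 bp; that threshold is purely illustrative.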

Outputs:

  • <scaffolds_neg_gaps_fixed.fasta>: FASTA file in which the overlaps between flanking contigs have been trimmed
  • logfile_fix_scaffolds.txt: log file reporting the operations performed on the input scaffolds
  • fix_scaffolding_temp: directory containing temporary files