Split the output of Oxford Nanopore's Guppy basecaller according to the barcodes found in the sequences.

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

Someone wrote this manual for my small bcsplit script. I just merged it into my readme. Thanks, stranger :-)

guppy_bcsplit

Installation

I saw someone in the Nanopore Community who was unsure how to use the program, so these are the install steps I used. Until Guppy can demultiplex files natively (which should be soon), other programs will have to be used. If you've already run guppy_barcoder, this script will work for you. Thanks to Michael Schmid for making the script.

Biopython is required, because guppy_bcsplit uses Biopython's Bio.SeqIO module to read and write sequences. guppy_bcsplit has worked for me with Python 2.7 and Python 3.7.1.
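To check that Biopython is visible to the interpreter you plan to run the script with, a small Python 3 sketch like the following should do. It relies on importlib.util.find_spec, which Python 2.7 lacks, and `biopython_available` is a hypothetical helper, not part of guppy_bcsplit.

```python
import importlib.util
import sys


def biopython_available():
    """Return True if the Bio package (Biopython) can be found by this interpreter."""
    return importlib.util.find_spec("Bio") is not None


# Report on the interpreter that is running this check.
print("Python", sys.version.split()[0])
if biopython_available():
    print("Biopython found: Bio.SeqIO should be importable.")
else:
    print("Biopython not found: try `pip install biopython` for this interpreter.")
```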

Clone the git repository:

git clone https://github.com/ms-gx/guppy_bcsplit.git

It's a single Python file, so you can easily put it anywhere.

Unless you add its folder to your PATH, you'll have to cd into that folder and run the script as ./guppy_bcsplit.py.

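If you have multiple fastq files, you'll need to concatenate them into one file first. Note that >> appends to the destination file, so running the command twice duplicates every read; use > instead if you want to overwrite it: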

cat source_folder/*.fastq >> destination_folder/name.fastq
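If you would rather do this step from Python, or want to avoid accidentally appending to an existing output file, here is a minimal sketch. `concat_fastq` is a hypothetical helper, not part of guppy_bcsplit; it assumes each input file ends with a newline, and unlike >> it overwrites the destination.

```python
import glob
import os
import shutil


def concat_fastq(source_dir, dest_path):
    """Concatenate every *.fastq file in source_dir into dest_path.

    Overwrites dest_path, so rerunning it does not duplicate reads.
    Returns the number of input files copied.
    """
    dest_abs = os.path.abspath(dest_path)
    # List the inputs before the output file is created or truncated.
    sources = sorted(glob.glob(os.path.join(source_dir, "*.fastq")))
    copied = 0
    with open(dest_path, "wb") as out:
        for src in sources:
            if os.path.abspath(src) == dest_abs:
                continue  # never read the output file back into itself
            with open(src, "rb") as inp:
                shutil.copyfileobj(inp, out)
            copied += 1
    return copied
```

For example, `concat_fastq("source_folder", "destination_folder/name.fastq")` mirrors the cat command above and returns how many files it merged.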

Commands

Unless you've added guppy_bcsplit to your PATH, you will need to cd into its folder and run it as ./guppy_bcsplit.py.

guppy_bcsplit takes the following options:

-b is the path to the barcoding summary file (barcoding_summary.txt) generated by guppy_barcoder.

-f is the folder containing the .fastq file(s). If you want to point to a specific file, append its name (e.g. name.fastq) to the folder path.

-p is the prefix added to each output file name, so the files look like prefix_barcodeXX.fastq.

-s is the path of the summary text file in which guppy_bcsplit reports how many reads were assigned to each barcode and how many were unclassified.
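The roles of -b and -s can be pictured with a minimal Python 3 sketch. It assumes barcoding_summary.txt is tab-separated with a header row that includes `read_id` and `barcode_arrangement` columns, which is how guppy_barcoder usually writes it (check the header of your own file). The function names and the `barcode<TAB>count` layout are hypothetical; this is not guppy_bcsplit's own code, and its real summary format may differ.

```python
import csv
from collections import Counter


def load_barcode_map(summary_path):
    """Map read_id -> barcode from a tab-separated barcoding summary.

    Assumes the header contains 'read_id' and 'barcode_arrangement' columns.
    """
    with open(summary_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["read_id"]: row["barcode_arrangement"] for row in reader}


def write_count_summary(barcode_map, out_path):
    """Write one 'barcode<TAB>count' line per barcode, sorted by name."""
    counts = Counter(barcode_map.values())
    with open(out_path, "w") as out:
        for barcode, n in sorted(counts.items()):
            out.write(f"{barcode}\t{n}\n")
    return counts
```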

./guppy_bcsplit.py -b folder_containing_guppy_barcode_summary/barcoding_summary.txt -f folder_containing_guppy_fastq_concatenated/ -p prefix -s folder/guppy_bcsplit_summary.txt

guppy_bcsplit should now demultiplex your reads into separate .fastq files, one per barcode plus one for unclassified reads, and write the summary file.
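For the curious, the splitting idea can be sketched in dependency-free Python 3. This is not guppy_bcsplit's implementation (which uses Biopython); it assumes unwrapped four-line FASTQ records, takes the read ID to be the first whitespace-separated token after `@`, and sends reads missing from the barcode map to `unclassified`. `iter_fastq` and `split_fastq` are hypothetical names, and the barcode map is a plain `{read_id: barcode}` dict built from barcoding_summary.txt.

```python
import os
from collections import Counter


def iter_fastq(path):
    """Yield (header, seq, plus, qual) lines for each 4-line FASTQ record."""
    with open(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                return  # end of file
            seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
            if not header.startswith("@") or not qual:
                raise ValueError(f"malformed or truncated FASTQ record in {path}")
            yield header, seq, plus, qual


def split_fastq(fastq_path, barcode_map, prefix, out_dir="."):
    """Write each read to <out_dir>/<prefix>_<barcode>.fastq; return per-barcode counts."""
    handles = {}
    counts = Counter()
    try:
        for record in iter_fastq(fastq_path):
            read_id = record[0][1:].split()[0]  # drop '@', keep the first token
            barcode = barcode_map.get(read_id, "unclassified")
            if barcode not in handles:
                path = os.path.join(out_dir, f"{prefix}_{barcode}.fastq")
                handles[barcode] = open(path, "w")
            handles[barcode].writelines(record)
            counts[barcode] += 1
    finally:
        for handle in handles.values():
            handle.close()
    return counts
```

Opening each output file once and keeping it open avoids reopening a file for every read; the returned counts carry the same kind of information as the summary file.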