NTproc is a pipeline for processing Nanopore transcriptome reads. Its main function is the removal of reads that correspond to fragmented cDNAs. NTproc assumes that a read corresponds to a fragmented cDNA if the read does not have PCR adapter sequences on both ends.
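The filtering criterion can be illustrated with a simplified bash sketch. It is not NTproc's actual implementation, which relies on Modified_porechop and its error-tolerant adapter alignment. Here the adapter, or its reverse complement, must occur as an exact substring within a window at each end of the read. The 100-base window and the function names `revcomp`, `window_has_adapter` and `is_full_length` are hypothetical.

```bash
#!/usr/bin/env bash
# Simplified, hypothetical illustration of the "adapters on both ends" rule.
# Assumptions (not NTproc's code): exact substring matching, fixed end windows.

# Reverse complement of a DNA string.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Succeeds if the window contains the adapter in either orientation.
window_has_adapter() {
  local window=$1 adapter=$2 rc
  rc=$(revcomp "$adapter")
  [[ $window == *"$adapter"* || $window == *"$rc"* ]]
}

# Succeeds (full-length read) if both end windows contain the adapter;
# fails (read treated as coming from a fragmented cDNA) otherwise.
# Usage: is_full_length SEQUENCE ADAPTER [WINDOW]   (WINDOW defaults to 100)
is_full_length() {
  local read_seq=$1 adapter=$2 w=${3:-100}
  local head tail
  head=${read_seq:0:w}
  if (( ${#read_seq} > w )); then
    tail=${read_seq: -w}
  else
    tail=$read_seq
  fi
  window_has_adapter "$head" "$adapter" && window_has_adapter "$tail" "$adapter"
}
```

For example, `is_full_length "$read_seq" AAGCAGTGGTATCAACGCAGAGT && echo full-length`. Exact matching only keeps the sketch short; raw Nanopore reads contain enough sequencing errors that a real filter needs approximate matching.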
In addition to removing reads that belong to fragmented cDNAs, NTproc rotates (reorients) reads so that their poly-A tails are on the right (3') end. NTproc also trims adapters and performs demultiplexing.
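Rotating a read means replacing it with its reverse complement. The bash sketch below shows one hypothetical way to decide when to do that: a read whose first 100 bases contain a run of at least 15 Ts is assumed to come from the reverse strand. The window, the run length and the function name `orient_polya_right` are illustrative assumptions, not NTproc's rule, which may rely on adapter positions instead. In a FASTQ record the quality string would also have to be reversed (but not complemented).

```bash
#!/usr/bin/env bash
# Hypothetical orientation rule, for illustration only.

# Reverse complement of a DNA string.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

# Prints the read so that its poly-A tail is on the right (3') end.
# Assumption: a poly-T run of MIN_RUN bases in the first W bases marks a
# reverse-strand read. Usage: orient_polya_right SEQUENCE [W] [MIN_RUN]
orient_polya_right() {
  local read_seq=$1 w=${2:-100} min_run=${3:-15} polyt
  printf -v polyt '%*s' "$min_run" ''   # min_run spaces...
  polyt=${polyt// /T}                   # ...turned into min_run Ts
  if [[ ${read_seq:0:w} == *"$polyt"* ]]; then
    revcomp "$read_seq"                 # reverse strand: flip it
  else
    printf '%s\n' "$read_seq"           # already poly-A on the right
  fi
}
```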
To install NTproc, simply download its files with the command:
git clone https://github.com/shelkmike/NTproc
Requirements:
- Modified_porechop (https://github.com/shelkmike/Modified_porechop) must be installed and available in $PATH.
NTproc takes two mandatory options and one optional one.
The mandatory options are:
1) --fastq — path to a FASTQ file with unprocessed reads.
2) --adapter — a full or partial sequence of a PCR adapter used for cDNA amplification.
The optional one is:
3) --output_folder — the folder to write results to. The default value is "NTproc_results".
An example of how to run NTproc:
bash ntproc.sh --fastq unprocessed_nanopore_reads.fastq --adapter AAGCAGTGGTATCAACGCAGAGT
In the output folder of NTproc, aside from some intermediate files and folders, you will find the folder "Demultiplexed", with contents like this:
BC01.fastq
BC02.fastq
BC03.fastq
none.fastq
Files named like BC01.fastq, BC02.fastq and BC03.fastq contain processed reads with the Nanopore barcodes "Barcode 1", "Barcode 2" and "Barcode 3", respectively, while the file none.fastq contains reads in which Modified_porechop could not find a barcode. NTproc recognizes the standard Nanopore barcodes from "Barcode 1" to "Barcode 96".
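To see how reads were distributed among barcodes, you can count the records in each file of the "Demultiplexed" folder. The helper below is a hypothetical convenience, not part of NTproc. It assumes the standard four-line FASTQ layout (sequence and quality each on a single line) and defaults to the "Demultiplexed" folder inside the default output folder "NTproc_results".

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of NTproc): number of reads per barcode file.
# Assumes 4 lines per FASTQ record. Usage: count_reads_per_barcode [DIR]
count_reads_per_barcode() {
  local dir=${1:-NTproc_results/Demultiplexed} f lines
  for f in "$dir"/*.fastq; do
    [[ -e $f ]] || continue             # skip if the glob matched nothing
    lines=$(wc -l < "$f")
    # Print "<barcode file name><TAB><read count>".
    printf '%s\t%d\n' "$(basename "$f" .fastq)" $(( lines / 4 ))
  done
}
```

For the test run, `count_reads_per_barcode Test_results/Demultiplexed` prints one line per file (BC01, BC02, ..., none) with its read count.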
To check whether NTproc works correctly, you can use the test set of 10 000 reads (file 10000_reads.fastq) provided with NTproc. Run the test with a command such as:
bash ntproc.sh --fastq ./Test_set/10000_reads.fastq --adapter AAGCAGTGGTATCAACGCAGAGT --output_folder Test_results
NTproc uses a single CPU thread and processes approximately 1 000 000 reads per hour. If faster performance is required, you can split the input FASTQ file into batches and run several instances of NTproc independently. If you would like NTproc to have built-in parallelization, let me know via Issues (https://github.com/shelkmike/NTProc/issues).
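A minimal sketch of the batch approach, assuming GNU coreutils `split` (for `-d`, `-a` and `--additional-suffix`) and a standard four-line FASTQ layout. The function names, the `batch_` prefix and the per-batch output folder names are hypothetical. Each instance gets its own `--output_folder` so that parallel runs do not write into the same folder.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for running NTproc on batches in parallel.

# Split FASTQ into batches of READS_PER_BATCH reads:
# PREFIX000.fastq, PREFIX001.fastq, ...
# Usage: split_fastq FASTQ READS_PER_BATCH [PREFIX]
split_fastq() {
  local fastq=$1 reads_per_batch=$2 prefix=${3:-batch_}
  # 4 lines per record, so a batch boundary never falls inside a record.
  split -l $(( reads_per_batch * 4 )) -d -a 3 --additional-suffix=.fastq \
    "$fastq" "$prefix"
}

# Run one NTproc instance per batch in the background, then wait for all.
# Usage: run_batches PATH_TO_NTPROC_SH ADAPTER [PREFIX]
run_batches() {
  local ntproc=$1 adapter=$2 prefix=${3:-batch_} batch
  for batch in "$prefix"*.fastq; do
    [[ -e $batch ]] || continue
    bash "$ntproc" --fastq "$batch" --adapter "$adapter" \
      --output_folder "NTproc_results_$(basename "$batch" .fastq)" &
  done
  wait
}
```

For example, with 2 000 000 reads and 4 free CPU cores, `split_fastq unprocessed_nanopore_reads.fastq 500000` followed by `run_batches ntproc.sh AAGCAGTGGTATCAACGCAGAGT` starts four instances of 500 000 reads each; at roughly 1 000 000 reads per hour per instance this would ideally take about 30 minutes instead of two hours. Reads with the same barcode can then be merged, e.g. `cat NTproc_results_batch_*/Demultiplexed/BC01.fastq > BC01.fastq`.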