Does pyfastx split fasta files sequentially?

Question

Closed this issue 5 years ago · 7 comments

I want to split paired-end FASTQ files. If I run pyfastx split -n 30 on each file, will the pair order be preserved, so that subset 1 of the forward reads and subset 1 of the reverse reads are correctly paired?

Answer 1 · 2019-11-15T16:12:09.000Z

I tried running split and got the following error:

python: malloc.c:2401: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

Answer 2 · 2019-11-16T16:26:37.000Z

I have fixed the memory allocation (malloc) issue in the new version. pyfastx cannot split a FASTA file sequentially, but it does split a FASTQ file sequentially. If you use pyfastx to split paired-end FASTQ files, you should ensure that the two paired files have the same read count and that the reads are paired at the same line positions.
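The requirement above (equal read counts, mates at the same positions) can be checked before splitting. The following is a minimal sketch in plain Python, not part of pyfastx; the function names `read_names` and `check_pairing` are hypothetical, and it assumes uncompressed FASTQ with exactly four lines per record and mate names that differ at most by a trailing /1 or /2.

```python
import itertools

def read_names(path):
    """Yield the read name of each record (assumes 4 lines per record)."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0 and line.strip():
                fields = line[1:].split()      # drop '@', split off the description
                name = fields[0] if fields else ""
                if name.endswith(("/1", "/2")):
                    name = name[:-2]           # assumption: legacy mate suffixes
                yield name

def check_pairing(forward_path, reverse_path):
    """Return (ok, message) describing whether the two files pair up by position."""
    pairs = itertools.zip_longest(read_names(forward_path), read_names(reverse_path))
    for position, (fwd, rev) in enumerate(pairs, start=1):
        if fwd is None or rev is None:
            return False, f"read counts differ at record {position}"
        if fwd != rev:
            return False, f"name mismatch at record {position}: {fwd} vs {rev}"
    return True, "files are paired"
```

Calling `check_pairing("reads_1.fq", "reads_2.fq")` (placeholder file names) returns a tuple whose first element is False as soon as one file runs out of reads or a name differs.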

Answer 3 · 2019-11-26T20:47:37.000Z

So pyfastx output is deterministic? That is, for paired-end files that have reads in paired order and the same number of reads, will each of the N files produced by split contain the same subset by position? And will these be deterministic but not sequential subsets?
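To illustrate what a deterministic, position-based split means, here is a small sketch in plain Python. It is an assumption about how contiguous chunks could be computed, not pyfastx's actual implementation, and `chunk_bounds` is a hypothetical name. Because the boundaries depend only on the read count and the number of parts, two files with the same read count get identical boundaries.

```python
def chunk_bounds(total_reads, n_parts):
    """Return half-open (start, end) record ranges for n_parts contiguous chunks."""
    base, extra = divmod(total_reads, n_parts)
    bounds = []
    start = 0
    for part in range(n_parts):
        # The first `extra` chunks take one additional read each.
        size = base + (1 if part < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds
```

For example, `chunk_bounds(10, 3)` gives `[(0, 4), (4, 7), (7, 10)]` for the forward file and for the reverse file alike, so read i of each file lands in the same subset.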

Answer 4 · 2019-11-26T20:57:55.000Z

I just re-read your comment; I hadn't noticed the difference between FASTA and FASTQ. Why build an index if you are just splitting FASTQ sequentially?

Answer 5 · 2019-11-27T05:32:53.000Z

Thank you for your suggestion. It really is not necessary to build an index before splitting a FASTQ file. This will be fixed in a later version.

Answer 6 · 2019-11-27T17:03:50.000Z

You could use the index functionality to allow random sampling of reads from a FASTQ file on the command line. I would find this very useful.

Answer 7 · 2019-11-29T01:08:11.000Z

I will consider your suggestion and implement functionality for randomly sampling reads from a FASTQ file.
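Independently of pyfastx, random sampling of paired reads can be sketched in plain Python with reservoir sampling, which needs neither an index nor the total read count. This is only an illustration under assumptions: `fastq_records` and `sample_pairs` are hypothetical names, the input is uncompressed FASTQ with four lines per record, and the two files are already correctly paired by position.

```python
import random

def fastq_records(path):
    """Yield each record as a list of 4 lines; stops at EOF or the first blank header."""
    with open(path) as handle:
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0].strip():
                return
            yield record

def sample_pairs(forward_path, reverse_path, k, seed=0):
    """Reservoir-sample k read pairs, keeping each forward/reverse mate together."""
    rng = random.Random(seed)          # fixed seed makes the sample reproducible
    reservoir = []
    pairs = zip(fastq_records(forward_path), fastq_records(reverse_path))
    for seen, pair in enumerate(pairs, start=1):
        if len(reservoir) < k:
            reservoir.append(pair)     # fill the reservoir with the first k pairs
        else:
            slot = rng.randrange(seen) # uniform in [0, seen)
            if slot < k:
                reservoir[slot] = pair # replace with probability k / seen
    return reservoir
```

Sampling whole pairs in a single pass, rather than sampling each file separately, is what keeps the mates together; the same seed yields the same sample on the same input.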