mbhall88/rasusa

Input parameter for number of bases in addition to coverage and genome size

tomazberisa opened this issue · 6 comments

In addition to --coverage and --genome-size, an alternative usage mode in which the user provides the total number of bases in the downsampled output (e.g., --bases) would also be useful in certain use cases.

$ rasusa ...
...
[2021-08-19][16:23:02][rasusa][INFO] Target number of bases to subsample to is: <value>
...

To clarify, the idea is to provide <value> from the example output above directly via a command-line parameter, instead of having it calculated from coverage and genome size.
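
To make the request concrete, here is a minimal sketch of the two ways the target could be obtained. It is not rasusa's code: the function name `target_bases` and its parameters are hypothetical, and it assumes the target is the product of coverage and genome size, with an explicit base count taking precedence when given.

```python
from typing import Optional


def target_bases(
    coverage: Optional[float] = None,
    genome_size: Optional[int] = None,
    bases: Optional[int] = None,
) -> int:
    """Return the number of bases to subsample to.

    Hypothetical helper for illustration only; not part of rasusa.
    """
    # Requested mode: the caller supplies the target directly.
    # Assumption: an explicit base count takes precedence over the other inputs.
    if bases is not None:
        return bases

    # Existing mode: derive the target from coverage and genome size.
    if coverage is None or genome_size is None:
        raise ValueError("provide either bases, or both coverage and genome_size")
    return int(coverage * genome_size)


# Derived target: 30x coverage of a 4.6 Mbp genome.
print(target_bases(coverage=30, genome_size=4_600_000))
# Direct target: no genome size needed.
print(target_bases(bases=5_000_000))
```

With the second form, the caller never has to state a genome size, which is the point of the request.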

> would also be useful in certain use cases.

Which use cases would you find this useful for?

One example is a FASTQ file that contains sequencing reads from more than one species. In this case the coverage + genome size input isn’t directly applicable to the contents of the file.

I see. Fair enough. I can see the utility of such an option.

Amazing, thank you for #32 and v0.6.0!

Thanks for the enhancement suggestions!