threaded `fastafs cache`

Question

Closed this issue 5 years ago

This requires reading everything twice, but avoids holding the entire file in memory.

I.e. first index the FASTA (.fa) file to find the positions of all separate sequences, and then index each sequence separately.
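
A minimal Python sketch of this two-pass idea, assuming an uncompressed, seekable FASTA file. Pass 1 streams the file line by line and only records byte offsets. Pass 2 seeks straight to each sequence, so sequences can be handled independently, for example by separate threads. The helpers `index_fasta`, `read_sequence` and `process_all`, and the `process` callback standing in for the actual per-sequence caching work, are hypothetical and not part of fastafs.

```python
import io
from concurrent.futures import ThreadPoolExecutor
from typing import BinaryIO, Callable, Dict, List, Optional, Tuple

Entry = Tuple[str, int, int]  # (header, offset of first data byte, offset just past data)


def index_fasta(fh: BinaryIO) -> List[Entry]:
    """Pass 1: record where every sequence starts and ends; holds one line at a time."""
    index: List[Entry] = []
    name: Optional[str] = None
    start = offset = 0
    for line in fh:  # lines of a binary file keep their trailing b"\n"
        if line.startswith(b">"):
            if name is not None:
                index.append((name, start, offset))
            name = line[1:].rstrip(b"\r\n").decode()
            start = offset + len(line)
        offset += len(line)
    if name is not None:
        index.append((name, start, offset))
    return index


def read_sequence(fh: BinaryIO, start: int, end: int) -> str:
    """Pass 2: fetch one sequence by seeking straight to its byte range."""
    fh.seek(start)
    return fh.read(end - start).replace(b"\r", b"").replace(b"\n", b"").decode()


def process_all(path: str, process: Callable[[str], object], workers: int = 4) -> Dict[str, object]:
    """Hypothetical threaded pass 2: one task per indexed sequence."""
    with open(path, "rb") as fh:
        index = index_fasta(fh)

    def work(entry: Entry) -> Tuple[str, object]:
        name, start, end = entry
        # Each task opens its own handle, so threads never share a file position.
        with open(path, "rb") as own_fh:
            return name, process(read_sequence(own_fh, start, end))

    # At most `workers` sequences are being read/processed at the same time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(work, index))


if __name__ == "__main__":
    demo = io.BytesIO(b">chr1\nACGT\nAC\n>chr2 desc\nGGGG\n")
    for name, start, end in index_fasta(demo):
        print(name, start, end, read_sequence(demo, start, end))
    # chr1 6 14 ACGTAC
    # chr2 desc 25 30 GGGG
```

In Python, the threads mainly pay off when `process` is I/O-bound or releases the GIL; the offsets from pass 1 work the same way for any other form of parallelism.
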
A possibly nice trick, which assumes the padding (number of characters per line) is equal everywhere, is:

first: read until the first sequence name (the '>' header line) is found
second: read the first sequence line until its newline is found
set that line length + 1 (for the newline) as the chunk size (let's say 60 + 1 = 61)
read the next 61 chars and confirm that the last char is indeed a newline and that the first is not coincidentally a '>'. The trick is to only read the first and/or last byte of each chunk, so the entire file does not have to be read
if both checks pass, proceed with the next line, etc.
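
The padding trick above can be sketched in the same hypothetical Python setting, assuming an uncompressed, seekable file in which every line of a sequence except its last has the same length. `byte_at`, `find_sequence_end` and `index_with_padding_trick` are illustrative names. What happens when a check fails (read the shorter last line normally, then raise an error if the next byte is not a '>' or EOF) is an assumption; the steps above only describe the fast path.

```python
import io
from typing import BinaryIO, List, Tuple


def byte_at(fh: BinaryIO, pos: int) -> bytes:
    """Read the single byte at an absolute offset (b"" at or past EOF)."""
    fh.seek(pos)
    return fh.read(1)


def find_sequence_end(fh: BinaryIO, seq_start: int) -> int:
    """Offset just past a sequence's data, assuming equal padding on all but its last line.

    Caveat: a short last line followed by a header whose newline lands exactly on the
    chunk boundary also passes both checks; only reading whole chunks rules that out.
    """
    fh.seek(seq_start)
    first_line = fh.readline()
    if not first_line or first_line.startswith(b">"):
        return seq_start  # empty sequence
    chunk = len(first_line)  # line length + 1 for the newline, e.g. 60 + 1 = 61
    pos = seq_start
    # Fast path: per chunk, look only at its first and its last byte.
    while byte_at(fh, pos) not in (b"", b">") and byte_at(fh, pos + chunk - 1) == b"\n":
        pos += chunk
    # Slow path: pos is at EOF, at the next header, or on a shorter last line.
    if byte_at(fh, pos) in (b"", b">"):
        return pos
    fh.seek(pos)
    end = pos + len(fh.readline())
    if byte_at(fh, end) not in (b"", b">"):
        raise ValueError(f"padding is not equal near offset {pos}")
    return end


def index_with_padding_trick(fh: BinaryIO) -> List[Tuple[str, int, int]]:
    """(header, data start, data end) for every sequence, via find_sequence_end."""
    fh.seek(0)
    pos = 0
    # first: read until the first sequence name is found (skip anything before it)
    for line in iter(fh.readline, b""):
        if line.startswith(b">"):
            break
        pos += len(line)
    index = []
    while True:
        fh.seek(pos)
        header = fh.readline()
        if not header:
            break  # EOF
        seq_start = pos + len(header)
        end = find_sequence_end(fh, seq_start)
        index.append((header[1:].rstrip(b"\r\n").decode(), seq_start, end))
        pos = end  # find_sequence_end stops at the next '>' (or EOF)
    return index


if __name__ == "__main__":
    demo = io.BytesIO(b">chr1\nACGT\nACGT\nAC\n>chr2\nGGGG\nGGGG\n>chr3\nTT\n")
    print(index_with_padding_trick(demo))
    # [('chr1', 6, 19), ('chr2', 25, 35), ('chr3', 41, 44)]
```

With 61-byte chunks nearly every disk block (typically 4 KiB) is still touched, so the saving is mostly in not inspecting every byte rather than in reading less from disk.
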