threaded `fastafs cache`
Closed this issue · 0 comments
yhoogstrate commented
This requires reading everything twice, but avoids occupying all memory,
i.e. index the FASTA file first to find the positions of all separate sequences, and then index per sequence
a possible nice trick, which assumes padding is equal everywhere, is:
first: read until first sequence name is found
second: read until first newline is found
set that length+1 as chunk-size (let's say 60+1)
read the next 61 chars and confirm that the last char is indeed a newline and that the first is not coincidentally a '>' - the trick is to only read the first and/or last byte, which does not require reading the entire file
if so, proceed with next line, etc
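
A minimal sketch of the probing trick described above, in Python rather than the project's own code. The function name `index_fasta` and the returned tuple layout are made up for illustration, and the sketch assumes Unix newlines and equal padding everywhere. When a probe fails (e.g. the shorter last line of a sequence) it falls back to reading that one line in full. Note that probing only the first and last byte can in principle be fooled when a short final line is followed by a header whose newline happens to land exactly at the probed offset, so a real implementation would probably need an extra check there.

```python
import os


def index_fasta(path):
    """Return [(name, offset_of_first_residue, n_residues), ...].

    Hypothetical helper: probes only the first byte of each line and the
    byte where the newline is expected, instead of reading whole lines.
    """
    records = []
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        pos = 0
        line_len = None  # residues per full line, learned from the first sequence line
        name, start, count = None, None, 0
        while pos < size:
            fh.seek(pos)
            first = fh.read(1)
            if first == b">":
                # New header: store the previous sequence, then read the name.
                if name is not None:
                    records.append((name, start, count))
                name = fh.readline().strip().decode()
                pos = fh.tell()
                start, count = pos, 0
                continue
            if line_len is not None and pos + line_len < size:
                # Probe only the byte where the newline should be.
                fh.seek(pos + line_len)
                if fh.read(1) == b"\n":
                    count += line_len
                    pos += line_len + 1
                    continue
            # Fallback: first sequence line, short last line, or end of file.
            fh.seek(pos)
            line = fh.readline()
            n = len(line.rstrip(b"\r\n"))
            if name is not None:
                count += n
            if line_len is None and n > 0:
                line_len = n
            pos = fh.tell()
        if name is not None:
            records.append((name, start, count))
    return records
```

With an index like this, each sequence could then be handed to its own thread in the second pass, since every worker knows where its sequence starts and how many residues to expect.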