Issue when getting the sequence length from a reference genome file
huangnengCSU opened this issue · 1 comment
Hi developer,
When I used seq_io::fasta::Reader to load a reference genome (such as GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna), the reported size of each chromosome sequence was incorrect: it was larger than the true sequence length. This is because in a reference FASTA file the sequence of each chromosome is wrapped across multiple lines, and I think the length computed from seq_io::fasta::Reader includes all the LF characters.
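To make the symptom concrete, here is a minimal std-only sketch (it does not use seq_io) that computes per-record lengths from an in-memory FASTA string by counting only the bytes of the sequence lines. The helper name `fasta_lengths` and the sample records are hypothetical. For the `chr1` record below the residues add up to 23, whereas the raw sequence block including its two inner line breaks would be 25 bytes, or 26 with the final LF.

```rust
/// Hypothetical helper (not part of seq_io): returns (id, length) for each
/// record of an in-memory FASTA text. Only sequence bytes are counted, so
/// wrapping a sequence over several lines does not inflate its length.
fn fasta_lengths(text: &str) -> Vec<(String, usize)> {
    let mut out: Vec<(String, usize)> = Vec::new();
    // `str::lines` splits on "\n" or "\r\n" and drops the terminator,
    // so both LF and CRLF files are handled.
    for line in text.lines() {
        if let Some(header) = line.strip_prefix('>') {
            // The record ID is the header text up to the first whitespace.
            let id = header.split_whitespace().next().unwrap_or("").to_string();
            out.push((id, 0));
        } else if let Some(last) = out.last_mut() {
            // Sequence line: add its residue count to the current record.
            last.1 += line.len();
        }
    }
    out
}

fn main() {
    let fasta = ">chr1 test\nACGTACGTAC\nACGTACGTAC\nACG\n>chr2\nGGGG\nCC\n";
    for (id, len) in fasta_lengths(fasta) {
        println!("{}\t{}", id, len);
    }
}
```

With a real reference genome you would read the file through a proper parser rather than loading it into one string; the point here is only that line terminators must be excluded from the count.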
Best,
Neng
Hi! Could you post some example code showing how you determined the sequence length? This would help me reproduce it. Actually, if you follow this example, the length should be correct, since the individual sequence lines should not contain any CR/LF. In contrast, Record::seq() does contain all line endings, so the length of that slice will be larger than the actual sequence length.
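The difference between the two approaches can be sketched without the crate itself. In this std-only example, the byte string `raw` plays the role of the slice returned by Record::seq() (residues plus line terminators), and the hypothetical function `seq_lines` splits it into individual lines, dropping LF and an optional preceding CR. This is meant to mimic iterating over the sequence lines, which, as far as I know, seq_io exposes as `seq_lines()` on its record types; it is an illustration, not seq_io's implementation.

```rust
/// Hypothetical stand-in (not seq_io's code): splits a raw sequence slice
/// into its lines, removing "\n" and an optional "\r" before it.
/// Empty lines (e.g. after a trailing terminator) are skipped.
fn seq_lines<'a>(raw: &'a [u8]) -> impl Iterator<Item = &'a [u8]> + 'a {
    raw.split(|&b| b == b'\n')
        .map(|line| line.strip_suffix(b"\r").unwrap_or(line))
        .filter(|line| !line.is_empty())
}

fn main() {
    // Raw sequence slice from a CRLF file: three lines of 10, 10 and 3 residues.
    let raw: &[u8] = b"ACGTACGTAC\r\nACGTACGTAC\r\nACG";
    // Length of the raw slice: counts the CR/LF bytes too.
    let raw_len = raw.len();
    // Sum of the per-line lengths: the actual sequence length.
    let true_len: usize = seq_lines(raw).map(|l| l.len()).sum();
    // prints "raw slice: 27 bytes, sequence: 23 bp"
    println!("raw slice: {} bytes, sequence: {} bp", raw_len, true_len);
}
```

Summing line lengths avoids copying the sequence; if the concatenated sequence itself is needed, the lines can be collected into one buffer instead.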