jermp/fulgor

Generating `WARNING: No newline at ending of file 'nonewline.fasta'` during construction results in a broken/incorrect index

Opened this issue · 3 comments

I ran fulgor on input containing files that trigger ggcat's `WARNING: No newline at ending of file 'nonewline.fasta'` message and noticed that the index fulgor builds is wrong in that case.

For example, a large .fur index was 206G on disk when generated from the broken inputs, but after the inputs were fixed the size grew to 281G, which is closer to what I expected. Queries on the first index ran, but returned no matches for the broken input files.

The difference in index size also reproduces on artificial data containing a file that generates the warning.

So, just as a heads-up, it might be better to abort if the inputs have this error. I've also reported this to ggcat and suggested that the warning should be an error.

jermp commented

Hi Tommi, and thanks for the suggestion. I agree: if the input is broken, GGCAT should abort the construction (and Fulgor too, in turn). Right now, I don't think there is a way to fix this in Fulgor, as the warning is just a printed message. Right?

Yeah, it's just text printed from Rust using `eprintln` (https://github.com/algbio/ggcat/blob/a91ecc97f286b737b37195c0a86f0e11ad6bfc3b/crates/io/src/lines_reader.rs#L155), so detecting it would require either capturing the text from Rust and parsing it, or checking the input files somewhere within the fulgor code. I don't think this is a very common error to run into, though, so it's probably OK to wait and see if ggcat changes this.
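
In the meantime, the second option can be approximated outside of fulgor with a pre-flight check on the input files. Below is a minimal sketch in Python, not anything that exists in fulgor or ggcat: `ends_with_newline` and `check_inputs` are hypothetical names, and it assumes the inputs are uncompressed FASTA files (gzipped inputs would need to be decompressed first). Treating an empty file as fine is also an assumption.

```python
import os


def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            # Assumption: an empty file has no unterminated last line.
            return True
        # Read only the final byte instead of scanning the whole file.
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


def check_inputs(paths):
    """Return the paths whose last line is not newline-terminated."""
    return [p for p in paths if not ends_with_newline(p)]
```

Running `check_inputs` over the list of input files before building, and aborting if it returns anything, should catch the files that would otherwise only produce the warning.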

jermp commented

Ok, as I thought. I'll leave this issue open anyway as a reminder.

thanks!