UPHL-BioNGS/Cecret

Extra lines in summary file with large number of samples

Closed this issue · 5 comments

Hello again!

I'm encountering an issue where extra lines are added to the cecret_results.csv file when a large number (hundreds) of samples are run together.

Basically, I get two extra lines in my cecret_results.csv that look like:

name,name,,,p/f,,,,,,,,model,alerts,,,,,,,,,,,,,,,,,,,,,,v3.14.240610,artic 1.2.4
seq,seq,,,best,,,,,,,,seq,,,,,,,,,,,,,,,,,,,,,,,v3.14.240610,artic 1.2.4

This only seems to happen when I run large batches of samples together. I've tracked these extra lines back to the vadr.sqa file used during the cecret summary process (see: ncbi/vadr#81).

I think one possible fix would be to edit the line below, replacing grep -v "#-" with grep -v "^#" (as Eric suggests in the vadr issue).

if [ -s "vadr.vadr.sqa" ] ; then tail -n +2 "vadr.vadr.sqa" | grep -v "#-" | tr -s '[:blank:]' ',' > vadr.csv ; fi
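To illustrate the difference between the two filters, here is a minimal sketch run against a made-up, heavily simplified stand-in for a vadr.sqa file. The column layout, the sample names, and the way the header block repeats are assumptions for illustration only; real files have many more columns. The third variant (keeping one header row with awk) is my own guess at how the header could be preserved, not necessarily what Cecret does.

```shell
# Hypothetical, simplified stand-in for vadr.sqa: the header block
# (two "#" rows plus a "#---" separator) appears a second time mid-file.
sqa=$(mktemp)
cat > "$sqa" <<'EOF'
#seq  seq      p/f
#idx  name     best
#---  -------  ----
1     sampleA  PASS
#seq  seq      p/f
#idx  name     best
#---  -------  ----
2     sampleB  PASS
EOF

# Current filter: removes only the "#-" separator rows, so the repeated
# "#seq" and "#idx" header rows end up in the CSV as extra lines.
old_rows=$(tail -n +2 "$sqa" | grep -v "#-" | tr -s '[:blank:]' ',')

# Suggested filter: removes every row that starts with "#".
# Note that this also drops the first header row.
new_rows=$(tail -n +2 "$sqa" | grep -v "^#" | tr -s '[:blank:]' ',')

# Assumed alternative: keep only the header row on line 2 of the file and
# drop every other "#" row, for the case where the header is still needed.
kept_rows=$(awk 'NR == 2 || !/^#/' "$sqa" | tr -s '[:blank:]' ',')

printf 'current filter:\n%s\n\n' "$old_rows"
printf 'suggested filter:\n%s\n\n' "$new_rows"
printf 'keep one header:\n%s\n' "$kept_rows"

rm -f "$sqa"
```

With this toy input, the current filter leaves the repeated header rows in place, the suggested filter leaves only the two sample rows, and the awk variant leaves a single header row followed by the two sample rows.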

Let me know if you need any other info or want me to try something else on my setup, and thanks again for all your work on Cecret.

Oh SNAP!

FYI, I'm working on this today

I actually need the vadr header, but the latest set of updates that are currently going through testing should fix the issue you are seeing.

Thank you for bringing this to my attention!

This issue should be fixed now!!!

Works perfect, thanks for the update!