null output in hist and dump due to missing seek operation
Closed this issue · 5 comments
This might be something that can be updated in meryl-utility
, so tagging @brianwalenz just in case.
The first time running merfin -hist
, the output has a bunch of nulls/nans. The number of contigs is correct, but they appear to be uninitialised. Running the exact same command again then works correctly.
This doesn't appear to happen for -vmer
, so after some digging, I noticed that the vmer critical block has
sfile->findSequence(seqId);
sfile->loadSequence(seq);
while for hist and dump it is only the loading.
It looks like findSequence
from mery-utility is just a seek operation. I don't know too much about seeks, but presumably after generating the index, the loadSequence is just reading from the end of the index and hence the nulls.
After adding in the findSequence
commands before the loading in hist and dump made this problem go away.
Generating fasta index.
Number of contigs: 712
0 (null) 0 0 0 nan
1 (null) 0 0 0 nan
2 (null) 0 0 0 nan
3 (null) 0 0 0 nan
Hi @ASLeonard,
Merfin is still under active development..thanks for your help in debugging :-)
That said, yesterday I reverted the master to the Jan 31 version, which in my hands was stable. Some changes that were introduced later (I suspect also some commits to the kmer counter meryl) broke it, hence I reversed it (and I believe I also was having those kind of problems). Can you confirm that you are using the latest version?
I hadn't updated to the latest code before noticing this.
I would guess the current "old" code has this part, which is calling for a new dnaSeqFile (and internally seems to be calling the compressed file reopen()
in its constructor).
sfile->generateIndex();
uint64 ctgn = sfile->numberOfSequences();
sfile = new dnaSeqFile(seqName);
I couldn't explicitly confirm in meryl-utility
, but presumably your reverted sfile = new dnaSeqFile(seqName);
has a similar effect as sfile->findSequence(seqId);
inside the loop.
this is @brianwalenz's magic :-) I can only confirm that the old version worked (after a long trial and error process by myself :-) ) and the new one didn't not. Not sure if it is also due to some changes in meryl-utility
It seems to be working fine now after that patch, and the unparallelised version would be very slow to be doing so many iterations, so I probably won't revert for now.
This can probably be closed as there is a stable revert now and Brian will probably still see this after being tagged.
Cool, thanks I'll close the issue!