Example does not work
marekkokot opened this issue · 3 comments
Hi,
this example does not work:
nthash::NtHash nth("TGACTGATCGAGTCGTACTAG", 1, 5); // 1 hash per 5-mer
while (nth.roll()) {
// use nth.hashes() for canonical hashes
// nth.get_forward_hash() for forward strand hashes
// nth.get_reverse_hash() for reverse strand hashes
}
more precisely this is not working:
nthash::NtHash nth("TGACTGATCGAGTCGTACTAG", 1, 5); // 1 hash per 5-mer
while (nth.roll()) {
std::cout << nth.hashes()[0] << '\n';
// use nth.hashes() for canonical hashes
// nth.get_forward_hash() for forward strand hashes
// nth.get_reverse_hash() for reverse strand hashes
}
I get just one hash.
I'm not using cmake build, I just get all the sources needed and use
g++ *.cpp btllib/status.cpp
When I change to set the length it works:
nthash::NtHash nth("TGACTGATCGAGTCGTACTAG", 21, 1, 5); // 1 hash per 5-mer
Also this works:
std::string s = std::string("TGACTGATCGAGTCGTACTAG");
nthash::NtHash nth(s, 1, 5); // 1 hash per 5-mer
but no this:
nthash::NtHash nth(std::string("TGACTGATCGAGTCGTACTAG"), 1, 5); // 1 hash per 5-mer
(the equivalent of the same as in the original example).
It took me a while to understand this.
Ctor captures const std::string& ref.
C++ guarantees that the temporal value (explicit like in the last code fragment, or implicit like in the original example) will live until the function end. The problem is the delegated ctor (const char*) does not copy the string content and we have a dangling pointer at the end.
Proposed solution:
maybe use memcpy like in the case of BlindNtHash
?
Although maybe there are reasons to not do this.
Anyway would be great to fix the example.
BTW. I had some issues cloning submodules, I have changed .gitmodules to
[submodule "vendor/argparse"]
path = vendor/argparse
url = https://github.com/p-ranav/argparse/
[submodule "vendor/btllib"]
path = vendor/btllib
url = https://github.com/bcgsc/btllib/
I'm not sure if it does make any difference.
BTW2.
Let's say I have a bit packet sequence, i.e. two bits per symbol in a form of uint64_t array.
Is there a way to use ntHash without unpacking it to std::string/cstring?
It's not crucial and may potentially be even worse vs efficient unpacking, but just out of curiosity.
Hi and thanks for your report and suggestions.
would be great to fix the example.
Agreed, at least until we find a better solution. I don't think using memcpy
in NtHash::NtHash
would be an efficient thing to do. It's ok for BlindNtHash
since the input string is supposed to be just the first k-mer, but NtHash
will get a whole sequence, and copying that would take extra time+memory.
BTW. I had some issues cloning submodules, I have changed .gitmodules
I used ssh to initialize the submodules, using HTTPS would be more reliable here. I'm going to remove all the submodules in the next release (soon) anyway to make ntHash easier to install/use.
BTW2.
Let's say I have a bit packet sequence, i.e. two bits per symbol in a form of uint64_t array.
Is there a way to use ntHash without unpacking it to std::string/cstring?
It's not crucial and may potentially be even worse vs efficient unpacking, but just out of curiosity.
This was coincidentally suggested to me in-person very recently. One would have to modify the ntf64
function to replace the SEED_TABLE
lookup with the desired uint64_t
value in the current implementation. I'll implement a better way to do this for the next release.
Hi,
thanks for the quick responses to all my questions!
I expected that you may want to avoid extra copying.
One suggestion that I think of is to replace const std::string&
in ctor with std::string_view
Best
Marek
this is still not working! It took me a while to realize that too :-)
It produces garbage values