Implication that padding zeroes would be copied seems to be wrong in multiple ways
jarikirma opened this issue · 3 comments
Code-used-on-Daniel-Lemire-s-blog/2022/12/29/protocol.cpp
Lines 231 to 240 in ef75e1d
I'm not really a C++ coder, but first of all: strlen
explicitly strips length beyond the first '\0'
, which prevents testing this code with non-zero bytes after one. Secondly, it would seem that even if strlen
is replaced with 8 (... which breaks the half of tests which actually relies on getting the length this way), those bytes are not copied - are they just by chance uninitialised zero generated by padding?
I found this out while trying out a relatively simple nul-terminated string to zero-padded uin64_t
with a fast path under assumption that only strings which cross page boundaries need byte-accurate reads and that padding can be performed by applying Hacker's Delight algorithm for finding the first zero byte.
Please refer to the C++ documentation on string_view
.
The line...
answer.emplace_back(std::string_view(picked.data(), strlen(picked.data())));
is not meant to copy any of the bytes. A string_view
is just like a pointer.
I could have just done...
std::vector<const char*> answer;
...
answer.emplace_back(picked.data());
and it would have been pretty much the same result.
strlen explicitly strips length beyond the first '\0',
The strlen function does not affect the input, it merely returns a number.
Per StackOverflow:
How do you construct a std::string with an embedded null?
So, I'd suggest the following:
using namespace std::string_literals;
const static std::string options[] = {
"https\0\0\0"s, "http\0\0\0\0"s, "ftp\0\0\0\0\0"s, "file\0\0\0\0"s,
"ws\0\0\0\0\0\0"s, "wss\0\0\0\0\0"s, "garbage\0"s, "fake\0\0\0\0"s,
"httpr\0\0\0"s, "filer\0\0\0"s};
This is a std:string
specific issue...
You are correct, I will change my code.