Scan format "{} = ({}, {})" with std::string not properly parsed
jhonatandarosa opened this issue · 2 comments
Hi!
Today I was trying to parse an input with scn::scan and found something that could be a bug or a limitation.
Here's the code to reproduce:
#include <string>
#include <iostream>
#include <scn/scn.h>
int main(int argc, char* argv[]) {
    std::string input{"AAA = (BBB, CCC)"};
    std::string a, b, c;
    auto r = scn::scan(input, "{} = ({}, {})", a, b, c);
    std::cout << "error: " << (int)r.error().code() << " " << r.error().msg() << std::endl;
    std::cout << "a: " << a << "\nb: " << b << "\nc: " << c << std::endl;
    return 0;
}
and the output:
error: 3 Expected character from format string not found in the stream
a: AAA
b: BBB,
c:
Is this the expected behavior?
Thanks in advance!
Yeah, that's expected behavior, even though it could be argued to be unexpected. That's because format strings work the same way as they do for scanf: a string is matched until a whitespace character, regardless of the context around it in the format string.
char buf1[32] = {};
char buf2[32] = {};
int ret = std::sscanf("abc, def", "%s %s", buf1, buf2);
// ret == 2
// buf1 == "abc,"
// buf2 == "def"
// -> %s matches until a whitespace character
ret = std::sscanf("abc, def", "%s, %s", buf1, buf2);
// ret == 1
// buf1 == "abc,"
// Because buf1 already matched the ',',
// it can't be matched from the format string
// -> error
// buf2 not touched
https://godbolt.org/z/13hj77bxE
The same applies to scn::scan. In your example, the second {} consumes the comma that follows BBB in the input, so the comma in the format string has nothing left to match.
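To make the parallel concrete, here is a minimal sketch that feeds the input from the report to plain std::sscanf, using the scanf equivalent of the original format string. The names NaiveResult, scan_naive and check_naive are hypothetical helpers made up for this illustration; they are not part of scnlib or the standard library.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper type for illustration only.
struct NaiveResult {
    int matched;          // number of fields sscanf assigned
    std::string a, b, c;  // scanned fields (empty if never assigned)
};

// Scans `input` with whitespace-delimited %s conversions only.
NaiveResult scan_naive(const char* input) {
    char a[32] = {}, b[32] = {}, c[32] = {};
    // %31s reads up to the next whitespace character (at most 31 chars),
    // so the second field also swallows the ',' that follows it.
    int n = std::sscanf(input, "%31s = (%31s, %31s)", a, b, c);
    return {n, a, b, c};
}

// Self-check documenting the result for the input from the report.
void check_naive() {
    NaiveResult r = scan_naive("AAA = (BBB, CCC)");
    assert(r.matched == 2);  // scanning stops at the literal ',' in the format
    assert(r.a == "AAA");
    assert(r.b == "BBB,");   // comma ended up inside the string
    assert(r.c.empty());     // third field never assigned
}
```

Calling check_naive() from main passes silently, which mirrors the a/b/c values printed in the report: the first two fields are filled, the comma lands in b, and c stays empty.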
The workaround is the same for both scanf and scn::scan: the acceptable characters to match must be specified in the format string, using %[...] with scanf and {:[...]} with scn::scan.
std::string input{"AAA = (BBB, CCC)"};
std::string a,b,c;
// In {:[...]}, the `...` specifies what characters are matched
// ^ inverts that selection
// So, {:[^,]} matches everything but a comma,
// and {:[^)]} matches everything but a closing parenthesis
auto r = scn::scan(input, "{} = ({:[^,]}, {:[^)]})", a, b, c);
std::cout << "error: " << (int)r.error().code() << std::endl;
std::cout << "a: " << a << "\nb: " << b << "\nc: " << c << std::endl;
https://godbolt.org/z/dMKdqob9a
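For comparison, the scanf side of the same workaround could look roughly like the sketch below, which uses %[^,] and %[^)] scansets on the same input. ScansetResult, scan_with_scansets and check_scansets are hypothetical names introduced only for this example, and the 31-character field widths are an assumption chosen to fit the 32-byte buffers.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper type for illustration only.
struct ScansetResult {
    int matched;          // number of fields sscanf assigned
    std::string a, b, c;  // scanned fields
};

// Scans "<a> = (<b>, <c>)" using scansets for the parenthesized fields.
ScansetResult scan_with_scansets(const char* input) {
    char a[32] = {}, b[32] = {}, c[32] = {};
    // %31[^,] reads everything up to (not including) the next ','
    // %31[^)] reads everything up to (not including) the next ')'
    int n = std::sscanf(input, "%31s = (%31[^,], %31[^)])", a, b, c);
    return {n, a, b, c};
}

// Self-check documenting the result for the input from the report.
void check_scansets() {
    ScansetResult r = scan_with_scansets("AAA = (BBB, CCC)");
    assert(r.matched == 3);  // all three fields assigned
    assert(r.a == "AAA");
    assert(r.b == "BBB");    // comma is left for the format string to match
    assert(r.c == "CCC");
}
```

Note that, unlike %s, a scanset does not skip leading whitespace on its own; here the space in the format string before each scanset handles that.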
This design decision stems from the idea that the way an argument is scanned is determined only by its type and its format specifiers (i.e. everything inside the {}), and not by what's around it in the format string. This design decision could nevertheless be revisited at a later time.
For now, closing as WONTFIX -- working as intended.
Thank you for the detailed explanation!