eliaskosunen/scnlib

Scan format "{} = ({}, {})" with std::string not properly parsed

jhonatandarosa opened this issue

Hi 👋

Today I was trying to parse an input with scn::scan and found something that could be a bug or limitation.

Here's the code to reproduce it:

#include <string>
#include <iostream>
#include <scn/scn.h>

int main(int argc, char* argv[]) {
    std::string input{"AAA = (BBB, CCC)"};
    std::string a,b,c;

    auto r = scn::scan(input, "{} = ({}, {})", a, b, c);
    std::cout << "error: " << (int)r.error().code() << " " << r.error().msg() << std::endl;
    std::cout << "a: " << a << "\nb: " << b << "\nc: " << c << std::endl;

    return 0;
}

And here's the output:

error: 3 Expected character from format string not found in the stream
a: AAA
b: BBB,
c: 

Is this the expected behavior?

Thanks in advance 🙇

Yeah, that's expected behavior, even though it could be argued to be surprising. Format strings work the same way they do for scanf: a string is read until a whitespace character, regardless of what surrounds the placeholder in the format string.

#include <cstdio>

int main() {
    char buf1[32] = {};
    char buf2[32] = {};

    // %s matches until a whitespace character
    // (%31s caps each read so it fits in its 32-byte buffer)
    int ret = std::sscanf("abc, def", "%31s %31s", buf1, buf2);
    // ret == 2
    // buf1 == "abc,"
    // buf2 == "def"

    ret = std::sscanf("abc, def", "%31s, %31s", buf1, buf2);
    // ret == 1
    // buf1 == "abc,"
    // Because buf1 already consumed the ',', the literal ','
    // in the format string meets ' ' instead -> matching failure
    // buf2 is not touched (it still holds "def" from the first call)
    return 0;
}

https://godbolt.org/z/13hj77bxE

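The same applies to scn::scan. In your example, the second {} reads "BBB," including the comma, because a comma is not whitespace. The literal ',' in the format string then meets the space that follows instead, so matching fails with the error you saw and c is never assigned.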

The workaround is the same for both scanf and scn::scan: specify the acceptable characters in the format string, with %[...] for scanf and {:[...]} for scn::scan.

#include <iostream>
#include <string>
#include <scn/scn.h>

int main() {
    std::string input{"AAA = (BBB, CCC)"};
    std::string a, b, c;

    // In {:[...]}, the `...` specifies which characters are matched;
    // a leading ^ inverts that selection.
    // So {:[^,]} matches everything but a comma,
    // and {:[^)]} matches everything but a closing parenthesis.
    auto r = scn::scan(input, "{} = ({:[^,]}, {:[^)]})", a, b, c);
    std::cout << "error: " << static_cast<int>(r.error().code()) << std::endl;
    std::cout << "a: " << a << "\nb: " << b << "\nc: " << c << std::endl;
    return 0;
}

https://godbolt.org/z/dMKdqob9a

This design stems from the idea that how an argument is scanned is determined only by its type and its format specifiers (that is, everything inside the {}), not by what surrounds it in the format string. That said, the decision could be revisited in the future.

For now, closing as WONTFIX: working as intended.

Thank you for the detailed explanation 🙇