Issues with substring processing in parameter expansion
tucak opened this issue · 10 comments
Consider the following snippet:
FOO="\\a"
echo ${FOO#*\\}
What I expect this to output is a
(and this is what dash
and bash
does), but smoosh
outputs \a
.
Another issue:
FOO="a?b"
echo ${FOO#*"?"}
This should output b
(as it does on dash
and bash
), but with smoosh
is outputs ?b
, because it does not consider the question mark as escaped.
It seems that when try_match_substring
receives the pattern, it is already fully expanded, with quotes removed, despite the code assuming that it still has quotes and escaping backslashes in it.
A similar pattern to the first example is used in the Modernish test suite to iterate over a string, this bug causes it to be stuck in an endless loop forever.
Hmm... with #19 we may be able to simplify this. Getting quoting correct was quite difficult (so it's a little disappointing to see it's still not quite right). Hopefully the fix is just in the matching logic and not in the expansions in the semantics proper.
Thanks for the report!
This might be related:
foo=\"
# expansion - fails
case "$foo" in
"$foo" ) echo OK ;;
* ) echo NOT OK ;;
esac
# literal - works
case "$foo" in
\" ) echo OK ;;
* ) echo NOT OK ;;
esac
It seems that the quoting in symbolic_string_of_expanded_words true
is inadequate. I'll see if #19 can be extended to handle this case too.
I found a fix for the problem with case
, it turned out to be unrelated.
I traced this issue back to shim.ml
, it seems that escaped backslashes in parse_string
are returned as plain backslashes, which in turn interfere with anything that follows it. For example, the output for \129\*
and \129*
will both result in \*
. However, making \129\
return \\
breaks everything horribly...
These issues are exactly the sort of thing that made debugging absolute hell here. :( I suspect that I ended up at some local maximum that passes the test cases I had but still wasn't 100% correct. Part of the problem is that I don't completely understand dash's approach to escaping.
I think the right solution is in libdash's ast.ml: an explicit AST node for escaped characters, but that would pretty significantly change the whole pipeline. Probably worth it in the long run, especially for compatibility with different parsers.
I have a possible fix: tucak@a5acf1e
It seems to solve the issue, although there could be a corner case somewhere that is not properly handled. What do you think?
It turns characters escaped by a backslash into their own control entities. During expansion, these are turned into double-quoted strings. For example, echo \$foo
will turn into the equivalent of echo "$"foo
. It solves the issue with echo ${FOO#*\\}
and does not break any of the tests in the smoosh and modernish test suites. I couldn't come up with any cases where it would not work, but I am not as familiar with the POSIX specification and have a hard time studying it, maybe you have any objections to it?
I understand that it is quite hackish, but, if my understanding is correct, a proper solution would involve redesigning a good part of the code to be less "stringly typed" and will probably wait for someone more educated than me to do it.
Hmm... I'm surprised that it could be so simple! I would have expected that there would be issues with pattern matching in cases. Does this pass all the tests? (GH only shows it for PRs, I guess...)
Yes, it does not break any of the existing tests. Of course I don't have access to the POSIX test suite so I can't comment on that. There could be issues with shtepper
's web interface, I did not try that.
Fixing up the Shtepper is easy enough, and I believe that the relevant tricky cases (particularly dealing with case statements) are in my own test suite. This seems like a good change to me!
I'm embarrassed to say that I still haven't been able to debug the POSIX test suite's Docker (weird issues with timestamps in the FS) and VM issues (spurious SIGTSTP, can't run the suite), though not for lack of trying. :(
Will switching to Morbig change the AST? If not, do you want me to open a proper pull request for the change linked in #21 (comment)?
It should not change the AST... but it's hard to completely predict things, of course. A student is working on it now, with the assumption that your fix will get merged in. A PR would be great... thank you!