`hex_codes_in_unicode_sequences`: restrictions around f-strings are too strict
Opened this issue · 5 comments
I had this idea while reading #4522. This probably shouldn't block hex_codes_in_unicode_sequences
from being stabilized since it is an edge case, but would be nice to have.
playground link Currently, f"{'\xFF'}"
does not get formatted. It should be formattable to f"{'\xff'}"
.
Note that any changes here have to be careful of f-string debug statements. This formatting would not have any effect since \xXX
is resolved in the string literal before the formatting output happens
>>> f"{'\xFF'=}"
"'ÿ'='ÿ'"
>>> f"{'\xff'=}"
"'ÿ'='ÿ'"
but if there is ever a situation where the formatting is applied when the escape isn't resolved, then the behavior would change observably.
Thanks, good catch. I think this doesn't need to block the stabilization; we can tweak it next year. Though I'd accept a PR fixing it before the end of the year.
I have found why this happens, but not how to fix it.
The issue was introduced by #4401. The test used for if an f-string can be formatted like a normal string is if any of the f-string {}
s contain a \
, which is too broad of a test.
You can see this by the fact that this code f"\xFF{"\a"}"
also doesn't have the hex value formatted.
Note that this does also accidentally fix a possible bug, since if this f-string f"{r"\xFF"}"
was formatted as a normal string to f"{r"\xff"}"
it would change program behavior (but be caught by the inequivalent to source code sanity check).
Since fixing this naively both introduces a bug and, in my attempt, breaks a ton of tests, I'm not sure what to do/what approach to take. Thoughts @JelleZijlstra? Also cc @tusharsadhwani
This is also could be considered a byproduct of the actual f-string formatting code being commented out with # TODO: Uncomment Implementation to format f-string children
, but I couldn't find any context from #3822 why this is the case.
Thanks for looking into this!
This is also could be considered a byproduct of the actual f-string formatting code being commented out with # TODO: Uncomment Implementation to format f-string children, but I couldn't find any context from #3822 why this is the case.
#3822 was a big change by itself to add parsing support for new f-strings. I didn't want to make the change even more complicated by adding formatting changes, especially because (as I recall) the initial version made some questionable formatting choices. Ideally we should format code inside f-strings, and that might provide a principled way to fix this issue too. However, it's a bigger project.
Since fixing this naively both introduces a bug and, in my attempt, breaks a ton of tests, I'm not sure what to do/what approach to take.
Since this is a fairly obscure bug, the best approach might be to leave this simple bug around for now, and investigate broader changes to format f-strings in a principled way, like I discussed above.
It wasn't (and still hasn't afaik) been decided how we want to approach nested string formatting, so #4401 simply restored the pre-3.12 behaviour.
We could for example adopt Ruff's decisions on how to format them. i.e., continue this discussion: astral-sh/ruff#9785
Update from the corresponding Ruff issues I've opened: The current python behavior might be a bug in the parser, in which case the current formatting would be correct. python/cpython#124363