`hex_codes_in_unicode_sequences`: restrictions around f-strings are too strict

Question



I had this idea while reading #4522. This probably shouldn't block `hex_codes_in_unicode_sequences` from being stabilized, since it is an edge case, but it would be nice to have.

Currently, `f"{'\xFF'}"` is not formatted (see the playground link). It should be formatted to `f"{'\xff'}"`.
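To make the requested change concrete, here is a minimal sketch of hex-escape normalization applied to the source text of a regular (non-raw) string literal. `HEX_ESCAPE` and `lowercase_hex_escapes` are hypothetical names, not Black's implementation, and the sketch deliberately knows nothing about f-strings or raw prefixes, which is exactly where this thread says care is needed.

```python
import ast
import re

# Hypothetical helper, NOT Black's code: lowercase the hex digits of \xHH
# escapes in the source text of a regular (non-raw) string literal.
HEX_ESCAPE = re.compile(r"(\\+)(x[0-9A-Fa-f]{2})")


def lowercase_hex_escapes(literal_src: str) -> str:
    def fix(m: re.Match) -> str:
        backslashes, body = m.group(1), m.group(2)
        if len(backslashes) % 2 == 0:
            # Even count: the backslashes escape each other, so "xFF" is
            # ordinary text and must be left alone.
            return m.group(0)
        # Odd count: the last backslash starts a real \xHH escape.
        return backslashes + body.lower()

    return HEX_ESCAPE.sub(fix, literal_src)


src = r"'\xFF'"
new = lowercase_hex_escapes(src)
# For a regular string literal the runtime value is unchanged.
print(new, ast.literal_eval(src) == ast.literal_eval(new))
```

Checking the backslash parity is what keeps text like `'\\xFF'` (a literal backslash followed by `xFF`) untouched.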

Note that any change here has to be careful with f-string debug expressions (`=`). In this case the formatting would have no effect, because `\xXX` is resolved in the string literal before the debug output is produced:

>>> f"{'\xFF'=}"
"'ÿ'='ÿ'"
>>> f"{'\xff'=}"
"'ÿ'='ÿ'"

But if there is ever a situation where the formatting is applied while the escape isn't resolved, the behavior would change observably.
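For context on why debug expressions need this care: the `=` specifier (Python 3.8+) copies the expression's source text into the output. A formatter that rewrote that text, whether spacing or (if the escape were not resolved first) hex digits, could therefore change what the program prints. The sketch below sticks to plain names and arithmetic, because how `\xFF` inside such a field is echoed is the version-dependent behavior discussed later in this thread.

```python
# The "=" specifier echoes the expression's source text, whitespace
# included, followed by "=" and the repr of the value.
x = 1
plain = f"{x=}"       # no spaces in the source, none in the output
spaced = f"{x = }"    # the spaces around "=" survive into the output
arith = f"{1 + 1=}"   # the expression text is kept exactly as written
print(plain, spaced, arith)
```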

Answer 1 · 2024-12-04T14:09:53.000Z

Thanks, good catch. I think this doesn't need to block the stabilization; we can tweak it next year. That said, I'd accept a PR fixing it before the end of the year.

Answer 2 · 2024-12-05T02:50:15.000Z

I have found why this happens, but not how to fix it.

The issue was introduced by #4401. The check for whether an f-string can be formatted like a normal string is whether any of its replacement fields (`{}`) contains a backslash (`\`), which is too broad.
For example, in `f"\xFF{"\a"}"` the hex escape is also left unformatted, even though it sits outside the replacement field.
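A toy paraphrase of the check described above shows why it is too broad: a single backslash anywhere in any replacement field disqualifies the whole f-string, including escapes in its literal part. `replacement_fields` and `fields_allow_plain_formatting` are hypothetical names, not Black's code, and the extractor assumes no braces inside quoted strings or format specs.

```python
def replacement_fields(body: str) -> list[str]:
    """Return the text inside top-level {...} fields of an f-string body.

    Toy version: handles nesting and {{ / }} escapes, but ignores braces
    that appear inside quoted strings or format specs.
    """
    fields, depth, start, i = [], 0, 0, 0
    while i < len(body):
        if depth == 0 and body.startswith(("{{", "}}"), i):
            i += 2  # escaped literal brace, not a field
            continue
        ch = body[i]
        if ch == "{":
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                fields.append(body[start:i])
        i += 1
    return fields


def fields_allow_plain_formatting(body: str) -> bool:
    """Mirror the broad rule: refuse if any field contains a backslash."""
    return not any("\\" in field for field in replacement_fields(body))


# The body of f"\xFF{"\a"}" is rejected because of "\a", so \xFF is skipped too.
print(fields_allow_plain_formatting(r'\xFF{"\a"}'))
```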

Note that this check also accidentally prevents a possible bug: if the f-string `f"{r"\xFF"}"` were formatted as a normal string to `f"{r"\xff"}"`, program behavior would change (although the sanity check that the output is equivalent to the source would catch it).
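The raw-string case matters because lowercasing hex digits only preserves the value when the backslash actually starts an escape. The sketch below mimics an AST-based equivalence check; Black's safety check works by comparing ASTs, but `same_ast` is a hypothetical stand-in, not its code. It uses plain string literals so it also runs on Python versions older than 3.12.

```python
import ast


def same_ast(src_a: str, src_b: str) -> bool:
    """Parse both snippets and compare their AST dumps
    (line/column attributes are excluded by default)."""
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


# Regular string: \xFF and \xff denote the same character, so the ASTs match.
regular_ok = same_ast(r'"\xFF"', r'"\xff"')
# Raw string: the backslash is literal, so the value itself changes.
raw_ok = same_ast(r'r"\xFF"', r'r"\xff"')
print(regular_ok, raw_ok)  # True False
```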

Since fixing this naively both introduces a bug and, in my attempt, breaks a ton of tests, I'm not sure what approach to take. Thoughts, @JelleZijlstra? Also cc @tusharsadhwani

This could also be considered a byproduct of the actual f-string formatting code being commented out with `# TODO: Uncomment Implementation to format f-string children`, but I couldn't find any context in #3822 on why this is the case.

Answer 3 · 2024-12-05T15:53:09.000Z

Thanks for looking into this!

> This could also be considered a byproduct of the actual f-string formatting code being commented out with `# TODO: Uncomment Implementation to format f-string children`, but I couldn't find any context in #3822 on why this is the case.

#3822 was a big change on its own, adding parsing support for the new f-string syntax from Python 3.12 (PEP 701). I didn't want to make it even more complicated by adding formatting changes, especially because (as I recall) the initial version made some questionable formatting choices. Ideally we should format code inside f-strings, and that might also provide a principled way to fix this issue. However, it's a bigger project.

> Since fixing this naively both introduces a bug and, in my attempt, breaks a ton of tests, I'm not sure what approach to take.

Since this is a fairly obscure bug, the best approach might be to leave it as is for now and investigate broader changes to format f-strings in a principled way, as I discussed above.

Answer 4 · 2024-12-05T15:59:21.000Z

It wasn't decided (and, afaik, still hasn't been) how we want to approach formatting nested strings, so #4401 simply restored the pre-3.12 behaviour.

We could, for example, adopt Ruff's decisions on how to format them, i.e., continue this discussion: astral-sh/ruff#9785

Answer 5 · 2024-12-12T18:48:26.000Z

Update from the corresponding Ruff issues I've opened: the current Python behavior might be a bug in the parser, in which case the current formatting would be correct. See python/cpython#124363.