Implicit concatenated string formatting

Question

Implicit concatenated string formatting

Closed this issue 3 months ago · 18 comments

This is a proposal for changing our implicit concatenated string formatting to solve #8272

The problem in the mentioned issue is that ISC001 is incompatible with the formatter because the formatter might format an implicit concatenated string on a single line, which triggers ISC001

# Input
a = (
	"bbbbbbb"
	"ccccccc"
)

## Formatted
a = "bbbbbbb" "ccccccc"

Which triggers ISC001.

Today

We either format all parts of an implicit concatenated string or a single line or format each part on its own line, if necessary:

# If it fits

a = "bbbbbbb" "ccccccc"

# If it exceeds the line width
a = (
    "aaaaaaaaa"
    "bbbbbbbbbbbbbbbbbbbbb"
    "ccccccccccccccccccccccccccccccccccccccccccccccccccccccc"
)

Proposal

I propose to implement a subset of Black's improved string processing by automatically joining implicit concatenated strings into a single string if the joined string fits on a single line:

# If it fits

a = "bbbbbbbccccccc"  # Note that it's now a single string. 

# If it exceeds the line width
a = (
    "aaaaaaaaa"
    "bbbbbbbbbbbbbbbbbbbbb"
    "ccccccccccccccccccccccccccccccccccccccccccccccccccccccc"
)

The motivation behind it is that, to my knowledge, there's no real reason for using an implicit concatenated string if it fits on a single line. The primary use case for implicit concatenated strings is to make a string fit into the desired line length by splitting it over multiple lines.

Considerations

The main downside is that joining the strings is non-reversible. Reversability is a desired property (Ruff's formatting tends to be reversible except for the trailing comma insertion) because it ensures that if you get the same formatting after making changes to the code and then undoing them again (without using revert, but by manually undoing your changes).

For example, let's say you shorten a sentence in a string that makes the implicit concatenated string fit. Ruff would join it. But Ruff wouldn't split it if you extended the sentence by the same number of characters you previously removed, likely resulting in a string that now exceeds the configured line width.

That's probably why Black's improved string processing automatically joins and splits strings.

I'm undecided about whether we should accept the non-reversibility. Still, I think it solves an actual problem and is less involved than implementing Black's entire improved string processing formatting. Meaning it's an opportunity to provide value today. In other words. It's a first step towards automatically splitting strings and we accept non-reversability until we get there.

charliermarsh commented a year ago

Me too.

🎉4

Answer 1 · 2024-01-10T15:50:24.000Z

I am in favor of this proposal. I don't want implicit string concatenation ever, whether it's on a single line or multiple, so anything that helps avoid it is good by me.

Answer 2 · 2024-01-13T18:35:36.000Z

Thank you for this detailed proposal! I'm in In favor as well: I like the feature and I don't see why anyone would want this kind of string staying around in the code. This would satisfy my need reported in #8272.

I don't mind if this proposal becomes the default behavior or is behind a config option, but imo the argument of reversibility is not that much of a blocker, for two main reasons:

Version control (git, etc) diminish the need for reversibility, in the sense that big formatting changes are expected to modify the code quite a bit (especially when ran on a non-formatted code base).
As you pointed out Ruff's formatting tends to be reversible. I feel like it's already too late to have full reversibility (you mentioned the comma insertion example) and it add a strong constraint on the formatter behavior, possibly leading to a lot of work.

Answer 3 · 2024-01-18T13:05:32.000Z

I, too, would prefer making this as the default behavior of the Ruff formatter.
It's hard to imagine someone intentionally doing "foo" "bar", and if they were to, # fmt: skip can be used.

Answer 4 · 2024-02-13T13:13:52.000Z

Two challenges need careful handling:

Concatenating a raw string with a regular string: r"a\b\\n" "bcd". It's unclear how this should be handled because converting the regular string to a raw string isn't an option if it uses any escape sequences. Converting a raw string to a regular string could be an option but sounds scary, I remember how confused I was last time when I tried to understand how raw strings work.
Concatenating a single and a triple quoted string: "abcd" """single line but triple quoted string"""

We don't need to handle multiline strings because they always expand on multiple lines.

Answer 5 · 2024-02-13T13:21:31.000Z

Regarding your first point, what would happen if there's a newline in the second string (r"a\b\\n" "b\ncd")? We don't want to convert the latter into a raw string.

Python is converting it to normal strings by escaping the content in the raw string:

In [1]: r"a\b\\n" "bc\nd"
Out[1]: 'a\\b\\\\nbc\nd'

In [2]: r"a\b\\n" r"bc\nd"
Out[2]: 'a\\b\\\\nbc\\nd'

It's more prominent when you print it out:

In [3]: print(r"a\b\\n" "bc\nd")
a\b\\nbc
d

Answer 6 · 2024-02-13T13:22:53.000Z

yeah, I just realized that myself. There's also the issue that raw strings don't support escapes. So we might need to convert the other way round but I'm scarred doing this because I remember how confused I was last time when I tried to understand raw strings.

Answer 7 · 2024-02-13T15:33:19.000Z

I believe in you @MichaReiser 😃

Answer 8 · 2024-02-13T15:57:40.000Z

I remain sceptical ;)

Answer 9 · 2024-03-14T13:12:02.000Z

I view this proposal as a pragmatic step forward for Ruff. ISC001 is a helpful lint that catches real mistakes. Right now our team has to disable ISC001 since we can't have a large warning printed every time we format our code.

Answer 10 · 2024-06-17T20:44:18.000Z

I'd expect r strings to be relatively rare overall, but might make up a disproportionate number of desired implicit concatenations. For the sake of 80:20 here on value:effort, perhaps:

ISC001 could be adjusted to tolerate implicit concatenation with r and non-r strings[1] (as those may actually be desired)
ruff format could join implicit concatenations, except those mixing r and non-r strings (as those are non-trivial to rewrite automatically)

Those changes should remove the ISC001 conflict with ruff format, supporting the majority of otherwise affected users.

Only (the few?) users with r and non-r strings that also want to avoid implicit string concatenation would be affected with unreported warnings. If it's important to cover that last subset of users (at least until implementing r concatenation), perhaps a new ISC (or other namespace) rule could be introduced that does not tolerate r and non-r implicit concatenation, which could instead be added to the conflict list.

1: ISC001 already does not support fixing concatenations with r and non-r strings, perhaps for the same reasons it's challenging here(?):

$ cat x.py
print("a" "b")
print("a" r"b")
print("a\n" r"b\n")
print(r"a\n" r"b\n")
$ ruff check --extend-select="ISC001" --isolated x.py
x.py:1:7: ISC001 [*] Implicitly concatenated string literals on one line
x.py:2:7: ISC001 Implicitly concatenated string literals on one line
x.py:3:7: ISC001 Implicitly concatenated string literals on one line
x.py:4:7: ISC001 [*] Implicitly concatenated string literals on one line
Found 4 errors.
[*] 2 fixable with the `--fix` option.

Answer 11 · 2024-10-25T11:14:57.000Z

This is now available in preview mode, see #13663

Please give it a try and let me know if you run into any problems.

Answer 12 · 2024-12-16T12:06:35.000Z

FWIW: Ran into this issue with stable mode today and found this issue. With --preview, I get the overall better result, subjectively. Thanks!

That said, I'm unsure how to force something like

foo = (
    "a"
    "b"
    "c"
)

to remain multi-line. I use this in test code and a changes more rarely while the tests differ in b and c so I like the visual guidance -- even though in some cases, everything fits in one line. (Example: reitzig/espanso-nice-dev-refs:test/test_bitbucket.py)
I mention this because preview mode seems to be more aggressive in this regard than stable; probably because with this issue fixed, a few more cases fit into a single line than before.

Answer 13 · 2024-12-16T12:11:43.000Z

@reitzig if it's a one-off use case, then I suggest using a suppression comment

foo = (
    "a"
    "b"
    "c"
)  # fmt: skip

Answer 14 · 2024-12-16T18:05:31.000Z

@reitzig, without knowing more context about your codebase, that seems like a code smell. You should instead create a helper function with the shared values.

Answer 15 · 2024-12-17T16:49:46.000Z

@tylerlaprade One can always discuss code style, but in this case your proposal would only shift the issue to another rule: I would then want to force method arguments onto separate lines even though they'd (sometimes) fit into one.

Answer 16 · 2024-12-17T16:52:19.000Z

I suggest using a suppression comment

Works, thanks! Well, kinda. Now, no formatting happens within those parentheses. 🤔 Seems I'll have to choose some poison, don't I?

Answer 17 · 2024-12-17T18:34:26.000Z

One can always discuss code style, but in this case your proposal would only shift the issue to another rule: I would then want to force method arguments onto separate lines even though they'd (sometimes) fit into one.

@reitzig

The shared strings wouldn't need to be passed to the function, so the function inputs would only need to be the changing values. You therefore wouldn't need multi-line formatting.

Just curious, how long have you been working in codebases that enforce the use of an auto-formatter (any language)? I find that after some time, I no longer care about formatting and accept whatever the formatter's output is. It might sound jarring at first, but it's actually quite freeing to cross off an entire category of issue to think about.

By using Black, you agree to cede control over minutiae of hand-formatting. In return, Black gives you speed, determinism, and freedom from pycodestyle nagging about formatting. You will save time and mental energy for more important matters.
Black makes code review faster by producing the smallest diffs possible. Blackened code looks the same regardless of the project you’re reading. Formatting becomes transparent after a while and you can focus on the content instead.

-https://black.readthedocs.io/en/stable/index.html

What usually happens once people are using Prettier is that they realize that they actually spend a lot of time and mental energy formatting their code. With Prettier editor integration, you can just press that magic key binding and poof, the code is formatted. This is an eye opening experience if anything else.

-https://prettier.io/docs/en/why-prettier