psf/black

Merge implicitly concatenated string literals that fit on one line

max-sixty opened this issue · 17 comments

Black could make strings that are split over multiple lines (i.e. several quoted strings on consecutive lines, surrounded by parentheses) more compact by refilling them to the full length of the line.

Even if that was overreach, there's a peculiar situation where you end up with multiple strings on the same line, like below:

-        warnings.warn('Dataset.sel_points is deprecated: use Dataset.sel()'
-                      'instead.', DeprecationWarning, stacklevel=2)
-
+        warnings.warn(
+            'Dataset.sel_points is deprecated: use Dataset.sel()' 'instead.',
+            DeprecationWarning,
+            stacklevel=2,
+        )
ambv commented

In this case Black is suggesting that you should merge the two strings into one and the result is more readable that way.

I don't do this automatically (yet?) because it gets complicated if the two strings don't share the same prefix (for example r'something' f'another thing'). This is where user action after the formatting is probably best.

Right, that makes sense. I think whether the strings on the same line are merge-able is clear (i.e. do they have the same prefix), but yes it's a rare case; feel free to close

And changing beyond that may require discretion (e.g. turning 6 lines of 2/3-long lines into 4 full length lines)

ambv commented

There's another related problem: if I merged string literals, I am now making semantic changes to the AST. I'm not opposed to those but this will make safety checks after reformatting trickier.

Let's leave this open for the time being, it's an interesting problem.

zsol commented

I'm not even sure what I would expect black to do with code that implicit-concatenates two differently prefixed strings to be honest. I think the path of least surprise is just leaving them alone.

ambv commented

Yeah, if they have different prefixes, leave them alone. If they share a prefix and they end up on the same line, they should be merged.
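For illustration, the rule above (same prefix: merge; different prefix: leave alone) could be sketched roughly as follows. This is not Black's code: `split_literal` and `merge_literals` are hypothetical helper names, the sketch only handles single-line, non-triple-quoted literals, and it also refuses to merge across different quote styles, which is an extra assumption on my part. The final AST comparison is there because joining the bodies textually can change meaning (for example an octal escape followed by a digit).

```python
import ast


def split_literal(literal):
    """Split a string literal into (prefix, quote, body); None if triple-quoted."""
    start = min(i for i, ch in enumerate(literal) if ch in "'\"")
    prefix, rest = literal[:start], literal[start:]
    quote = rest[0]
    if rest.startswith(quote * 3):
        return None  # triple-quoted literals are out of scope for this sketch
    return prefix, quote, rest[1:-1]


def merge_literals(first, second):
    """Return one merged literal, or None if the pair should be left alone."""
    a, b = split_literal(first), split_literal(second)
    if a is None or b is None:
        return None
    if a[0].lower() != b[0].lower() or a[1] != b[1]:
        return None  # different prefixes (or quote styles): leave them alone
    merged = f"{a[0]}{a[1]}{a[2]}{b[2]}{a[1]}"
    # Safety check: the merged literal must parse to the same expression
    # as the original implicitly concatenated pair.
    before = ast.dump(ast.parse(f"({first} {second})", mode="eval"))
    after = ast.dump(ast.parse(f"({merged})", mode="eval"))
    return merged if before == after else None


print(merge_literals("'Dataset.sel()'", "'instead.'"))
print(merge_literals("r'something'", "f'another thing'"))
```

The first call returns a single literal, the second returns None because the prefixes differ.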

If you really want to be correct here the implementation is going to be hard in the following edge case:

  • two strings like "STR1" "STR2" don't fit on one line because the closing quote of STR1, the space, and the opening quote of STR2 are the 3 characters that cause the entire thing to not fit in a single line. So you will keep them on two lines.
  • but if you knew that it's safe to concatenate them, it would fit in a single line (without those 3 extra characters).

I'm inclined not to touch this edge case since that makes it tricky where to perform the merge.
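To make the 3-character edge case concrete, here is a small arithmetic sketch. The line length of 30 and the variable names are made up for illustration.

```python
LINE_LENGTH = 30  # assumed limit, chosen only to make the numbers small

body_1 = "a" * 12
body_2 = "b" * 12

# Two implicitly concatenated literals on one line.
split_line = f'x = "{body_1}" "{body_2}"'
# The same content as one literal: no closing quote, space, or opening quote.
merged_line = f'x = "{body_1}{body_2}"'

print(len(split_line), len(merged_line))
print(len(split_line) <= LINE_LENGTH, len(merged_line) <= LINE_LENGTH)
```

The split form is exactly three characters longer, so it fails the length check while the merged form passes. A formatter that decides line breaks before merging would never get to see that.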

Another small edge case which I'm inclined to avoid is this:

a = (
    "a"
    "bb"
    "ccc"
    "dddd"
    "eeeee"
    "ffffff"
    "ggggggg"
    "hhhhhhhh"
    "iiiiiiiii"
    "jjjjjjjjjj"
    "kkkkkkkkkkk"
    "llllllllllll"
    "mmmmmmmmmmmmmmmm"
)

Technically Black could implement the "fill" algorithm for this case that Prettier also has for JSX. But I think what it currently does is fine for simplicity and obvious for users to recognize.
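As a rough idea of what a "fill" pass could look like for this case, here is a greedy sketch. It is not Prettier's algorithm and not anything Black does; `fill_literals` is a hypothetical name, and the sketch only packs already-quoted literals onto lines without merging them.

```python
def fill_literals(literals, line_length, indent="    "):
    """Greedily pack string literals onto lines no longer than line_length."""
    lines = []
    current = ""
    for literal in literals:
        if not current:
            # Start a new line with the indent.
            current = indent + literal
        elif len(current) + 1 + len(literal) <= line_length:
            # The literal still fits after a separating space.
            current += " " + literal
        else:
            lines.append(current)
            current = indent + literal
    if current:
        lines.append(current)
    return lines


for line in fill_literals(['"a"', '"bb"', '"ccc"', '"dddd"'], line_length=16):
    print(line)
```

A literal that is longer than the limit on its own still gets its own (overlong) line, which is one of the places where the simple one-literal-per-line output is easier to reason about.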

Another related case I've managed to hit with Black is when it joins a \-split string into one line and, by doing so, violates the line length limit. In this case (at least for now, while merging is not implemented), it would probably be better if it did nothing. E.g.:

$ black -S -l30 --diff long_str.py

--- long_str.py	2018-07-15 12:24:14 +0000
+++ long_str.py	2018-07-15 12:24:41.221434 +0000
@@ -1,5 +1,2 @@
-s = '111111111111111111111' \
-    '222222222222222222222' \
-    '333333333333333333333' \
-    '444444444444444444444'
+s = '111111111111111111111' '222222222222222222222' '333333333333333333333' '444444444444444444444'

@aldanor line length violation should be a bug (I have also seen it with black==18.6b4). I think it deserves a separate issue.

@ambv Python 3.7 has AST level constant folding: https://bugs.python.org/issue29469 so implicit string concatenation - or lack thereof - would be invisible to the AST check
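For anyone who wants to check what an `ast`-based comparison sees on their own interpreter, here is a minimal sketch. I'm assuming a recent CPython 3 (3.8 or newer); it says nothing about older interpreters or typed-ast, and `same_ast` is just a helper name for this example.

```python
import ast


def same_ast(source_a, source_b):
    """Compare two sources by their dumped ASTs (positions are not included)."""
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


split_src = "s = 'Dataset.sel() ' 'instead.'\n"
merged_src = "s = 'Dataset.sel() instead.'\n"

print(same_ast(split_src, merged_src))
print(same_ast("s = 'a' 'b'\n", "s = 'a' 'c'\n"))
```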

ambv commented

I know, @graingert, but we can't require Python 3.7+ for all Black users. Not yet at least. What I'm pondering is if we should rather switch to do the AST post-check using typed-ast which would make it work exactly the same on both Python 2 and Python 3. And the same on all Python 3 versions.

By the way, here are the pip installs for Black from PyPI for July 2018:

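| python_version | percent | download_count |
| -------------- | ------: | -------------: |
| 3.6            |  86.07% |         15,408 |
| 3.7            |  13.21% |          2,364 |
| 3.5            |   0.41% |             73 |
| 2.7            |   0.22% |             39 |
| 3.4            |   0.08% |             14 |
| 3.8            |   0.01% |              2 |
| 3.3            |   0.01% |              1 |
| 2.6            |   0.01% |              1 |
| Total          |         |         17,902 |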

Source: pypinfo --start-date 2018-07-01 --end-date 2018-07-31 --percent --markdown black pyversion

ambv commented

Haha, one of those 3.8 downloads is me :-)

Having Black complete the concatenation would be great.

ofek commented

In the meantime, can we add a warning when this happens so we can manually resolve it?

I'd definitely like to see a warning for this, maybe something like "Implicit string concatenation in line N not merged." An issue I've run into is that someone writing multiple similar strings, in tests for example, will add continuations for all the strings so they look the same, even though some would fit on a single line. Then Black moves them back to a single line but leaves the continuation sitting in the middle. The user was trying to satisfy the formatting rules, but ended up producing less ideal formatting without knowing it.
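A warning like that could be prototyped outside Black with the standard `tokenize` module. The sketch below is only an illustration: `find_unmerged_concatenations` is a made-up name, and it only looks at plain STRING tokens, so f-strings may be missed on newer Python versions where they are tokenized differently.

```python
import io
import tokenize


def find_unmerged_concatenations(source):
    """Return a warning for each line holding two adjacent string literals."""
    messages = []
    previous = None
    for token in tokenize.generate_tokens(io.StringIO(source).readline):
        if (
            token.type == tokenize.STRING
            and previous is not None
            and previous.type == tokenize.STRING
            and previous.end[0] == token.start[0]  # both on the same line
        ):
            message = (
                f"Implicit string concatenation in line {token.start[0]} not merged."
            )
            if not messages or messages[-1] != message:
                messages.append(message)
        previous = token
    return messages


source = (
    "def foo():\n"
    '    return "Some long string cut in half," " this is really a long string"\n'
)
for message in find_unmerged_concatenations(source):
    print(message)
```

Literals that sit on separate lines inside parentheses are not reported, since those are the layout Black produces on purpose.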

What's your view on this example? Black left it unchanged.

def foo():
    return "Some long string cut in half," " this is really a long string"

def bar(text):
    return text

bar(("Some long string cut in half," " this is really a long string"))

Is there an open issue for doing the opposite? I've found that when Black leaves long lines in my code, they are usually overly long strings (mostly error messages and command line argument definitions).

Black could break long strings over multiple lines with implicit continuation (e.g. at spaces or hyphens). I appreciate this would mean Black having to set a convention for whether the break-point space should trail at the end of the truncated line or lead at the start of the continued line.
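To show the two conventions side by side, here is a small sketch that splits a string value at spaces. `split_at_spaces` and its `space` parameter are hypothetical; it works on string values rather than source literals, ignores hyphens, and a single word longer than the width is left as an overlong chunk.

```python
def split_at_spaces(text, width, space="trailing"):
    """Split text at spaces into chunks of at most width characters."""
    words = text.split(" ")
    chunks = []
    current = words[0]
    for word in words[1:]:
        # +1 for the joining space, +1 reserved for the break-point space.
        if len(current) + 1 + len(word) + 1 <= width:
            current += " " + word
        else:
            chunks.append(current)
            current = word
    chunks.append(current)
    if space == "trailing":
        # The break-point space ends the truncated chunk.
        return [chunk + " " for chunk in chunks[:-1]] + chunks[-1:]
    # Otherwise the break-point space starts the continued chunk.
    return chunks[:1] + [" " + chunk for chunk in chunks[1:]]


text = "Some long string cut in half, this is really a long string"
print(split_at_spaces(text, 20, space="trailing"))
print(split_at_spaces(text, 20, space="leading"))
```

Either way, joining the chunks gives back the original text, so the choice is purely about which line carries the space.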

Found it: See #182

I wrote a flake8 plugin to forbid these aberrant constructs: https://pypi.org/project/flake8_implicit_str_concat/