psf/black

Single quotes option

bofm opened this issue · 70 comments

bofm commented

Hi! Although Black now prefers doubles, can we have an option to keep single quotes? Forcing double quotes would make this great project unusable for many users who picked the rule of using single quotes.

Operating system: MacOS
Python version: 3.6
Black version: 18.4a0
Does also happen on master: yes

I'm not a maintainer, just an interested user. I respectfully voted down because I think black's power is to annihilate all these small discussions forever. Out of curiosity, are there objective reasons that you need to stick with single quotes?

bofm commented

are there objective reasons that you need to stick with single-quotes?

It is an approved code style I have to use at work. There are millions of lines of code with single quotes in the existing code base which are not going to be reformatted. I believe this is also the case for other people.

ambv commented

Please discuss the approved code style at work. Black's documentation gives several arguments as to why double quotes are preferred.

If you have strong reasons to prefer single quotes, list them here. "Approved code style" is not a reason alone.

bofm commented

The reasons to add single quotes options:

  • Black is unusable when contributing to an existing project written with single quotes
  • Python official documentation uses single quotes in code examples
  • default repr() produces single-quoted string literals for most standard types (dict, list, tuple, str, bytes, namedtuple, etc)
  • many popular projects and their docs use single quotes
  • PEP proposals mostly use single quotes in code examples
  • Guido used single quotes in his latest commits on Github

Let me emphasize - the point is that restricting the quotes variants to only double quotes makes this cool project unusable for a big part of Python programmers.

ambv commented

All your reasons can be summarized like this: "Black shouldn't have an opinion because other projects don't have an opinion."

This is an appeal to authority and as such doesn't give us anything actionable to discuss.

I also don't buy the argument that this makes it "unusable for a big part of Python programmers". Why? The code is going to still work, programmers may even continue writing their code with single quotes (and let Black convert it as needed).

Note that Black is using a variant of wrapping lines with brackets that isn't even covered as an option in PEP 8. Does this make Black unusable?

bofm commented

What if a project has an opinion but a different one? Is Black’s intention to change opinions or to be a helpful tool?

That is a false equivalence @bofm, projects can obviously be opinionated and be helpful. But it may be black isn't for you, no big harm in that right?

For what it's worth, I don't agree with all black does, but honestly, formatting is such a small pain that I'm 100% ready to outsource the entire concept and accept whatever the community comes up with. It beats hours of bikeshedding. StandardJS has a lot of the same forces acting on it in the JavaScript world and they've also had to stick to their guns against many who disagree. I use both, and honestly, it takes about an hour before the formatting becomes second nature. My pairing partner and I spent last week with black (I've just activated it as part of CI), and it's fine. Everything is fine. The maintainers of this project listen to reason but not opinions when deciding a direction. What better principles can we really ask for?

As a matter of practicality, if you do want to use black, have you considered not running it against the entire codebase but limit it to specific folders? With a tiered rollout, you can slowly subsume folders instead of having to orchestrate a big all-or-nothing effort.

An option for quotes looks like a simple thing to add to black; if a lot of people argue about it, why not add it? Why can we have a line-width option but not a quote option? Even if you look at the PR, it originally used single quotes by default, with the argument that they are simpler to type, and then switched to double quotes because adding a ' character to a string would swap the quotes to " (though now, if a " character is added to the string, the opposite will happen). Tab width, line width and quotes were the first options added to Prettier https://prettier.io/docs/en/options.html (though its authors claim that it's an opinionated code formatter with a minimal set of options, as black does).

@bofm overall, even if black won't accept an option and it is crucial for your project, you can fork black and apply the code from #75 from when it was using single quotes; it shouldn't be hard to maintain, at least at this point.

ambv commented

Black is both a formatter and a code style. One of the core tenets of this project is that all blackened code looks the same. This is why I will never add configurable soft tab width, or indentation with hard tabs.

Having options is a burden to the user and a slippery slope to introduce more and more. Case in point: you're already using --line-length as an argument to add something else. The more options, the bigger the need for a configuration file.

We chose to standardize on double quotes because:

  • standardizing on one makes sense; and
  • standardizing on double quotes makes more sense than single quotes.

I mostly wanted the former and didn't care much which quote is going to be the one we choose. Over the course of designing this with @carljm and @zsol, they convinced me that double quotes are a better option. So this is what Black does now. Shouldn't it be reassuring to users to know that what Black does is informed by rational arguments?

I didn't hear any arguments here that would tell me why single quotes are preferable. "This is what we used to do" doesn't work, Black is bound to be disruptive when adopted. "It's easier to type" is something that I myself said and this was debunked (it's not easier on all keyboard layouts, and more importantly you can keep typing whatever you want and Black will convert it for you).

I'll keep this open because maybe somebody will appear and communicate clearly why an option for this is required. Unless it's strictly necessary, Black won't introduce it.

PS. While forking is always an option in a MIT-licensed project, I felt like suggesting this on my issue tracker was hostile. I would appreciate you not doing that again.

PS. While forking is always an option in a MIT-licensed project, I felt like suggesting this on my issue tracker was hostile. I would appreciate you not doing that again.

Sorry, your project is great, I appreciate your work and didn't want to harm anyone with my comments.

ambv commented

Will close this for now. If there's any new evidence of single quotes being preferable over double quotes, we can reopen this and resume discussion.

I use single-quoted strings to indicate internal identifiers (eg. dict keys) and double-quoted strings to indicate human-readable text (exception messages and IO).

Roughly if the string matches ^\w+$ I would not enforce double quotes.
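As a rough illustration of the rule described above (not something Black implements), a formatter could pick the quote character per string like this. The helper name `preferred_quote` is hypothetical, and `re.fullmatch` is used so that the whole string has to look like an identifier.

```python
import re

# Hypothetical helper sketching the commenter's heuristic: strings that look
# like identifiers (the ^\w+$ pattern) keep single quotes, everything else
# gets double quotes.
IDENTIFIER_LIKE = re.compile(r"\w+")


def preferred_quote(value: str) -> str:
    """Return the quote character this heuristic would choose for `value`."""
    return "'" if IDENTIFIER_LIKE.fullmatch(value) else '"'


for sample in ["format", "changefreqs", "File not found", ""]:
    print(repr(sample), "->", preferred_quote(sample))
```

Note that the empty string and anything containing spaces or punctuation fall on the double-quote side under this sketch.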

I think this is a really cool project, I hate double quotes though, and more than just typing double quotes, I don't like reading double quotes, quotes are just noise, and I actually really like javascript where you don't have to use quotes at all for dict keys e.g.

const dict = {
    fruits: ['apple', 'orange'],
    veggies: ['carrot', 'turnip'],
}

so single quote is the closest I can get in python to minimizing that noise. When your editor does syntax highlighting, you can easily see what is a string and what isn't.

I read the entire discussion and know you are highly unlikely to change your mind on this, but I want to still voice my opinion.

1. Another argument

In addition to @bofm's arguments:

Syntax highlighting breaks in some text editor syntax definitions.

2. Example

I use PythonImproved syntax for Sublime Text.

Part of my SashaPythonImproved.py file:

SITEMAP = {
    'format': 'xml',
    'priorities': {
        'articles': 1,
        'indexes': 0.5,
        'pages': 0.5
    },
    'changefreqs': {
        'articles': 'always',
        'indexes': 'weekly',
        'pages': 'always'
    }
}

Expected

Black formats this code to:

SITEMAP = {
    "format": "xml",
    "priorities": {"articles": 1, "indexes": 0.5, "pages": 0.5},
    "changefreqs": {"articles": "always", "indexes": "weekly", "pages": "always"},
}

Actual

Thanks.

this makes me sad :(

Could we please have more discussion about this? This is the single pain point that prevents us from adopting black at work :(

I'd rather have all the quotes be unified than have them be single quotes. I think this is a reasonable default.

@roganov this is not a reason to not use Black.

@kennethreitz For new projects, yes. For existing ones that use single quotes, converting all quotes to double quotes is sometimes not an option.

@zsol, why are double quotes better than single quotes?

ambv commented

@alanhamlett, is the explanation in the README insufficient?

Yes, it's insufficient. You should follow Prettier's example and allow configuring this tiny little quote thing instead of blocking massive amounts of teams from using it.

I'd rather have all the quotes be unified than have them be single quotes.

I agree, but only within the scope of a project or organization.

Overall, this conversation is bikeshedding and @ambv should just allow configuring quote style.

Hi @ambv! I totally love black. Thank you so much for this! I, personally, would love it more if using single quotes was configurable (just like line length).

I had wanted to voice my opinion here before as well but didn't really know how to articulate it and provide a reasoning for my choice. But today someone on HN explained it the way I think about using single quotes:

A double-quote is more noisy than a single-quote, and W is more noisy than V.
The difference is that " and ' are equally usable options in the context we're talking about. Quotes are very common, so the visual noise adds up when your screen is full of quote marks. Given that they mean the same thing, and one is both harder to type and harder to read, it makes sense to prefer the other.

Link to the HN comment: https://news.ycombinator.com/item?id=17158110

Double quotes are needlessly noisy. That's easy enough to see when you take a look at two ways to write an empty string: '' vs "".

Plus the python interpreter seems to prefer single quotes:

>>> "string"
'string'
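The interpreter output above comes from `repr()`. A small sketch of how `repr()` chooses quotes for `str` values: single quotes by default, double quotes only when the string contains a single quote and no double quote.

```python
# repr() of a str uses single quotes unless the string contains a single
# quote and no double quote; if it contains both, it falls back to single
# quotes and escapes the embedded single quote.
samples = ["string", "it's", 'say "hi"', "it's \"both\""]

for s in samples:
    print(repr(s))

# Output:
# 'string'
# "it's"
# 'say "hi"'
# 'it\'s "both"'
```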

ambv commented

To everybody that respectfully voiced their opinion here and voted for some of the existing comments, thank you.

@alanhamlett, please revisit your language choices. Even if you were right in your opinion, the way you voice it makes it hard for others to treat you seriously.

Calling a conversation childish because you disagree with the other side of the argument is unnecessarily escalating the situation. Accusations of "blocking massive amounts of teams" is hyperbole and unfair. Telling unpaid maintainers of an open source project what they "should" do is arrogant. You have no skin in this game and are in no position to demand any action.

With this out of the way, let me summarize my position.

Black as a code style is an attempt at a strict subset of PEP 8 which doesn't leave much up to debate. A "code style: black" badge on a repo should be enough to tell the reader exactly what they can expect from the code inside. The first configurable styling option would be the foot in the door that would be used to demand more configurability. This is why Black will never have an option to choose single quotes over double quotes.

What I'm still pondering is if Black should have an option to omit enforcement of styling of string quotes. That would allow projects to adopt it a bit more incrementally. I'm open to arguments in support of this idea. I mostly worry about getting more requests for options to disable other parts of Black in the future.

I’m confused by the “double quotes means I can’t use it” statements, if there are concrete arguments (not opinions) or specific use-cases that exemplify this problem I think that’s valuable to bring up in this thread. But, honestly and with respect, I’m not at all persuaded by opinion pieces. “Because it’s our standard”, “I hate double quotes”, “double quotes hurts many teams”, these aren’t arguments. We must be rigorous in our thinking!

My experience with black is that we rolled it out piecemeal, targeting certain folders in one project. After some experimenting the entire project ended up included. And it was fine. I don’t agree to all black’s decisions, none of us at the office did, but we DID agree it’s so much easier to not have bikesheddy formatting discussions.

I respect ambv’s commitment to his original vision of a no-nonsense linter. The javascript community has StandardJS (which also draws ire over its quote choice), I’m above all else happy to see a strong voice emerging for Python and will support that over specific quote style. Thank you for your work, whatever decision you feel is best going forward 👌

I would love to replace my use of yapf with black. Python is missing a standard code formatter.

I'd like to add a specific argument in favor of permitting configuration here (see also @bofm's list above and @kennethreitz's widely-adopted PEP8 amendment): typing a single quote takes fewer keystrokes when using the standard keyboard layouts in the US, UK, and China. It takes the same number of keystrokes on standard layouts in Germany and France. I know of no keyboard layouts which make it easier to type double quotes.

This is an excellent project. I look forward to seeing it develop.

zsol commented

@ariddell that's definitely a reasonable concern that has come up during development. We realized that we can just keep typing single quotes and let black correct it automatically. Would you agree this is a reasonable workflow?

@ariddell
The argument is: type in single quotes, and let black auto-format to double quotes, so it doesn't matter that it's harder to type doubles.

I don't think any argument can be made to sway the maintainer. I was very excited about black, and looked forward to "prettier in python", but this is a deal breaker. I don't want double quotes in my code and I will never use black or advocate for it without a single quotes option. My argument to the maintainer is this: if you want the whole community to use this product and be happy, allow this option.

If you don't, I guarantee someone will just fork this and add in single quotes and just merge changes from this upstream repo. And to clarify this wouldn't be a hostile fork or takeover or something, it would just be an alternate repo for the people who will never agree with you on this.

If I run Black on a code base I work on, 95% of the changes it makes are just changing single quotes to double quotes. Unfortunately, that amount of noise makes it hard to see what meaningful changes it is making and whether they are on the whole positive or negative.

I'm not sure about the arguments for using double quotes either. There are always going to be times when you need to use the other type of quote in a string, and examples involving double quotes include working with JSON or HTTP headers. I'm not sure it's a bad thing to use double quotes for docstrings and single quotes for everything else either – some visual distinction between the two may well be perceived as a good thing.

Reasonable arguments in this thread, by appearance:

  1. Single quotes are easier to type.
  2. Single quotes make less noise.

That's it. Two. All other statements are just opinion or lack logical reasoning.

  1. already pointed out here is not true for everybody.
  2. is the only thing i can see to be true for everybody because it is always better to have less clutter on your screen.

@ambv: "standardizing on one makes sense, standardizing on double quotes makes more sense than single quotes" and all the text around that also do not contain a single reason why they make more sense, exactly what you criticize when others post statements without a valid point of argument. Only saying something makes more sense is not a proper argument.
Standardizing can make sense so that there are not two different characters for the same thing, but using ' and " in the same code could also have positive effects. You can distinguish keywords from actual strings, for example. This could be a reason to use both.

In the readme file it says:
"If you're paid by the line of code you write, you can pass --line-length with a lower number. "
This is totally unreasonable and strictly against your own rules. (You even linked this yourself in this thread.)
You don't want to add an option for a quote setting, which I totally understand, and it makes sense for black not to have one, because black is not making any compromises. But why is there a line width option then? Can you name a reason for having that option that cannot also be given for the quotes? Otherwise it would just be inconsistent, and one could argue: "I get paid less for every ', so we need an option to prefer single over double quotes."

Please also do not forget that it is not the number of arguments that counts but their total weight.

Thank you for your work, this is a great tool and I don't want to sound disrespectful in any way, I'm just good at nitpicking :)

Hi. I'm one of the opinionated people who commented on the Hacker News post. I came over to see what might be happening here, and discovered there has already been quite a bit of debate on this topic.

In what I hope will be a helpful gesture (and because I apparently have nothing better to do on a Saturday night :) ), let me try to gather all the arguments I can find in one place. This is the one issue that is open, so I hope this is the right place.

I should acknowledge that I have a personal preference. Therefore, I will do my best to step back from that preference, and first summarize all the arguments while trying to be as neutral as possible. I will confine my own evaluation of these arguments to a separate comment.

Here we go!

All the arguments, organized by source

README.md

  • Using double quotes avoids having to escape ASCII apostrophes.
  • PEP 257 says to use triple double quotes for docstrings.
  • In some fonts, an empty string in single quotes is hard to distinguish from one double quote.
  • C strings are written with double quotes.
  • Single quotes are easier to type on many keyboards.

#51

  • Single quotes are easier to type on many keyboards.
  • Docstrings generally use triple double quotes.
  • In some fonts, an empty string in single quotes is hard to distinguish from one double quote.
  • C strings are written with double quotes.
  • repr() defaults to single quotes.
  • Many popular open source Python projects use single quotes.

#75

  • When you have a series of English strings, some of which contain ASCII apostrophes, it is more consistent to format them all with double quotes than to format some with single quotes and some with double quotes.
  • If an English string is formatted with double quotes, then you can insert an ASCII apostrophe without having to escape it or change the quotes.
  • PEP 257 says to use triple double quotes for docstrings.

#118

  • Some projects and workplaces require single quotes as their style.
  • Python official documentation uses single quotes in code examples.
  • The interpreter and repr() use a specific quoting style (single quotes except for strings containing single quotes).
  • PEP proposals mostly use single quotes in code examples.
  • Guido tends to use double quotes for human-facing strings, single quotes otherwise.
  • Some programmers use single and double quotes to express two kinds of strings.
  • Single quotes create less visual noise for readers.
  • Using double quotes breaks syntax highlighting in Sublime Text.
  • Kenneth Reitz's PEP 8 amendment recommends the repr() way (single quotes except for strings containing single quotes).

Hacker News

  • Python official documentation prefers single quotes.
  • The standard library prefers single quotes.
  • The repr() way of formatting strings is what people see in the interpreter.
  • Programs that generate Python code tend to use repr() and produce output formatted that way.
  • The repr() way was devised a long time ago and is the least inventive solution.
  • In sh/bash, single quotes are for string literals without variable expansion or other substitution.
  • Some programmers use double quotes for human-facing strings, single quotes otherwise.
  • Double quotes create more visual noise.
  • Double quotes require double keypresses on the most common keyboard layouts.
  • Double quotes are used to quote strings in English prose.
  • Double quotes are easier for strings that contain ASCII apostrophes.
  • Single quotes are easier for strings that contain double quotes.
  • Other popular programming languages write string literals always with double quotes.

I think that's all of them. Let me know if I missed any and I'll be glad to update this!

Arguments organized into categories

Now, a more organized list, with duplicates merged. A few of these involve quantifiable measures, so I've collected and added the data here.

A. The Python language

  1. The Python official documentation mostly uses single quotes. [Python 3.6: 80% to 20% of occurrences]
  2. The Python standard library mostly uses single quotes. [Python 3.6: 74% to 26% of occurrences]
  3. The repr() way of formatting strings is what people see in the interpreter.
  4. The repr() way was devised a long time ago and is the least inventive solution.
  5. PEP 257 says to use triple double quotes for docstrings.

B. Common practice

  1. Many popular open source Python projects use single quotes. [not quantified]
  2. Some workplaces require single quotes as their style. [not quantified]
  3. PEP proposals mostly use single quotes in code examples. [not quantified]
  4. Guido tends to use double quotes for human-facing strings, single quotes otherwise.
  5. Docstrings generally use triple double quotes.
  6. Kenneth Reitz's PEP 8 amendment recommends the repr() way.
  7. Some programmers use double quotes for human-facing strings, single quotes otherwise.

C. Tooling

  1. Using double quotes breaks syntax highlighting in Sublime Text.
  2. Programs that generate Python code tend to use repr() and produce output formatted that way.

D. Readability

  1. In some fonts, an empty string in single quotes is hard to distinguish from one double quote.
  2. Single quotes create less visual noise for readers.
  3. Using double quotes avoids having to escape ASCII apostrophes.
  4. Using single quotes avoids having to escape double quotes.
  5. Given a series of English strings, some of which contain ASCII apostrophes, it is more consistent to format them all with double quotes than to format only the ones containing apostrophes with double quotes and the rest with single quotes.

E. Writability

  1. Single quotes are easier to type on common keyboard layouts. [This is true for US and UK standard keyboards and some European keyboards. For all the rest of the Latin alphabet keyboard layouts on Wikipedia, both types of quotes require the same number of keypresses; for no layouts are double quotes easier than single quotes.]
  2. If an English string is formatted with double quotes, then you can insert an ASCII apostrophe without having to escape it or change the quotes.

F. Other languages

  1. C strings are written with double quotes.
  2. In sh/bash, single quotes are for string literals without variable expansion or other substitution.
  3. Double quotes are used to quote strings in English prose.
  4. Other popular programming languages write string literals always with double quotes. [C#, Java, ...?]

FWIW the preferred character for an apostrophe (’) works fine in both types of quotes. (Yes, you may not bother to use it, but it’s not correct to say apostrophes can only be used in double quotes either.)

@reupen Ok, updated to clarify.

ambv commented

could you imagine this many people coming in droves to vote and comment and ask for a change?

This is invalid logic. Most people are happy with the current behavior or indifferent to it. They don't open issues or find existing ones to comment about how they're content with the current situation.

ambv commented

@zestyping, hi, I saw your comments on HN. Thanks for your work on Python in the past! I'm bundling lib2to3 in Black so in fact I ship some code you wrote :-)

Python official documentation prefers single quotes.

The standard library prefers single quotes.

This is not true. There are more single quotes, but there is no recommendation of one over the other, let alone enforcement. I am pretty sure that most authors don't care either way, so counting percentages provides less information than you'd like.

ambv commented

Guido tends to use single quotes. [not quantified]

I spoke to him about this so I know this is not true either. He says he prefers double quotes for human-readable text and single quotes for data.

ambv commented

Using double quotes breaks syntax highlighting in Sublime Text.

This makes me sad. This should not be used as an argument in favor of single quotes. It's a bug to be fixed instead.

Using double quotes breaks syntax highlighting in Sublime Text.

This makes me sad. This should not be used as an argument in favor of single quotes. It's a bug to be fixed instead.

never noticed this bug. (I use MagicPython on sublimetext)

@ambv I was trying to write the statements neutrally. By "prefers" I just meant they have a significant majority of one type, which is an objective measurement. I've updated to use more objective wording ("mostly uses"). I'll update the statement about Guido, thanks for finding that out!

And yes, I personally dismiss the Sublime Text argument also. But I am reserving my opinions for a separate comment; I was trying to list all the arguments without stating any opinions yet. I will soon :)

ambv commented

Double quotes create more visual noise.

If that were true then written English prose wouldn't standardize on using them for citation. In fact, double quotes predate single quotes, suggesting that the visual distinction has a practical root.

When I see this argument, I wonder if it doesn't come from badly designed fonts more than anything else. After all, what is the unit of visual noise? What's the numeric difference in visual noise between the quotes? Even if we could answer those silly questions, I bet the delta in noise is minuscule, similar to how M is more noisy than N and commas are more noisy than periods.

I give very little weight to arguments based on precedent in existing code (including PEP8, popular PyPI projects, the Python REPL). Black is (AFAIK) the first Python autoformatter to standardize quotes, and the considerations for auto formatting are different than for hand-formatted code, so there is no relevant precedent.

Personally I use mostly single quotes in hand formatted code (single quotes for “machine strings”, double for human-readable), but this is not a heuristic that can reasonably be implemented by an auto formatter, and given the limitations of automatic quote enforcement, I think double quotes are the better choice.

The “how hard is it to type” consideration carries much less weight with auto formatting (because you can type whatever you like and let Black convert, just like you no longer need to worry about manually breaking lines as you type), and the maintainability issues with inconsistency resulting from human readable strings with embedded single quotes are more serious. Nobody would hand format a list of UI strings with inconsistent quoting based on the incidental presence or absence of an embedded single quote (unless they were allowing a strict style guide to overrule their better judgment) but this is what Black would have to do.
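To make the consistency point concrete, here is a minimal sketch comparing two quoting policies on a list of UI strings. It is not Black's implementation: `quote_with` and `prefer_double` are hypothetical helpers, and they assume plain printable text (no newlines or other characters that would need escaping).

```python
def quote_with(text: str, quote: str) -> str:
    """Wrap plain text in `quote`, escaping backslashes and that quote."""
    escaped = text.replace("\\", "\\\\").replace(quote, "\\" + quote)
    return quote + escaped + quote


def prefer_double(text: str) -> str:
    """Use double quotes unless single quotes would need fewer escapes."""
    if text.count('"') > text.count("'"):
        return quote_with(text, "'")
    return quote_with(text, '"')


labels = ["Save", "Don't save", "Cancel"]

# A "prefer single quotes" policy (the repr() way) yields mixed quoting,
# because only the string with an apostrophe switches to double quotes.
print([repr(label) for label in labels])

# A "prefer double quotes" policy keeps the whole list uniform.
print([prefer_double(label) for label in labels])
```

Under the first policy the middle label is the only one written with double quotes; under the second all three look alike, and a double-quoted string only flips to single quotes when it contains more double quotes than apostrophes.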

And absent data, I consider “double quotes are visually noisy / hard to read” to be a subjective aesthetic preference that carries zero weight.

@o0i1 that is a description of what is used right now, not an argument why it should be used. Why are there so many people here that can't tell that difference?

A naive attempt to answer this using data. Many caveats apply, this is a dumb count of occurrence of quotes in .py files in the 100 most downloaded packages off of pypi.

https://gist.github.com/kadrach/6b8911234d94a928fc999710b7ff0e4c

Density plot of quote bias in top100 packages.
Bias calculated as num_single/(num_single+num_double)
    ++----------+----------+----------+-----------+----------+-+
    |                                 **********               |
    |                               **         **              |
    |                             ***            **            |
1.5 +                            **               **           +
    |                           **                 **          |
    |                          **                   *          |
    |                         **                     *         |
  1 +                        **                      **        +
    |                       **                        **       |
    |                     ***                          **      |
    |     ******         **                             **     |
0.5 +  ****    ***** *****                               ***   +
    |  *           ***                                     **  |
    ++----------+----------+----------+-----------+----------+-+
     0         0.2        0.4        0.6         0.8         1

This does look like a clear preference of one style over the other, but as mentioned before, many caveats apply (e.g. counting vendored code, potentially counting generated code, selection bias in top100 packages, etc).

ambv commented

@kadrach, this is interesting. Your counting method is invalid though, it counts apostrophes within other strings and comments. Please use an AST.
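A small sketch of why a raw character count overstates things, using the standard library `tokenize` module rather than a full AST (a substitution made here for brevity; the function names are hypothetical). Only string literal tokens are classified, by their opening quote character.

```python
import io
import tokenize

SOURCE = '''\
# don't count this apostrophe
greeting = "it's fine"
key = 'name'
'''


def naive_counts(text):
    """Count raw quote characters, as a plain character scan would."""
    return text.count("'"), text.count('"')


def token_counts(text):
    """Count string literals by their opening quote character."""
    single = double = 0
    for tok in tokenize.generate_tokens(io.StringIO(text).readline):
        if tok.type != tokenize.STRING:
            continue
        # Skip prefix letters such as r, b, f or u before the opening quote.
        literal = tok.string.lstrip("rbfuRBFU")
        if literal.startswith("'"):
            single += 1
        else:
            double += 1
    return single, double


print(naive_counts(SOURCE))  # (4, 2)
print(token_counts(SOURCE))  # (1, 1)
```

One caveat: on Python 3.12 and later, f-strings are emitted as separate `FSTRING_*` tokens, so this sketch would not count them there.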

Could we do Top 1000 instead and only count within unique files?

@ambv Agreed, I was trying to avoid the work of doing this via an AST. I do wonder how much quotes contained within quotes would affect the outcome. I did deliberately want to include comments as "part of the code".

I'm not sure I understand counting within unique files. I would like to exclude vendored code, but then the problem of how to identify vendored code arises.

@kadrach Counting only unique files would solve the vendoring problem for the most part as vendored code is usually copied in-place.

@SethMichaelLarson I'm not quite clear on how to achieve this. How do you determine which project a file is attributed to? Has someproject vendored pip, or has pip vendored someproject? Counting only files unique across all projects would weight the counts heavily, disregarding packages that are more commonly vendored (perhaps pip as an example?).

I'm not sure this needs that level of detail; it's more to protect against over-counting packages like urllib3, which is commonly vendored and which was also previously vendored by requests, which is ALSO commonly vendored. :P

That's true, but we do need to count e.g. requests individually. If we only use unique files, choices that requests has made will be discounted (given most of the "requests" files will be non-unique and ignored); and requests is likely to be among the top packages.

ambv commented

Create a dict where the key is the sha1 of the file content and the value is the file content. Download all packages first while building this dictionary. Later you can use black.lib2to3_parse() to get the CST of the content, subclass black.Visitor to pluck out all strings and check the opening few characters to classify the string.

It doesn't matter which package a file comes from. It's super unlikely for unrelated files to match hash-wise so you're not "discounting" requests, you're just counting it once. In fact, you'll count it more than once since some vendored libs will be outdated. But that's a way better approximation than your first attempt.
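A rough sketch of the deduplicate-then-classify approach described above. As an assumption made to keep the example self-contained, it uses the standard library's `tokenize` module instead of `black.lib2to3_parse()` and a `black.Visitor` subclass; the function names are hypothetical and the package download step is left out.

```python
import hashlib
import io
import tokenize
from collections import Counter

# Python 3.12+ tokenizes f-strings as FSTRING_START tokens rather than STRING.
_STRING_STARTS = {tokenize.STRING}
if hasattr(tokenize, "FSTRING_START"):
    _STRING_STARTS.add(tokenize.FSTRING_START)


def dedupe_sources(sources):
    """Map sha1(content) -> content so identical (e.g. vendored) files count once."""
    unique = {}
    for content in sources:
        digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
        unique[digest] = content
    return unique


def classify_string(token_text):
    """Classify a string token by its opening quote, ignoring prefixes like r, b, f."""
    body = token_text.lstrip("rRbBuUfF")
    if body.startswith('"""') or body.startswith("'''"):
        return "triple"
    return "double" if body.startswith('"') else "single"


def count_quotes(sources):
    """Count quote styles across unique files; comments and nested quotes are ignored."""
    counts = Counter()
    for content in dedupe_sources(sources).values():
        for tok in tokenize.generate_tokens(io.StringIO(content).readline):
            if tok.type in _STRING_STARTS:
                counts[classify_string(tok.string)] += 1
    return counts


# Toy input: the second file is a byte-for-byte copy of the first.
sample_sources = [
    "x = 'a'\ny = \"it's\"  # don't count this apostrophe\n",
    "x = 'a'\ny = \"it's\"  # don't count this apostrophe\n",
    'z = """doc"""\nw = b\'k\'\n',
]
sample_counts = count_quotes(sample_sources)
```

Because the tokenizer yields whole string tokens, an apostrophe inside a double-quoted string or inside a comment never reaches the counter, which addresses the objection to the first counting attempt.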

I think I still need to retain a project to file association, to determine a "quote style choice" at project level.

ambv commented

No, that would unreasonably bump the value of a "style choice" for projects with small codebases compared to projects with many files. We don't care what a "project" chose, we care what all of the unique source code in the top 1,000 projects is using.

Weighting by number of files would bump projects with many little files over projects that opted for a smaller number of larger files. That seems unreasonable. Perhaps by number of lines?

My original intent was to figure out what the "unspoken standard" choices of the topmost projects are, not what all of the unique source code is using - I'm not convinced of that measure.
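The two measures being debated can give opposite answers. The sketch below uses made-up counts for three hypothetical projects (none of these numbers come from the survey) to show how a per-project "style choice" vote and a pooled count over all source code diverge.

```python
from collections import Counter

# Hypothetical per-project string counts, for illustration only.
toy_projects = {
    "big_project": {"single": 900, "double": 100},
    "small_a": {"single": 10, "double": 40},
    "small_b": {"single": 5, "double": 20},
}


def pooled_share(projects, style):
    """Fraction of all strings, across every project, that use the given style."""
    total = sum(sum(counts.values()) for counts in projects.values())
    return sum(counts[style] for counts in projects.values()) / total


def project_majorities(projects):
    """One vote per project: the style that project uses most often."""
    return Counter(max(counts, key=counts.get) for counts in projects.values())


pooled_single = pooled_share(toy_projects, "single")
votes = project_majorities(toy_projects)
```

With these toy numbers, most strings are single-quoted when pooled, yet two of the three projects "vote" for double quotes. That is the effect of giving a small codebase the same weight as a large one.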

The biggest issue for me is that, for the people who use single quotes only as identifiers, running Black on a codebase actually loses information. It's not AST information, but it's information in the sense that a code comment is information.

ambv commented

For the people who use single quotes only as identifiers, running Black on a codebase actually loses information.

@landtuna, while I am skeptical that such a scheme can be rolled out consistently, I acknowledge this is a problem.

kbd commented

The strongest argument in this thread for providing an option to not change quotes for strings (except for docstrings, which should always be """ per PEP 257) is that many people (including Guido) use quote style as an "extra-terse comment" to convey meaning, with single quotes meaning "data" and double quotes meaning "human-readable string".

I checked through my code and found I do the same thing without ever consciously deciding on that. I suspect this is common. I think this is actually why standardizing on double quotes bugged me (and perhaps other people?) so much and I couldn't explain why.

ambv commented

@kbd, agreed, this is the most convincing argument for allowing Black to optionally skip normalizing quotes as I suggested a few days back. That would cover all people unhappy with double quote enforcement.

You mean something like this?

['my_key'] = "my string"

instead of

["my_key"] = "my string"

lig commented

@ambv I'm strongly voting for single quotes as the approved code style. I've seen this at a lot of companies and discussed it several times at different meetups. It is a widespread practice to use single quotes for everything that is code and double quotes for docstrings.

In fact, this is very useful in IDEs and editors, where one can distinguish in-code strings from docstrings by defining different coloring for each.

Honestly, I was a bit shocked after black converted all single quotes to double quotes resulting in much less readable code in an IDE.

kbd commented

@ambv What do you think about an enforced quote style of "strings containing no whitespace are single-quoted while strings with whitespace get double quotes"? That would be a consistent style (i.e. the same file would always have the same output) that may produce the smallest delta from how many people actually prefer to write Python.

I could definitely see adopting that enforced style, whereas I would personally never want to use forced double-quotes.
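A minimal sketch of the proposed rule, to make it concrete. This is not something Black implements. The function names are hypothetical, and the sketch only handles simple literals with no prefix, no triple quotes, and no backslashes, leaving everything else untouched.

```python
def preferred_quote(value):
    """Pick the quote the heuristic asks for: double if the text has whitespace."""
    return '"' if any(ch.isspace() for ch in value) else "'"


def requote(literal):
    """Re-quote a simple string literal according to the whitespace heuristic."""
    # Prefixed (r"", b"", f"") and triple-quoted literals are out of scope here.
    if literal[:1] not in ("'", '"') or literal[:3] in ("'''", '"""'):
        return literal
    body = literal[1:-1]
    quote = preferred_quote(body)
    # Leave the literal alone if switching quotes would require new escapes.
    if "\\" in body or quote in body:
        return literal
    return quote + body + quote


key = requote('"my_key"')          # no whitespace: single quotes
message = requote("'my string'")   # whitespace: double quotes
```

The rule is deterministic, so the same input always produces the same output, but the result depends on string contents rather than on the author's intent.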

lig commented

@kbd That sounds like the best of both worlds. This should work for the "' for code, "" for docs" convention and also for the "' for data, "" for human-readable values" convention most of the time. A nice trick, in my opinion.

For another data point: My team is about to reformat all our code with black. We've always used single quotes. We've got some very experienced Python folks. Everyone is fine with it.

@harveyr yeah us too.

I’m fine with whatever, including a toggle option that we as a community can possibly discuss and revisit in due time. Perhaps a toggle option really is the suitable solution to not fracture whatever community this project is accumulating.

Still not sure what makes us so different from StandardJS that does insist on a unified quote style 🤷‍♂️ But I also don’t subscribe to the “public strings as double-quotes” strategy, it’s much too fragile for my tastes (I’d push public strings through a translation function before using semantically meaningless quotes)

ambv commented

What do you think about an enforced quote style of "strings containing no whitespace are single-quoted while strings with whitespace get double quotes"?

@kbd, I find this curious but too magical :)

kbd commented

But I also don’t subscribe to the “public strings as double-quotes” strategy, it’s much too fragile for my tastes (I’d push public strings through a translation function before using semantically meaningless quotes)

The discussion hasn't been about "public" strings, but "human-readable" strings (i.e. natural language vs code). It's not necessarily about user-facing strings. For example, the vast majority of all (non-docstring) quotes in my code are single quotes because most strings in code are data (dictionary keys, names of things, shell values, filenames, etc.), but things like log messages and exception strings get double quotes. As far as "semantically meaningless" goes, many people in this thread have already explained the meaning they assign to different quotes.

I find this curious but too magical :)

Not arguing with you (since that's not an argument ;) but what if that small heuristic corresponds to what most people actually do and expect when reading Python code? What if in practice that results in the least disruption on existing codebases? What if it allows a fully-automated/consistent style that can be used by people who would otherwise reject double quotes everywhere? Would it be worth consideration?

Edit: I'd imagine this formatting would even catch bugs. "Why did this dictionary key / filename get double quotes? Oops there's a space".

One more argument for making it more configurable than it is now is that "git blame" becomes less useful for older commits, as the majority of the changes Black makes to our code are quote-style changes from single to double.

ambv commented

@ikatson, agreed. In general, I should add this to the README:

  • you can use git hyper-blame and/or git blame $BLACK_REV^ -- $FILE to skip over the formatting commit

That is useful beyond the string quotes.

ambv commented

Resolution: 18.6b0 has --skip-string-normalization, or -S for short. This allows existing projects to adopt Black with less friction and works for all alternative string quotes policies.

Black still defaults to and recommends normalizing string quotes to double quotes everywhere as we believe this is a better default than single quotes, and is enforceable unlike the "single quotes for data, double quotes for human-readable strings" policy.

I hope this resolves to your satisfaction what's been the most controversial issue in Black's history.