rust-lang/rustfmt

Gives up on chains if any line is too long.

ehuss opened this issue ยท 41 comments

ehuss commented

If there is a method call chain, and just one call exceeds max_width, it seems like rustfmt gives up and doesn't format anything.

Example:

fn f() {
    foo("This text is under the max_width limit, and shouldn't cause any problems on its own.").long("But this line is extra long, and doesn't fit within 100 max_width. 1234567890123456789").baz().collect().unwrap();
}

No format changes will be applied to this. I'm also a bit surprised that error_on_line_overflow does not complain.

I would expect it to wrap each call chain element, even if one of them is too long, such as:

fn f() {
    foo("This text is under the max_width limit, and shouldn't cause any problems on its own.")
        .long("But this line is extra long, and doesn't fit within 100 max_width. 1234567890123456789")
        .baz()
        .collect()
        .unwrap();
}

All default settings.
rustfmt 1.4.8-nightly (afb1ee1 2019-09-08)

Yeah, AFAICT that's because rustfmt is expecting to be able to rewrite each of the individual ChainItems to be able to format the Chain. When it attempts to rewrite that long string literal in the Chain it fails due to the length which results in the entire original chain being left as-is.

rustfmt/src/chains.rs

Lines 419 to 444 in a15e97f

fn rewrite(&self, context: &RewriteContext<'_>, shape: Shape) -> Option<String> {
debug!("rewrite chain {:?} {:?}", self, shape);
let mut formatter = match context.config.indent_style() {
IndentStyle::Block => {
Box::new(ChainFormatterBlock::new(self)) as Box<dyn ChainFormatter>
}
IndentStyle::Visual => {
Box::new(ChainFormatterVisual::new(self)) as Box<dyn ChainFormatter>
}
};
formatter.format_root(&self.parent, context, shape)?;
if let Some(result) = formatter.pure_root() {
return wrap_str(result, context.config.max_width(), shape);
}
// Decide how to layout the rest of the chain.
let child_shape = formatter.child_shape(context, shape)?;
formatter.format_children(context, child_shape)?;
formatter.format_last_child(context, shape, child_shape)?;
let result = formatter.join_rewrites(context, child_shape)?;
wrap_str(result, context.config.max_width(), shape)
}

I think I see a way we could support still wrapping chain elements like your expected output, though it would mean rustfmt emitting a line that exceeds the max_width (although the original line does too, and the wrapped version is more human friendly IMO even with the one longer line).

@topecongiro @scampi - any concerns with that? if not, then I'll take a look at implementing something for consideration.

@calebcartwright sounds good to me!

I think this issue is duplicate of #2970 .

Using format_strings pretty much fixed it for me as strings get broken up and rustfmt can keep the line length below max_width.

I just ran into this as well. My code:

fn main() {
    clap::App::new("foo")
        .subcommand(
            SubCommand::with_name("bar")
                .about("some text here")
                .setting(AppSettings::ArgRequiredElseHelp)
                .arg(
                    Arg::from_usage("-l, --latest 'some very long help text goes here [default: default version]'")
                                               .conflicts_with(  "template_version"  )
                )
        )
        .get_matches();
}

rustfmt seems to completely give up after the line with the long help text, and doesn't even try to reindent the .conflicts_with, or fix the spacing in its argument, or add a trailing comma in either of the two places that should have trailing commas.

I'd expect rustfmt to do best-effort on the long line, and then continue afterward.

Doing something about this is still on my to-do list, though I do think it's worth expanding a bit on what folks have in mind when we say "best effort".

It seems this is most often encountered with a single long string, but will also note there are plenty of other scenarios that can run into this where IMO the expected behavior starts to get a little tricky. Consider for example a chain whose parent starts in a heavily indented position, and the chain element that exceeds the max width value is a closure param that itself has a sizeable body with additional nested indentation.

Would users still want rustfmt to format that closure anyway even though it blows way past the max width value? Are we changing the meaning/honoring of max width with a caveat of "except within chains"? Should rustfmt do this by default or should users have to explicitly opt-in to allowing rustfmt to exceed the max width when there are chains involved?

@calebcartwright In general, I'd expect rustfmt to indent everything to the correct level, and try to wrap appropriately at line-width, but whether it's successful or not, it should continue on to wrap the rest, yes.

It's not "except within chains", it's "when possible, without violating more important constraints like properly indenting". If you have an indent-width of 4, and (say) 15 levels of indentation and a 50-character function name, you cannot format that without exceeding 100 characters, and that shouldn't stop you from continuing on to format more of the line. And if you need to wrap the function parameters, those should still get indented 16 levels, even if one of them is a long string.

more important constraints like properly indenting

What defines the relative importance of one constraint vs. another though? Is there a consensus on which constraints can be violated? Does the style guide have a prescription? The only thing I'm familiar with (and I know you're far more familiar with the style guide than I @joshtriplett ๐Ÿ˜„) is https://github.com/rust-dev-tools/fmt-rfcs/blob/7416e12356a55aae959e9629b58eb7c7c3c86f6f/guide/guide.md#indentation-and-line-width

It's not that we can't technically make this change, but in these types of scenarios where rustfmt can't satisfy all the constraints it bails and defers to the programmer to format, or refactor, as needed. This is usually quite manageable, and often accompanied with advice like refactor your code to avoid long/complex expressions, usually by extracting a local variable or using a shorter name

If you have an indent-width of 4, and (say) 15 levels of indentation and a 50-character function name, you cannot format that without exceeding 100 characters

Precisely and agreed, I was just trying to keep the example in the context of chains given the issue.

There's also portions of the user base that do want to strictly enforce the width limits in their code and would leverage options like error_on_line_overflow to ensure that any violations of the width constraints were indeed raised, and any changes to the behavior would have to continue to honor that (an implementation detail that I'm just noting for future reference).

I also think that if max width were to be changed to more of a soft constraint then we'd need to explicitly convey that rustfmt will format your code within the limit, unless it really can't, in which case it will format out indentations indefinitely as needed.

I very much agree with Josh here, but I think even bigger issue is that there are no visible errors/warnings that formatting is not possible and why.
It just silently gives up, leaving user guessing why this particular part of the code is not formatted correctly.

@nazar-pc try running with --config error_on_line_overflow=true which exists for precisely this purpose (make sure you're on nightly), it's just disabled by default. If you'd like to make a case for that being enabled by default then #3391 would be the place to do so

example output with that enabled:

error: line formatted, but exceeded maximum width (maximum: 100 (see `max_width` option), found: 115)
 --> <stdin>:8:8:101
  |
8 |                     Arg::from_usage("-l, --latest 'some very long help text goes here [default: default version]'")
  |                                                                                                     ^^^^^^^^^^^^^^^
  |

warning: rustfmt has failed to format. See previous 1 errors.

Another bug caused by this issue, that makes my editor's formatter, which uses rustfmt, anrgy (I'd imagine many more like this, it's happened to me in match statements as well):

Input:

fn main() {
    (
        foo, 
        "this is a long line that breaks rustfmt, let's make it even longer foo bar foo bar foo bar"
    )
}

Note the trailing space after foo,

Output:

error[internal]: left behind trailing whitespace
 --> /playground/src/main.rs:3:3:12
  |
3 |         foo, 
  |             ^
  |

warning: rustfmt has failed to format. See previous 1 errors.

Playground Link

@MonliH - that's not really a separate bug or strictly related to this issue. Your original snippet has a trailing space which is the root cause (rustfmt isn't inserting a random trailing space), rustfmt is just leaving your original snippet as is and highlighting the fact that it did not remove the trailing spaces.

Probably the same underlying issue:

pub(crate) enum S {
    A,
}

impl std::fmt::Display for S {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match
        &self
        {A=>write!(f,"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"),
        }
    }
}

is left untouched

Any progress on this issue? It's quite a significant one considering just about every Rust binary uses clap, in which long chains like this is a necessity.

just about every Rust binary uses clap, in which long chains like this is a necessity.

If you're using structopt instead (which builds on clap), then you can use doc comments on your structs for the help strings, which avoids that particular need for long strings.

benma commented

It seems the surrounding code around a long string is also not formatted. This struct instance for example is not formatted at all:

fn main() {
    struct Foo {
        a: u8,
        b: &'static str,
    }
    let foo=Foo { a:1, b: "foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo" }; 
}

It would be great if the rest was still formatted, just ignoring the string itself.

I would expect it to be formatted like this:

fn main() {
    struct Foo {
        a: u8,
        b: &'static str,
    }
    let foo = Foo {
        a: 1,
        b: "foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo foo",
    };
}

The format_strings workaround does not help if the string has no spaces, which is frequently the case for me (comparing byte or hex strings in unit tests):

    let foo=Foo { a: 1, b: "foofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoofoo" };

Progress on this would be very appreciated, a lot of my unit testing code can't be formatted due to this bug.

A work around that can be beneficial in certain cases is to use the include, include_str, and include_bytes macros.

I appreciate those that have commented here, ostensibly in an attempt to avoid opening duplicative issues. However, I feel like in doing so the dialog has started drifting off topic.

As a reminder, rustfmt is strictly a pretty-printer style formatter that deals directly with the AST, so even if you experience similar symptoms from a different type of input AST node/syntax, it's still a fundamentally different root cause that will almost always need a different investigation and resolution. I'd encourage folks in such a scenario (e.g. long strings but on a struct instead of within a chain) to look for/open a separate issue that's based on your input, and simply add a link to other issues with similar symptoms if you think they may related.

Finally, while we do have some options available moving forward (which I'll enumerate in a separate comment), I want to stress that this current behavior is unequivocally not a bug. It is intentional, it has always been the default, and it will continue to be the default.

The decision was made long, long ago to have width boundaries and associated wrapping as a core part of rustfmt when determining what the output should be for a given AST node, and in cases where it is completely impossible for rustfmt to honor the width settings (which the user controls, implicitly or explicitly via max_width) it is up to the user to refactor their code accordingly; rustfmt can't just categorically run past your width boundary and carry on formatting any type of AST node without any concept of bounding constraints. That's especially relevant for chains/calls because they frequently have more complex structures beyond string lits (e.g. closures, nested chains, etc.) containing other AST nodes that also have to be formatted according to the same rules.

With the above being said, I have finally managed to set aside some time to dig into this and believe I see a viable path forward.

As noted previously the current behavior will remain the default, but I'm planning on introducing a new config option that will provide some additional variants users can leverage to control chain formatting behavior. That would include one variant that will perform the standard chain formatting behavior, but unlike the default behavior, it will attempt to make a best effort to continue formatting chains which in part exceed the defined max_width value as requested here.

We'd also most likely want to encourage users opting for this non-default behavior to strongly consider pairing it with other options like error_on_line_overflow (which we'll be adjusting to include off/warn/error mode variants and then hopefully stabilizing) so that you'll be aware of cases where your formatting result exceeded your defined value for max_width.

There's only two feasible approaches I can see for handling an element within a chain that must exceed max_width. I'd like to run a poll of sorts to gauge interest on those, so I'll details those two in separate comments (options a and b below) and ask folks to use reactions on those comments to provide their feedback.

Note that these aren't mutually exclusive and I'm open to supporting both if there's interest. As such I'd ask folks to primarily utilize the "thumbs up" to indicate favor/interest, and only use the "thumbs down" if you adamantally feel the option shouldn't even exist (don't downvote it just because you wouldn't use it yourself).

I'd also ask that folks refrain from requesting other options on this issue. The original use case posed in the issue is pretty straightforward and the solution I have in mind will present a resolution to this. It will also create a framework of sorts through which additional variants/formatting behaviors can be added down the road, each of which should be proposed/discussed independently in a separate issue.

Finally, in cases where a chain in a right hand side (rhs) position (e.g. an assignment) is too long fit within the boundary with either shape (i.e. same line as lhs/assignment operator vs. next line block indented), rustfmt will have to try to pick where to put the chain, and I'm thinking that we should base that on which of the two will minimize the amount of formatted chain that ends up on the right of the defined max_width boundary. There's not a ton of alternatives to that which come to mind for me, but if anyone has questions/alternatives to that particular piece feel free to ask/suggest.

Option (a)

Use the original input for that element (whatever the user originally wrote), with some minor adjustments like indentation of the start of that element, and trimming trailing spaces.

This would directly solve the case originally reported in this issue with the desired format being produced for the provided input, as well as most other input snippets reported throughout the thread. It's also something we could get out on nightly fairly quickly, potentially before the end of the month. However, it likely wouldn't help all that much in cases where the exceeding chain item is caused by a more complex arg to a chained call, e.g. a closure with various other statements/expressions/etc., as that sub-content wouldn't be processed

Option (b)

Attempt to dynamically derive and then use a formatting shape with an increased width (which exceeds max_width) which will be used for that specific chain element.

The reason this would be necessary is to provide some required constraints when formatting those subexpressions to avoid horrific formatting that would try to collapse/single line everything out lines to infinity. I'd envision this operating along the lines of:

  • if impossible to fit the chain element within the max_width of 100
  • then create a scaled up width for this particular chain element (we could allow user control over this, e.g. long_chain_scale_factor = 20%, or rustfmt could attempt to guess one based on what it sees in the AST)
  • repeat as necessary, with some max number of scaled retry attempts (which again could be user controllable)

This would be more complex, both from a user experience/mental model and implementation, but if we get it right would probably be much helpful for that cases with more complex args within chained calls.

detly commented

Minor suggestion: would relating the scale "factor" to the indentation used be a sensible thing (or just making it the indentation)? (Factor is in quotes because it would change it into an additive process, not a multiplicative one.)

Clarification: would the "scaled retry attempts" reapply the chain scale factor as a multiplier (eg. 100, 120, 144, 172) or add the same 20% of the original max_width (100, 120, 140, 160)?

Minor suggestion: would relating the scale "factor" to the indentation used be a sensible thing (or just making it the indentation)? (Factor is in quotes because it would change it into an additive process, not a multiplicative one.)

Clarification: would the "scaled retry attempts" reapply the chain scale factor as a multiplier (eg. 100, 120, 144, 172) or add the same 20% of the original max_width (100, 120, 140, 160)?

Great question! The short answer is that I think we can experiment and try to flesh out the specifics if/when we get around to implementation.

However, no it's not about letting the user define indentation. Indentation is based on hierarchical structure, config options like indent_style, etc. and comes into play based on the hierarchy and when lines are wrapped (which has an upper bound based on width); it doesn't really matter whether width is exceeded due to indentation/nesting levels or code content.

Fundamentally, the ideal goal would be to try to find a goldilocks width (beyond the max_width) that's just big enough to support formatting the too-long chain content contained within. We want to avoid really strange scenarios where we go too big because that will produce borderline contradictory formatting where chain content to the left of the max_width column is wrapped but content to the right is flattened/unwrapped. To get a feel for how poor the latter could be if too big, you could run something like cargo fmt -- --check --config max_width=500 on any non-trivial/hello world style codebase.

Using format_strings pretty much fixed it for me as strings get broken up and rustfmt can keep the line length below max_width.

This only works with regular strings. Long raw strings still cause the problem.

BTW, it would be very helpful if there was a mode or cli flags that I could use to identify lines that are too long for rustfmt. I currently have a 100 line file that builds up a clap::App (and so it's ultimately all one big chain). rustfmt can't format the entire file, but it doesn't give any indication about what line(s) need to be manually shortened.

Edit: I spoke too imprecisely when I said that rustfmt can't format "the entire file", since this is not true. Only the long chain is unformattable, but in my particular case, this chain constitutes nearly the entire file, hence my remark. I'm sorry about this.

BTW, it would be very helpful if there was a mode or cli flags that I could use to identify lines that are too long for rustfmt.

Discussed above (e.g. #3863 (comment)) and in a few other threads, but you can turn on error_on_line_overflow and error_on_unformatted for this.

rustfmt can't format the entire file, but it doesn't give any indication about what line(s) need to be manually shortened.

This is a topic that pops up frequently so I'll reiterate here that it's important to distinguish between what rustfmt does vs. what editor plugins do, especially since we have no influence/insight into what those various plugin teams do. The decision to not apply any formatting to an entire file in some scenarios is behavior that some editor plugins have opted to apply, and is not related to rustfmt behavior in any capacity.

In these unformattable scenarios rustfmt only leaves the the top level expression as you wrote it, but will absolutely format everything else in the file.

Additionally, the aforementioned config options, though disabled by default, will highlight your lines that are too long. If you have a scenario where that's not the case (when running rustfmt directly, not editor behavior) then happy to take a look but would ask a separate issue be filed with steps to reproduce.

Thanks, I confirm that using both error_on_line_overflow and error_on_unformatted does what I want. I'm sorry for not having properly read the previous comments about this, that was wrong of me. I've also edited my above comment to clarify my comment about "the entire file".

No worries! It's a rather long and fairly old thread at this point so just wanted to note there were some other posts that drilled deeper into the details since I was too lazy to fully recap ๐Ÿ˜†

Ha I have this exact problem and I was pulling my hair out trying to figure out what was going on.
Is there any plan to fix that or any clear work around that would allow long strings without spaces?

Edit: correction, it does not stop formatting altogether, but as some items/language constructs are fairly large it can "ignore"/not format a fairly large chunk of code

sorry to stirr things up here, but it seems rustfmt is expected to format lines that do not exceed the max line length

In these unformattable scenarios rustfmt only leaves the the top level expression as you wrote it, but will absolutely format everything else in the file.

It's not the behavior I'm seeing, we have frozen our tooling toolchain to nightly-2022-07-05 and once a line goes overboard the entire file gets ignored for formatting while trying to run

cargo +nightly-2022-07-05 fmt

with rustfmt.toml

unstable_features = true
imports_granularity="Module"
format_code_in_doc_comments = true
wrap_comments = true
comment_width = 100

Unless I misunderstand the meaning of the answer quoted above ?

Cheers

Seems weird that it tries to put as much as possible on 1 line just to break itself. E.g. here formatting doesn't work, nothing is reported with error_on_line_overflow

{
    t(|ctx, input: VeryLongInputName| async move {
       None
    })
}

If I shorten the struct name a bit then it starts working and it puts even more stuff on that problematic line. Took me several hours to pinpoint this specific place in the long chain of methods that are needed for rspc to create the procedures handler.

{
    t(|ctx, input: VeryLongInputNam| async move { None })
}

This actually reports the very long line error.

{
    t(|ctx, input: VeryLongInputName| async move { None })
}

@radoslavk could you open a new issue with the complete snippet that was causing you problems

Hi! I think i ran into this same issue but i'm not sure, since this code doesn't have method chains but instead involves a match statement.

So i was converting some nested if/else expressions into a match. At some point in the process, the code looked like this:

pub fn normalize_unicode(word: &str, nf: &str) -> Result<String, String> {
    match nf.to_lowercase().as_str() {
        "nfc" => {
            Ok(word.nfc().collect())
        },
        "nfd" => {
            Ok(word.nfd().collect())
        },
        "nfkc" => {
            Ok(word.nfkc().collect())
        },
        "nfkd" => {
            Ok(word.nfkd().collect())
        },
        _ => {
            Err("Unknown Unicode Normalization Form received in arguments.\nPlease use one of the following normalization forms: nfc, nfd, nfkc, or nfkd.".to_string())
        },
    }
}

And i was sure that rust-fmt was going to get rid of those unnecessary curlies on the match cases and collapse them into single lines upon ctrl+Saving. But nope. In fact i could make any formatting atrocity inside any of the match case bodies and the code would not get auto-formatted.

I'm a bit a embarrassed to admit that it took me quite some time to figure out that the problem was due to the long Err message at the end of the function. I thought nothing was being formatted on the whole file! Or that something was wrong on my setup.

Once i trimmed the long-ish string inside the Err(), the code got formatted properly:

pub fn normalize_unicode(word: &str, nf: &str) -> Result<String, String> {
    match nf.to_lowercase().as_str() {
        "nfc" => Ok(word.nfc().collect()),
        "nfd" => Ok(word.nfd().collect()),
        "nfkc" => Ok(word.nfkc().collect()),
        "nfkd" => Ok(word.nfkd().collect()),
        _ => Err("snip...".to_string()),
    }
}

So, i have two questions:

  1. Is this the same issue reported here, or something different that looks similar?
  2. If a general fix for this is too complicated, could at least the condition of bailing out on long strings be more local? For example, in the snippet above, could it only affect the formatting of the last block on the match, leaving only that block unformatted, but still formatting the rest of the non-offending blocks normally?

I think a non-general fix like that, if possible, could be very useful, both for having the formatter bailing out on fewer lines of code, but also for actually noticing where exactly it's giving up.

So, i have two questions:

  1. Is this the same issue reported here, or something different that looks similar?
  2. If a general fix for this is too complicated, could at least the condition of bailing out on long strings be more local? For example, in the snippet above, could it only affect the formatting of the last block on the match, leaving only that block unformatted, but still formatting the rest of the non-offending blocks normally?
  1. No, it's separate. There's various other comments above reporting different-but-with-similar-symptom issues, but they're likely buried because of the volume of commentary, see #3863 (comment) for more
  2. No, no general fix, see same linked comment. Also don't want to add any additional noise on this thread discussing the reasoning or potential changes for a different issue that would have no impact on this issue.

You do not need strings to get this bug:

// this does not format
let offenders = current_validators
    .into_iter()
    .enumerate()
    .filter_map(|(_, id)| 
    <<Runtime as pallet_im_online::Config>::ValidatorSet as ValidatorSetWithIdentification<
        sp_runtime::AccountId32>>::IdentificationOf::convert(id.clone()).map(|full_id| (id, full_id)))
    .collect::<Vec<IdentificationTuple<Runtime>>>();
// this type alias prevents the formatting bug
type XXX =
    <<Runtime as pallet_im_online::Config>::ValidatorSet as ValidatorSetWithIdentification<
        sp_runtime::AccountId32,
    >>::IdentificationOf;

// this does format
let offenders = current_validators
    .into_iter()
    .enumerate()
    .filter_map(|(_, id)| XXX::convert(id.clone()).map(|full_id| (id, full_id)))
    .collect::<Vec<IdentificationTuple<Runtime>>>();

And the giving up propagates to the whole block.

I would like an option to have max_width be the width of the line not including the indentation, or a switch, like relative_max_width

How has this bug been open for almost 5 years without a fix?
"Just giving up without any error" is unacceptable behavior for a formatter.

Why don't you fix it yourself? ๐Ÿค”

i don't have the required programming experience and the rust skills yet

i don't have the required programming experience and the rust skills yet

So be respectful when you speak. Don't act like a self-entitled git (in the traditional sense of the word).

Another example where a closure is part of the chain:

fn main() {
    let f = foo("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
        .bar(|| {
foo( 233545   );



             if   let Some   ( x)   = foo(3) {
   }
    });
}