rust-lang/rust

format_args!() could 'inline' literal arguments

m-ou-se opened this issue Β· 26 comments

Arguments::as_str allows code to handle argument-less fmt::Arguments like format_args!("literal") specially by not pulling in any formatting code.

We should investigate if we can improve format_args!("hello: {}", "literal") to result in the same trivial fmt::Arguments as format_args!("hello: literal"), such that any .as_str()-based optimizations are also available for it.

That could also make things like panic!("{}", "message"); work with const_panic, removing the need for panic!(CONST_STR).

This makes panic!("hello") exactly as cheap (and const) as panic!("{}", "hello"), which is also important for #78088 to make sure there are no downsides to its suggestions.

It'd also reduce the need for using concat!(..), since it'd no longer be less efficient to add a {} placeholder to concat literals in a formatting string instead.

As a bonus: format_args!("a: {}", format_args!("b: {}", ..)) could maybe also improved to produce the same as format_args!("a: b: {}", ..).


Assigning to myself to investigate some time Soonβ„’.

Do we expect this kind of thing to actually happen right now? It seems like we should if anything just lint against it, and the workaround for the single-arg-panic string could be to do panic::panic_any("{}") instead of panic!("{}", "{}").

Do we expect this kind of thing to actually happen right now? It seems like we should if anything just lint against it, and the workaround for the single-arg-panic string could be to do panic::panic_any("{}") instead of panic!("{}", "{}").

would that actually work in #![no_std]? I don't think there's a core::panic::panic_any...

Do we expect this kind of thing to actually happen right now?

If you're asking if anyone has time to implement this: Yes, I believe I can find the time for this some time soon.

the workaround for the single-arg-panic string could be to do panic::panic_any("{}") instead of panic!("{}", "{}").

panic_any will only exist in std, not core. The bloat problem is mostly a problem in no_std programs.

If you're asking if anyone has time to implement this

Sorry, I'm asking if we expect that something like println!("hello: {}", "world") actually exists in codebases in significant numbers.

panic_any will only exist in std, not core. The bloat problem is mostly a problem in no_std programs.

Suggesting core::panicking::panic would be the thing then I suppose.

I'm asking if we expect that something like println!("hello: {}", "world") actually exists in codebases in significant numbers.

Ah. I think it mostly happens as the result of macro expansion. E.g. error!("x") expanding to eprintln!("error: {}", "x"). Though, many macros use concat!() and panic(thing) directly. E.g. println!("Failure: {}", stringify!(expr)) is sometimes println!(concat!("Failure: ", stringify!(expr)) (which then breaks for expressions with braces). Or panic!("error: {}", format_args!(fmt, args...)) is sometimes written panic!(concat!("error: ", fmt), args...), etc.

Suggesting core::panicking::panic would be the thing then I suppose.

That one requires a 'static lifetime, so then the suggestion would be different for 'static strings and non-'static strings. Would be nice if it didn't require a special case. (It'd also require that function to become stable.)


Anyway, I opened this issue just as a reminder to investigate the possibility, not because I think we should definitely do this. There are good ways to avoid the need for this like you mention (core::panicking::panic), but it might be nice if no special cases were necessary while implementing macros like assert!(), and "{}", ".." was just as efficient as any other solution.

My observations have consistently suggested that people use format! or write! for seemingly "trivial" reasons All The Time because when you start with println! it's what feels like the most natural and easiest way. And then they do wrap it in other macros, yes, so it is best to try to take the effort to reduce the amount of overhead going on here.

Hey @rust-lang/wg-const-eval, I'd like to hear your thoughts about the feasibility of this feature.

Some examples of things it could do:

  1. format_args!("a {}", "b") β†’ format_args!("a b"): Implementing this does not need constant evaluation. The format_args macro itself can directly see the literal and inline it.
  2. format_args!("a {}", SOME_STR) β†’ format_args!("a b"): The macro can't see the contents or type of the constant (or even see that it is a constant), so this will need to happen at some other point. How could this work?
  3. format_args!("a {}", 123) β†’ format_args!("a 123"): The macro could do this, but will have to make assumptions about how integers are formatted. Can work for simple cases. More advanced cases would need constant evaluation of the Display trait, but I'm guessing that's still far away.
  4. format_args!("a {}", SOME_INT) β†’ format_args!("a 123"): Just like 2, requires constant evaluation after the macro is already expanded, and like 3 requires knowledge about how to format integers.
  5. format_args!("a {}", format_args!("b")) β†’ format_args!("a b"): The macro could do this directly if it can know that the format_args token actually refers to itself. (I think it can, since it can also expand concat!() etc.)
  6. format_args!("a {}", format_args!("b {}", c)) β†’ format_args!("a b {}", c): Same as the previous one, but needs more complicated logic.

For const_panic specifically, 2 would be very useful to have panic!("{}", MSG) work (to replace panic!(MSG) which won't work in Rust 2021). But 4 could also be very useful to give more useful panic messages during constant evaluation.

Many of these I could implement by modifying how the format_args builtin macro works, but that won't work for the cases where it refers to constants instead of only using literals, which are probably the most useful ones for const_panic.

So my question is: Can that be implemented without making things extremely complicated? Recognizing and reversing+rebuilding the fmt::Arguments::new call that results from the format_args!() invocation when processing the mir sounds complicated to me. Is there an easier way? Could we add special metadata to make it easier to process at a later stage? Could we add a new expression type specifically for 'format args' such that it doesn't appear as a big complicated call expression to fmt::Arguments::new, but simply as a 'format args expression' that gets substituted a fmt::Arguments constructor call much later? Or any other ideas?

The problem isn't format_args at all. We can make new_v1 const fn today, even new_v1_formatted. The problem is panic_fmt which wants to format the Arguments thing properly. Since at this point the integers or strings to be formatted are only available via

pub struct ArgumentV1<'a> {
value: &'a Opaque,
formatter: fn(&Opaque, &mut Formatter<'_>) -> Result,
}
as part of
pub struct Arguments<'a> {
// Format string pieces to print.
pieces: &'a [&'static str],
// Placeholder specs, or `None` if all specs are default (as in "{}{}").
fmt: Option<&'a [rt::v1::Argument]>,
// Dynamic arguments for interpolation, to be interleaved with string
// pieces. (Every argument is preceded by a string piece.)
args: &'a [ArgumentV1<'a>],
}
, we can't really do anything at CTFE time, unless we start adding some more lang items. Most notably we'd need to make ::fmt a lang item, so we can check whether the pointer in ArgumentsV1 is a pointer to that function, and if it is, print that string to our compile-time-panic.

This scheme does not scale to anything but panicking, because if the formatting arg is not a string (or another potentially hardcoded type like integers), then we have to abort with an error. Since we are already panicking, we can just continue and print something like "{unprintable}".

I'd rather not add support for integers -- if we go down that route we will end up with a (hacky) const-side reimplementation of a subset of the formatting machinery. As much as possible, we should use Rust code for this, not special hacks in the interpreter engine.

Given that the CTFE core engine already supports the entire display machinery (demonstrated by the fact that it works in Miri), we might alternatively poke a small hole into the const checks that allows executing formatting machinery from panic_fmt only -- something like "miri unleashed", but more targeted.

The problem isn't format_args at all. We can make new_v1 const fn today, even new_v1_formatted. The problem is panic_fmt which wants to format the Arguments thing properly.

const_panic is not the only reason to this. I want .as_str() to return a Some(..) in more cases, such that a macro like this can be just as efficient as when it somehow had used used concat!() whenever possible:

macro_rules! log_newline {
    ($($t:tt)*) => {
        $crate::log_fmt(format_args!("log: {}\n", format_args!($($t)*)))
    }
}

Even if constant evluation could properly handle this nested format_args, it would not result in log_newline!("literal") avoiding an allocation in situations like this:

#[inline]
pub fn log_fmt(x: fmt::Arguments) {
    if let Some(s) = x.as_str() {
        log_str(s); // No allocation needed. The whole str is already available. \o/
    } else {
        log_str(&x.to_string());
    }
}

Ideally, log_newline!("literal") should result in a fmt::Arguments with a &'static str "log: literal\n" returned from .as_str(). This requires not just being able to display the fmt::Arguments during compile time, but also substituting it with a (partially) evaluated one.

Also, log_newline!("a {}", a) ideally shouldn't have to pay the cost of the extra layer of format_args!() to add the prefix and newline, as shown in example 6 above. (Otherwise we need hacks with concat!() to avoid that, which result in bad edge cases.)

Expanding a bit on this idea I mentioned above:

Could we add special metadata to make it easier to process at a later stage? Could we add a new expression type specifically for 'format args' such that it doesn't appear as a big complicated call expression to fmt::Arguments::new, but simply as a 'format args expression' that gets substituted a fmt::Arguments constructor call much later?

If I understand correctly, getting the values of constants needs to happen at the mir level. So, this idea would mean, I think:

  • Adding a ast::ExprKind::FormatArgs
  • Making the format_args! builtin macro not expand to a ast::ExprKind::Call("fmt::Arguments::new", ..) anymore, but to a ast::ExprKind::FormatArgs instead.
  • Lowering that to a hir::ExprKind::FormatArgs transparently.
  • Lowering that to a mir::??? transparently.
  • Transform it by inlining constants and flattening it, etc.
  • Lowering(?)/transforming it into a call to fmt::Arguments::new.

Then const_panic should just have the FormatArgs available witout having to decode/evaluate core's fmt::Arguments, after inlining constants and flattening. And .as_str() can use every optimization/transformation that happened at the mir level.

(Also, --pretty=expanded would get less noisy when using format args, because the macro would no longer expand directly to the full fmt::Arguments construction.)

I guess this is in some ways similar to what happens for asm!(..).

I am not sure I like the idea of making formatting a language primitive based on MIR primitives.

I guess this is in some ways similar to what happens for asm!(..).

Indeed, and I view that as a rather strong negative.^^ asm! really needs to be a language primitive, formatting really shouldn't.

I think the better comparison here are generators. We could expand generators on the HIR, it would just be a mess, so we do it on MIR where this is much more manageable. Formatting machinery stuff could be similar, but in any case, we could start with a minimal version that works as a HIR lowering similar to how for loops don't exist in HIR anymore.

Would implementing this fix #87466? It would be great if this were done before const_panic and/or edition 2021 is stabilized.

@AaronKutch in the issue description

That could also make things like panic!("{}", "message"); work with const_panic, removing the need for panic!(CONST_STR).

This doesn't really have anything to do with the edition, though. const_panic is still unstable for a reason.

Slowly making my way through some of the tickets for const_panic while investigating stuff. I'm a bit confused why this is so controversial. Surely, at least inlining string literals wouldn't be adding so much code we could consider it identical to reimplementing all of fmt as MIR optimisations.

I could definitely see making format_args!("hello {}", s).as_str().unwrap() a possible const operation as something useful.

I can come up with a few ways that can make it possible to make format_args! callable from const contexts.

The easiest ways will first make ArgumentV1::new a const function, then we can either

a) in const_check, somehow retrieve the trait from the second argument (FormatTrait::fmt function pointer), and validate the type parameter T implements const FormatTrait.

b) instead of using function pointers, use something else. F: Fn or some type that we define. The new function ensures that we can call the fmt function at compile time. (we could also just split it into different methods e.g. new_display, new_debug etc.) This might affect the performance of runtime though.

format_args! is already callable in const contexts. The issue is actually doing anything with the result.

You can't call format_args!. Not even when you only have the format string. (format_args!("a") does not work)

format_args!("a") does not work

That's on purpose, to avoid making that part of the stability guarantees for now. We have an unstable/internal const_format_args!("a") that does work, which is used by panic!("a").

some time Soonβ„’.

2Β½ years later, I finally got around to implementing it: #106824

Opened #106823 to change the documentation of fmt::Arguments::as_str() to allow for inlined/flattened format arguments.

Would inlining literal arguments allow format_args to be used in const contexts: #108595?

The inlining logic could play some role if we decide to allow format_args in const, but whether to allow format_args in const is an entirely separate discussion and decision.

This is now available on nightly with -Zflatten-format-args=yes.

This is now implemented and enabled by default in Rust 1.71.0: #109999