fmt could be faster if it didn't do char-by-char

Question

travisdowns opened this issue 23 days ago · 3 comments

Core functions like fmt::format_to take an output iterator, which lets you write directly into almost any type of buffer, or even directly to a device or file. However, it also means that formatting writes into the buffer character-by-character, even when fmt "knows" that it just needs to copy a run of chars from its internal buffers or the format string to the output. If the output iterator is not totally trivial (e.g., it may resize the underlying buffer when it is exhausted), every character pays for that check, and the compiler can't really optimize the overhead away.

Is there currently any way for the iterator to expose additional methods (e.g., write_n, which writes multiple chars to the buffer) that fmt would take advantage of?
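To make the question concrete, here is a minimal sketch of the kind of dispatch being asked about. Everything in it is hypothetical: sink, char_iterator, bulk_iterator, has_write_n and copy_chars are illustration names, and write_n is the method proposed above, not something fmt is known to look for. It uses only the standard library (C++17) and counts calls into the sink to show the difference between the two paths.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical growable sink; `calls` counts how often it is entered.
struct sink {
    std::string data;
    std::size_t calls = 0;
};

// Plain output iterator: each character is a separate call into the sink.
struct char_iterator {
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    sink* target;
    char_iterator& operator*() { return *this; }
    char_iterator& operator++() { return *this; }
    char_iterator operator++(int) { return *this; }
    char_iterator& operator=(char c) {
        target->data.push_back(c);  // may reallocate on any call
        ++target->calls;
        return *this;
    }
};

// Same sink, but the iterator also offers the hypothetical bulk method.
struct bulk_iterator {
    sink* target;
    void write_n(const char* p, std::size_t n) {
        target->data.append(p, n);  // one capacity check for n chars
        ++target->calls;
    }
};

// Detects a member write_n(const char*, std::size_t) on It.
template <typename It, typename = void>
struct has_write_n : std::false_type {};

template <typename It>
struct has_write_n<It, std::void_t<decltype(std::declval<It&>().write_n(
                           std::declval<const char*>(), std::size_t{}))>>
    : std::true_type {};

// Copies n chars to out, using the bulk path when the iterator has one.
template <typename It>
It copy_chars(const char* src, std::size_t n, It out) {
    if constexpr (has_write_n<It>::value) {
        out.write_n(src, n);
    } else {
        for (std::size_t i = 0; i < n; ++i) *out++ = src[i];
    }
    return out;
}

// The same five characters cost five sink calls or one.
inline void example() {
    sink a, b;
    copy_chars("hello", 5, char_iterator{&a});
    copy_chars("hello", 5, bulk_iterator{&b});
    assert(a.calls == 5 && b.calls == 1);
    assert(a.data == b.data);
}
```

The detection is opt-in, so iterators without write_n keep working through the character-by-character fallback.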

Answer 1 · 2024-12-13T03:31:32.000Z

Looking at copy_str, it does seem there is a specialization for when the output iterator is a char *: in that case memcpy is called to do the copy in one shot, so this case at least avoids the char-by-char problem. However, it is often hard to use char * safely without knowing the maximum size of the formatted value.

I'm wondering if a (hopefully constexpr) fmt::upper_length_bound("format string", o1, o2, ...) would be useful here: it could return a conservative upper bound on the length of strings formatted with the given format string and objects (or types). Evidently this could only work for types with an inherent upper bound (so numbers, but not strings, and not custom user types unless they opted in). This would open up more use cases for fixed buffers using format_to and char *, since the maximum buffer size can be bounded.

Answer 2 · 2024-12-14T16:26:30.000Z

Currently there are some optimizations for contiguous iterators and for back insert iterators into contiguous containers (like vector or string). Is your iterator one of those, or something more complex?

upper_length_bound is not implementable in general, only for a small subset of types and when dynamic specs are not used.