rust-lang/rust

assertion failed: bpos.to_u32() >= mbc.pos.to_u32() + mbc.bytes as u32

dwrensha opened this issue · 14 comments

rustc crashes on the following input, found by fuzz_rustc:

fn i(){println!("🦀%%%";r
error: this file contains an unclosed delimiter
 --> bug.rs:1:25
  |
1 | fn i(){println!("🦀%%%";r
  |       -        -         ^
  |       |        |
  |       |        unclosed delimiter
  |       unclosed delimiter

error: expected `,`, found `;`
 --> bug.rs:1:23
  |
1 | fn i(){println!("🦀%%%";r
  |                        ^ expected `,`

error: argument never used
 --> bug.rs:1:24
  |
1 | fn i(){println!("🦀%%%";r
  |                         ^ argument never used
  |
thread 'rustc' panicked at 'assertion failed: bpos.to_u32() >= mbc.pos.to_u32() + mbc.bytes as u32', compiler/rustc_span/src/lib.rs:1710:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.59.0-nightly (c09a9529c 2021-12-23) running on x86_64-unknown-linux-gnu

searched nightlies: from nightly-2021-01-01 to nightly-2021-12-24
regressed nightly: nightly-2021-10-02
searched commits: from aa7aca3 to c02371c
regressed commit: b6057bf

bisected with cargo-bisect-rustc v0.6.0

Host triple: x86_64-unknown-linux-gnu
Reproduce with:

cargo bisect-rustc --start=2021-1-1 --end=2021-12-24 --regress ice 

The regression happened in #89340. cc @FabianWolff.

@rustbot label regression-from-stable-to-stable

Assigning priority as discussed in the Zulip thread of the Prioritization Working Group.

@rustbot label -I-prioritize +P-low

I've been taking a look at this. It's pretty interesting. I'm not entirely sure I'll be able to work it out, but I figure I may as well assign myself for the time being while I attempt to solve it.

Another test case:

fn f(){(print!(á

I'm somewhat confused. Are you getting the same error from this? I'm having trouble reproducing.

The problem seems to reside within the crate rustc_builtin_macros, in the files format.rs and format_foreign.rs. The function expand_preparsed_format_args in format.rs has a macro, check_foreign!, which is in charge of looking for foreign substitutions - that is, someone using printf or shell style formatting rather than Rust's style. It's a macro so it can be generic over the two types of substitution.

The first time this macro is called, for printf substitutions, it finds two (I think "%%" and "%", though I'm not sure it's exactly that). The code that actually detects the substitutions is in format_foreign.rs. When parsing the substitution, in the function printf::parse_next_substitution, it is somehow finding that the second substitution has a boundary in the middle of the 🦀 character. I'm not sure exactly how yet. This gets turned into a span in expand_preparsed_format_args. The span is malformed, as it splits a character in two, so as soon as the code tries to use it for anything, it trips an assertion and causes this ICE.
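For anyone following along, here is a minimal sketch of what "splits a character in two" means in terms of byte offsets. It is not the compiler's own check (that is the assertion in rustc_span quoted in the panic message); `span_is_well_formed` is a hypothetical helper that uses only the standard `str::is_char_boundary`.

```rust
/// Hypothetical helper (not part of rustc): returns true if the byte range
/// `start..end` lies within `s` and both ends fall on UTF-8 character boundaries.
fn span_is_well_formed(s: &str, start: usize, end: usize) -> bool {
    start <= end
        && end <= s.len()
        && s.is_char_boundary(start)
        && s.is_char_boundary(end)
}
```

With the format string from the report, "🦀%%%", the crab occupies bytes 0..4, so `span_is_well_formed("🦀%%%", 4, 6)` is true (it covers "%%"), while `span_is_well_formed("🦀%%%", 2, 4)` is false because offset 2 is inside the crab.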

The remaining problem is figuring out what's going wrong in printf::parse_next_substitution. This sort of string manipulation is not my forte. I'll probably take another few cracks at it though.

Something is broken about printf::Substitutions::pos.

This change makes the ICE go away (but probably breaks other things):

diff --git a/compiler/rustc_builtin_macros/src/format_foreign.rs b/compiler/rustc_builtin_macros/src/format_foreign.rs
index bfddd7073ff..3b9e9f76f45 100644
--- a/compiler/rustc_builtin_macros/src/format_foreign.rs
+++ b/compiler/rustc_builtin_macros/src/format_foreign.rs
@@ -289,8 +289,8 @@ fn translate(&self, s: &mut String) -> std::fmt::Result {
     }
 
     /// Returns an iterator over all substitutions in a given string.
-    pub fn iter_subs(s: &str, start_pos: usize) -> Substitutions<'_> {
-        Substitutions { s, pos: start_pos }
+    pub fn iter_subs(s: &str, _start_pos: usize) -> Substitutions<'_> {
+        Substitutions { s, pos: 0 }
     }
 
     /// Iterator over substitutions in a string.
@@ -303,15 +303,16 @@ impl<'a> Iterator for Substitutions<'a> {
         type Item = Substitution<'a>;
         fn next(&mut self) -> Option<Self::Item> {
             let (mut sub, tail) = parse_next_substitution(self.s)?;
+            let pos_diff = self.s.len() - tail.len();
             self.s = tail;
             match sub {
                 Substitution::Format(_) => {
                     if let Some(inner_span) = sub.position() {
                         sub.set_position(inner_span.start + self.pos, inner_span.end + self.pos);
-                        self.pos += inner_span.end;
+                        self.pos += pos_diff;
                     }
                 }
-                Substitution::Escape => self.pos += 2,
+                Substitution::Escape => self.pos += pos_diff,
             }
             Some(sub)
         }

When parse_next_substitution() returns here:

return Some((Substitution::Escape, &s[start + 2..]));

it seems like we should increment self.pos by start + 2, but we actually only increment it by 2:

Substitution::Escape => self.pos += 2,
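
The arithmetic can be reproduced outside the compiler with a simplified model. This is a sketch under stated assumptions, not the real code: `next_escape` is a hypothetical stand-in for parse_next_substitution that only recognizes "%%", and `escape_offsets` is a hypothetical loop that accumulates a running position either by 2 (as in the line above) or by everything consumed (as in the pos_diff change).

```rust
/// Hypothetical stand-in for `parse_next_substitution`: finds the next `%%`
/// and returns its byte offset within `s` plus the unparsed tail.
fn next_escape(s: &str) -> Option<(usize, &str)> {
    let start = s.find("%%")?;
    Some((start, &s[start + 2..]))
}

/// Collects the absolute byte offset computed for every `%%` in `s`.
/// If `count_prefix` is false, the running position advances by 2 per escape,
/// ignoring any text skipped before it. If true, it advances by the number of
/// bytes actually consumed (`s.len() - tail.len()`, i.e. `start + 2`).
fn escape_offsets(mut s: &str, count_prefix: bool) -> Vec<usize> {
    let mut pos = 0;
    let mut out = Vec::new();
    while let Some((start, tail)) = next_escape(s) {
        out.push(pos + start);
        pos += if count_prefix { s.len() - tail.len() } else { 2 };
        s = tail;
    }
    out
}
```

For the input "🦀%%a%%" the escapes really start at bytes 4 and 7. `escape_offsets("🦀%%a%%", true)` yields `[4, 7]`, while `escape_offsets("🦀%%a%%", false)` yields `[4, 3]`, and offset 3 is in the middle of the crab.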

@inquisitivecrystal yes, I get the exact same error. Playground. The error disappears if you add a new line after the á, so maybe your text editor did that automatically.

That was it, thanks.

I'm going to unassign myself from this. I do hope my exploration ends up being helpful to whoever fixes it though. @dwrensha: if you want to fix this yourself, you certainly seem further along than I was, though you shouldn't feel any pressure to do so if you don't want to.

I put up a PR for a fix: #92460

@Badel2 I opened a new issue for your bug, because it's not exactly the same as this one.
#92462