rust-lang/rust

compiler panic: "byte index 10 is not a char boundary"

Closed this issue · 15 comments

$ rustc -Vv
rustc 1.36.0 (a53f9df32 2019-07-03)
binary: rustc
commit-hash: a53f9df32fbb0b5f4382caaad8f1a46f36ea887c
commit-date: 2019-07-03
host: x86_64-apple-darwin
release: 1.36.0
LLVM version: 8.0

$ echo Zm4gbWFpbigo2Lw= | base64 -D > main.rs

$ rustc main.rs
error: this file contains an un-closed delimiter
 --> main.rs:1:11
  |
1 | fn main((ؼ
  |        -- ^
  |        ||
  |        |un-closed delimiter
  |        un-closed delimiter

thread 'rustc' panicked at 'byte index 10 is not a char boundary; it is inside 'ؼ' (bytes 9..11) of `fn main((ؼ`', src/libcore/str/mod.rs:2034:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: aborting due to previous error


error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.36.0 (a53f9df32 2019-07-03) running on x86_64-apple-darwin

Found with the help of libfuzzer-sys.

ExpHP commented
  12: core::str::traits::<impl core::slice::SliceIndex<str> for core::ops::range::RangeFrom<usize>>::index::{{closure}}
  13: syntax::source_map::SourceMap::find_width_of_character_at_span
  14: syntax::source_map::SourceMap::next_point
  15: syntax::parse::diagnostics::<impl syntax::parse::parser::Parser>::unexpected_try_recover
  16: syntax::parse::parser::Parser::parse_fn_args::{{closure}}
  17: syntax::parse::parser::Parser::parse_fn_args
  18: syntax::parse::parser::Parser::parse_fn_decl
  19: syntax::parse::parser::Parser::parse_item_fn
  20: syntax::parse::parser::Parser::parse_item_implementation
  21: syntax::parse::parser::Parser::parse_item_

There are two slicing operations in find_width_of_character_at_span. I'm trying to figure out how a non-boundary index could be appearing in there... maybe unexpected_try_recover is calling next_point with a span that begins at index 10? (Where does this come from? Parser.prev_span?) Building a debug compiler to check those log statements...

Edit: Building the debug compiler was a bust. Embarrassingly, things seem to have changed and I don't know how to get those debug macros to fire in the current compiler.
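For readers following along, the slicing failure discussed above can be reproduced outside the compiler. This is a minimal sketch, not rustc code: `suffix_from` is a hypothetical helper, and the only assumption is that the panic comes from a `&s[idx..]`-style slice, as the `RangeFrom<usize>` frame in the backtrace suggests. It uses the same 11-byte input as the report, with the Arabic letter written as `\u{63c}`.

```rust
/// Hypothetical helper (not part of rustc): returns the suffix of `s`
/// starting at byte `idx`, or `None` when `idx` is out of range or falls
/// inside a multi-byte character. `str::get` is the non-panicking
/// counterpart of `&s[idx..]`.
fn suffix_from(s: &str, idx: usize) -> Option<&str> {
    s.get(idx..)
}

fn main() {
    // Same bytes as the reproducer: 9 ASCII bytes followed by one
    // two-byte character occupying bytes 9..11.
    let src = "fn main((\u{63c}";
    assert_eq!(src.len(), 11);

    // Byte 9 starts the character and byte 11 is the end of the string,
    // so both are boundaries. Byte 10 is in the middle of the character.
    assert!(src.is_char_boundary(9));
    assert!(!src.is_char_boundary(10));
    assert!(src.is_char_boundary(11));

    assert_eq!(suffix_from(src, 9), Some("\u{63c}"));
    assert_eq!(suffix_from(src, 10), None);

    // By contrast, `&src[10..]` would panic with the message from the
    // report: "byte index 10 is not a char boundary".
}
```

Checking `is_char_boundary` (or using `get`) before slicing avoids the panic, but it does not explain why index 10 was produced in the first place.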

@ExpHP do you mean you were using RUST_LOG and not seeing debug output?

If so, that is because the environment variable name under rustc was changed to RUSTC_LOG. (I make this mistake pretty much every day due to muscle memory.)


As an example, here is the tail of my log output:

% RUSTC_LOG=syntax::parse,syntax::source_map ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc  /tmp/off_index.rs
DEBUG 2019-07-11T12:58:22Z: syntax::parse::attr: parse_outer_attributes: self.token=Token { kind: OpenDelim(Paren), span: Span { lo: BytePos(8), hi: BytePos(9), ctxt: #0 } }
DEBUG 2019-07-11T12:58:22Z: syntax::parse::parser: parse_arg_general parse_pat (is_name_required:true)
DEBUG 2019-07-11T12:58:22Z: syntax::source_map: find_width_of_character_at_span: local_begin=`SourceFileAndBytePos { sf: SourceFile(/tmp/off_index.rs), pos: BytePos(10) }`, local_end=`SourceFileAndBytePos { sf: SourceFile(/tmp/off_index.rs), pos: BytePos(11) }`
DEBUG 2019-07-11T12:58:22Z: syntax::source_map: find_width_of_character_at_span: start_index=`10`, end_index=`11`
DEBUG 2019-07-11T12:58:22Z: syntax::source_map: find_width_of_character_at_span: source_len=`11`
thread 'rustc' panicked at 'byte index 10 is not a char boundary; it is inside 'ؼ' (bytes 9..11) of `fn main((ؼ`', src/libcore/str/mod.rs:2039:5

triage: P-medium. Removing nomination.

This broke between 1.15 and 1.16, but those releases are very old, and the nature of the ICE itself changed between 1.28 and 1.29, so I do not think bisection would be worthwhile.

ExpHP commented
Offtopic

@pnkfelix : Thanks. Unfortunately, I did guess the name of RUSTC_LOG after RUST_LOG didn't work, and all it showed me was this:

$ RUSTC_LOG=trace rustc +stage1 src/main.rs
 INFO 2019-07-11T13:33:39Z: jobserver::imp: created a jobserver: Client { read: File { fd: 3, path: "pipe:[2538076]", read: true, write: false }, write: File { fd: 4, path: "pipe:[2538076]", read: false, write: true } }
 INFO 2019-07-11T13:33:39Z: rustc_interface::util: codegen backend candidate: /home/lampam/asd/clone/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends
 INFO 2019-07-11T13:33:39Z: rustc_interface::util: probing /home/lampam/asd/clone/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends for a codegen backend
error: this file contains an un-closed delimiter
 --> src/main.rs:1:11
  |
1 | fn main((ؼ
  |        -- ^
  |        ||
  |        |un-closed delimiter
  |        un-closed delimiter

thread 'rustc' panicked at 'byte index 10 is not a char boundary; it is inside 'ؼ' (bytes 9..11) of `fn main((ؼ`', src/libcore/str/mod.rs:2039:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: aborting due to previous error

I set debug = true in my config.toml before building, and didn't see anything else in there. (I recall that debug! and trace! used to be explicitly mentioned in the comment for debug-assertions but it is no longer, so I did not enable it.)

ExpHP commented

Okay, I did the following evil, evil thing:

Pure evil, do not open
diff --git a/src/libsyntax_pos/span_encoding.rs b/src/libsyntax_pos/span_encoding.rs
index 525ec13623..d7da2206ab 100644
--- a/src/libsyntax_pos/span_encoding.rs
+++ b/src/libsyntax_pos/span_encoding.rs
@@ -74,6 +74,12 @@ pub const DUMMY_SP: Span = Span { base_or_index: 0, len_or_tag: 0, ctxt_or_zero:
 impl Span {
     #[inline]
     pub fn new(mut lo: BytePos, mut hi: BytePos, ctxt: SyntaxContext) -> Self {
+        if lo == BytePos(10) || hi == BytePos(10) {
+            eprintln!("=========================");
+            eprintln!("{:?}", (lo, hi));
+            eprintln!();
+            let _ = std::panic::catch_unwind(|| panic!()); // potentially print backtrace
+        }
         if lo > hi {
             std::mem::swap(&mut lo, &mut hi);
         }

And acquired the following backtrace of when the span was first constructed:

=========================
(BytePos(10), BytePos(11))

  15: syntax_pos::span_encoding::Span::new at /home/lampam/asd/clone/rust/src/libsyntax_pos/span_encoding.rs:81
  16: syntax_pos::SpanData::with_lo at /home/lampam/asd/clone/rust/src/libsyntax_pos/lib.rs:218
  17: syntax_pos::<impl syntax_pos::span_encoding::Span>::with_lo at /home/lampam/asd/clone/rust/src/libsyntax_pos/lib.rs:268
  18: syntax::tokenstream::TokenTree::close_tt at src/libsyntax/tokenstream.rs:141
  19: syntax::parse::parser::TokenCursor::next at src/libsyntax/parse/parser.rs:315
  20: syntax::parse::parser::Parser::next_tok at src/libsyntax/parse/parser.rs:528
  21: syntax::parse::parser::Parser::bump at src/libsyntax/parse/parser.rs:1031
  22: syntax::parse::parser::Parser::parse_ident_common at src/libsyntax/parse/parser.rs:632
  23: syntax::parse::parser::Parser::parse_ident at src/libsyntax/parse/parser.rs:617
  24: syntax::parse::parser::Parser::parse_pat_ident at src/libsyntax/parse/parser.rs:4132
  25: syntax::parse::parser::Parser::parse_pat_with_range_pat at src/libsyntax/parse/parser.rs:3987
  26: syntax::parse::parser::Parser::parse_pat at src/libsyntax/parse/parser.rs:3908
  27: syntax::parse::parser::Parser::parse_pat_list at src/libsyntax/parse/parser.rs:3582
  28: syntax::parse::parser::Parser::parse_parenthesized_pat_list at src/libsyntax/parse/parser.rs:3547
  29: syntax::parse::parser::Parser::parse_pat_with_range_pat at src/libsyntax/parse/parser.rs:3938
  30: syntax::parse::parser::Parser::parse_pat at src/libsyntax/parse/parser.rs:3908
  31: syntax::parse::parser::Parser::parse_arg_general at src/libsyntax/parse/parser.rs:1510
  32: syntax::parse::parser::Parser::parse_fn_args::{{closure}} at src/libsyntax/parse/parser.rs:5380
  33: syntax::parse::parser::Parser::parse_seq_to_before_tokens at src/libsyntax/parse/parser.rs:983
  34: syntax::parse::parser::Parser::parse_seq_to_before_end at src/libsyntax/parse/parser.rs:916
  35: syntax::parse::parser::Parser::parse_fn_args at src/libsyntax/parse/parser.rs:5368
  36: syntax::parse::parser::Parser::parse_fn_decl at src/libsyntax/parse/parser.rs:5429
  37: syntax::parse::parser::Parser::parse_item_fn at src/libsyntax/parse/parser.rs:5660

It also gets constructed a second time:

Second backtrace
=========================
(BytePos(10), BytePos(11))

  15: syntax_pos::span_encoding::Span::new at /home/lampam/asd/clone/rust/src/libsyntax_pos/span_encoding.rs:81
  16: syntax_pos::SpanData::with_lo at /home/lampam/asd/clone/rust/src/libsyntax_pos/lib.rs:218
  17: syntax_pos::<impl syntax_pos::span_encoding::Span>::with_lo at /home/lampam/asd/clone/rust/src/libsyntax_pos/lib.rs:268
  18: syntax::tokenstream::TokenTree::close_tt at src/libsyntax/tokenstream.rs:141
  19: syntax::parse::parser::TokenCursor::next at src/libsyntax/parse/parser.rs:315
  20: syntax::parse::parser::Parser::next_tok at src/libsyntax/parse/parser.rs:528
  21: syntax::parse::parser::Parser::bump at src/libsyntax/parse/parser.rs:1031
  22: syntax::parse::parser::Parser::expect_one_of at src/libsyntax/parse/parser.rs:590
  23: syntax::parse::parser::Parser::expect at src/libsyntax/parse/parser.rs:577
  24: syntax::parse::parser::Parser::parse_parenthesized_pat_list at src/libsyntax/parse/parser.rs:3555
  25: syntax::parse::parser::Parser::parse_pat_with_range_pat at src/libsyntax/parse/parser.rs:3938
  26: syntax::parse::parser::Parser::parse_pat at src/libsyntax/parse/parser.rs:3908
  27: syntax::parse::parser::Parser::parse_arg_general at src/libsyntax/parse/parser.rs:1510
  28: syntax::parse::parser::Parser::parse_fn_args::{{closure}} at src/libsyntax/parse/parser.rs:5380
  29: syntax::parse::parser::Parser::parse_seq_to_before_tokens at src/libsyntax/parse/parser.rs:983
  30: syntax::parse::parser::Parser::parse_seq_to_before_end at src/libsyntax/parse/parser.rs:916
  31: syntax::parse::parser::Parser::parse_fn_args at src/libsyntax/parse/parser.rs:5368
  32: syntax::parse::parser::Parser::parse_fn_decl at src/libsyntax/parse/parser.rs:5429
  33: syntax::parse::parser::Parser::parse_item_fn at src/libsyntax/parse/parser.rs:5660

(the region of the backtrace that differs between the two is items 22-29 in the first one, or 22-25 in the second)


Some interesting spots from here:

This is clearly where the value of 10 is created:

    /// Returns the closing delimiter as a token tree.
    pub fn close_tt(span: Span, delim: DelimToken) -> TokenTree {
        let close_span = if span.is_dummy() {
            span
        } else {
            span.with_lo(span.hi() - BytePos(delim.len() as u32))
        };
        TokenTree::token(token::CloseDelim(delim), close_span)
    }

TokenCursor::next calls this if there are no remaining tokens and it still hasn't seen the closing delimiter (it seems to have as an unstated precondition that the closing delimiter must exist).
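To make the arithmetic in close_tt concrete, here is a small standalone sketch. `ByteSpan` and `close_span` are hypothetical stand-ins for rustc's `Span` and for the `span.with_lo(span.hi() - delim.len())` expression; they are not the real types. The group span of 8..11 is an assumption (the inner `(` starts at byte 8 per the log, and the file is 11 bytes long), chosen because it yields the (BytePos(10), BytePos(11)) span seen in the backtraces.

```rust
/// Hypothetical stand-in for rustc's `Span`: a half-open byte range.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ByteSpan {
    lo: usize,
    hi: usize,
}

/// Mirrors the arithmetic in `close_tt`: it assumes the last `delim_len`
/// bytes of the group's span are the closing delimiter.
/// Callers must ensure `span.hi >= delim_len`.
fn close_span(span: ByteSpan, delim_len: usize) -> ByteSpan {
    ByteSpan { lo: span.hi - delim_len, hi: span.hi }
}

fn main() {
    let src = "fn main((\u{63c}";

    // Assumed span of the unclosed inner group: from the `(` at byte 8
    // to the end of the file.
    let group = ByteSpan { lo: 8, hi: src.len() };

    // A `)` would be one byte wide, so the "closing delimiter" is taken
    // to be the final byte of the file.
    let close = close_span(group, 1);
    assert_eq!(close, ByteSpan { lo: 10, hi: 11 });

    // No `)` exists, so the final byte is really the second byte of the
    // two-byte character at 9..11, and `lo` is not a char boundary.
    assert!(!src.is_char_boundary(close.lo));
    assert!(src.get(close.lo..close.hi).is_none());
}
```

With an ASCII final character the same subtraction would still point at the wrong token, but it would land on a boundary and not panic, which may be why only non-ASCII inputs trigger the ICE.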

Here's the innermost bit that's specific to ident parsing. Doesn't look that odd...

    token::Ident(name, _) => {
        if self.token.is_reserved_ident() {
            let mut err = self.expected_ident_found();
            if recover {
                err.emit();
            } else {
                return Err(err);
            }
        }
        let span = self.token.span;
        self.bump();
        Ok(Ident::new(name, span))
    }

@ExpHP I’m not familiar with RUSTC_LOG=trace

I tend to list (comma separated) module paths in my own use of RUSTC_LOG; did you try that? You can see an example in the transcript I gave in my comment above

ExpHP commented

Oh. (I'm so used to $ prompts I didn't notice you included the command!). Yeah, that was the issue.

Same ICE occurs with:

y![
Ϥ, 

Which returns:

error: this file contains an un-closed delimiter
 --> crash-298117e3012a17b3e85cddad606b2697232cba40:2:3
  |
1 | y![
  |   - un-closed delimiter
2 | Ϥ,
  |   ^

error: macros that expand to items must be delimited with braces or followed by a semicolon
 --> crash-298117e3012a17b3e85cddad606b2697232cba40:1:3
  |
1 |   y![
  |  ___^
2 | | Ϥ,
  | |__^
thread 'rustc' panicked at 'byte index 1 is not a char boundary; it is inside 'Ϥ' (bytes 0..2) of `Ϥ,`', src/libcore/str/mod.rs:2039:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: aborting due to previous error


error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

@Chocol4te could you post a base64-encoded version of your crash-298117e3012a17b3e85cddad606b2697232cba40? When I copy/paste what you've posted above, I don't see a compiler crash, presumably because some non-ascii characters are getting lost.

@dwrensha My bad

eSFbCs+kLA==
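For anyone without a base64 tool at hand, the string above decodes to seven bytes, which the sketch below spells out as a byte literal so nothing is lost to copy/paste. The bytes were decoded by hand from that string; `reproducer_bytes` is a hypothetical helper name, and the program only writes a file when a path is passed as its first argument.

```rust
/// Bytes of the second reproducer, decoded by hand from the base64 string
/// `eSFbCs+kLA==`: "y![", a newline, the two-byte character U+03E4, and a
/// comma, with no trailing newline.
fn reproducer_bytes() -> &'static [u8] {
    b"y![\n\xCF\xA4,"
}

fn main() -> std::io::Result<()> {
    let bytes = reproducer_bytes();
    let text = std::str::from_utf8(bytes).expect("reproducer is valid UTF-8");
    assert_eq!(text, "y![\n\u{3e4},");

    // The second line is the one quoted in the panic message: byte 1 of it
    // sits inside the two-byte character at bytes 0..2.
    let line = text.lines().nth(1).expect("input has two lines");
    assert!(!line.is_char_boundary(1));

    // Optionally write the exact bytes to the path given as the first
    // command-line argument.
    if let Some(path) = std::env::args().nth(1) {
        std::fs::write(&path, bytes)?;
    }
    Ok(())
}
```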

Hm, the example from #62524 (comment) hasn't been fixed?

Seems so, on the current nightly:

error: this file contains an un-closed delimiter
 --> 62524-2.rs:2:3
  |
1 | y![
  |   - un-closed delimiter
2 | Ϥ,
  |   ^

error: macros that expand to items must be delimited with braces or followed by a semicolon
 --> 62524-2.rs:1:3
  |
1 |   y![
  |  ___^
2 | | Ϥ,
  | |__^
  |
thread 'rustc' panicked at 'byte index 1 is not a char boundary; it is inside 'Ϥ' (bytes 0..2) of `Ϥ,`', src\libcore\str\mod.rs:2069:5
stack backtrace:
   0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
   1: core::fmt::write
   2: <std::io::IoSliceMut as core::fmt::Debug>::fmt
   3: std::panicking::take_hook
   4: std::panicking::take_hook
   5: rustc_driver::report_ice
   6: std::panicking::rust_panic_with_hook
   7: std::panicking::begin_panic_fmt
   8: rust_begin_unwind
   9: core::panicking::panic_fmt
  10: core::str::slice_error_fail
  11: <rustc_driver::args::Error as core::fmt::Debug>::fmt
  12: <rustc_errors::lock::acquire_global_lock::Handle as core::ops::drop::Drop>::drop
  13: rustc_errors::annotate_snippet_emitter_writer::AnnotateSnippetEmitterWriter::ui_testing
  14: <rustc_errors::emitter::EmitterWriter as rustc_errors::emitter::Emitter>::emit_diagnostic
  15: rustc_errors::HandlerInner::emit_diagnostic
  16: rustc_errors::diagnostic_builder::DiagnosticBuilder::emit
  17: rustc_parse::parser::item::<impl rustc_parse::parser::Parser>::parse_foreign_item
  18: rustc_parse::parser::item::<impl rustc_parse::parser::Parser>::parse_item
  19: rustc_parse::parser::item::<impl rustc_parse::parser::Parser>::parse_item
  20: rustc_parse::parser::item::<impl rustc_parse::parser::Parser>::parse_item
  21: rustc_parse::parser::module::<impl rustc_parse::parser::Parser>::parse_crate_mod
  22: rustc_parse::parser::module::<impl rustc_parse::parser::Parser>::parse_crate_mod
  23: rustc_parse::parse_crate_from_file
  24: <rustc_interface::proc_macro_decls::Finder as rustc::hir::itemlikevisit::ItemLikeVisitor>::visit_item
  25: <rustc_interface::proc_macro_decls::Finder as rustc::hir::itemlikevisit::ItemLikeVisitor>::visit_item
  26: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::compile
  27: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::parse
  28: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
  29: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
  30: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
  31: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
  32: _rust_maybe_catch_panic
  33: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
  34: ZN244_$LT$std..error..$LT$impl$u20$core..convert..From$LT$alloc..string..String$GT$$u20$for$u20$alloc..boxed..Box$LT$dyn$u20$std..error..Error$u2b$core..marker..Send$u2b$core..marker..Sync$GT$$GT$..from..StringError$u20$as$u20$core..fmt..Display$GT$3fmt17
  35: std::sys::windows::thread::Thread::new
  36: BaseThreadInitThunk
  37: RtlUserThreadStart
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.40.0-nightly (4f03f4a98 2019-11-12) running on x86_64-pc-windows-msvc

query stack during panic:
end of query stack
error: aborting due to previous error

@Alexendoo I guess my fix #66264 would help. I'll find some time to test it out.

Confirmed that my fix resolves this issue. I also added a unit test for it in this PR: #66429