compiler panic: "byte index 10 is not a char boundary"
$ rustc -Vv
rustc 1.36.0 (a53f9df32 2019-07-03)
binary: rustc
commit-hash: a53f9df32fbb0b5f4382caaad8f1a46f36ea887c
commit-date: 2019-07-03
host: x86_64-apple-darwin
release: 1.36.0
LLVM version: 8.0
$ echo Zm4gbWFpbigo2Lw= | base64 -D > main.rs
$ rustc main.rs
error: this file contains an un-closed delimiter
--> main.rs:1:11
|
1 | fn main((ؼ
| -- ^
| ||
| |un-closed delimiter
| un-closed delimiter
thread 'rustc' panicked at 'byte index 10 is not a char boundary; it is inside 'ؼ' (bytes 9..11) of `fn main((ؼ`', src/libcore/str/mod.rs:2034:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: aborting due to previous error
error: internal compiler error: unexpected panic
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
note: rustc 1.36.0 (a53f9df32 2019-07-03) running on x86_64-apple-darwin
Found with the help of libfuzzer-sys.
12: core::str::traits::<impl core::slice::SliceIndex<str> for core::ops::range::RangeFrom<usize>>::index::{{closure}}
13: syntax::source_map::SourceMap::find_width_of_character_at_span
14: syntax::source_map::SourceMap::next_point
15: syntax::parse::diagnostics::<impl syntax::parse::parser::Parser>::unexpected_try_recover
16: syntax::parse::parser::Parser::parse_fn_args::{{closure}}
17: syntax::parse::parser::Parser::parse_fn_args
18: syntax::parse::parser::Parser::parse_fn_decl
19: syntax::parse::parser::Parser::parse_item_fn
20: syntax::parse::parser::Parser::parse_item_implementation
21: syntax::parse::parser::Parser::parse_item_
There are two slicing operations in find_width_of_character_at_span. I'm trying to figure out how a non-boundary index could be appearing in there... maybe unexpected_try_recover is calling next_point with a span that begins at index 10? (Where does this come from? Parser.prev_span?) Building a debug compiler to check those log statements...
Edit: Building the debug compiler was a bust. Embarrassingly, things seem to have changed and I don't know how to get those debug macros to fire in the current compiler.
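For readers following along, here is a minimal sketch of why a slice starting at byte 10 panics on this input. It is not the compiler's code; `checked_tail` and `snap_to_char_boundary` are hypothetical helper names made up for illustration. The only facts taken from the report are the source text and the byte offsets in the panic message.

```rust
/// Hypothetical helper: returns the tail of `src` starting at byte `start`,
/// or `None` if `start` is out of range or falls inside a multi-byte character.
fn checked_tail(src: &str, start: usize) -> Option<&str> {
    src.get(start..)
}

/// Hypothetical helper: moves `index` back to the nearest char boundary
/// at or before it (clamping to the end of the string).
fn snap_to_char_boundary(src: &str, mut index: usize) -> usize {
    if index >= src.len() {
        return src.len();
    }
    // Index 0 is always a boundary, so this loop terminates.
    while !src.is_char_boundary(index) {
        index -= 1;
    }
    index
}

fn main() {
    // Same text as the reproducer: 9 ASCII bytes, then U+063C (2 bytes in UTF-8).
    let src = "fn main((\u{063C}";
    assert_eq!(src.len(), 11);

    // Byte 9 starts the character, byte 10 is in the middle of it.
    assert!(src.is_char_boundary(9));
    assert!(!src.is_char_boundary(10));

    // `&src[10..]` would panic with "byte index 10 is not a char boundary";
    // the checked form reports the problem as `None` instead.
    assert_eq!(checked_tail(src, 10), None);
    assert_eq!(checked_tail(src, 9), Some("\u{063C}"));

    // Snapping the index back lands on the start of the character.
    assert_eq!(snap_to_char_boundary(src, 10), 9);
}
```

Whether the right fix is a checked slice, a snapped index, or never producing the bad position in the first place is a separate question; the sketch only shows the failure mode.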
@ExpHP do you mean you were using RUST_LOG and not seeing debug output?
If so, that is because the environment variable name under rustc was changed to RUSTC_LOG. (I make this mistake pretty much every day due to muscle memory.)
As an example, here is the tail of my log output:
% RUSTC_LOG=syntax::parse,syntax::source_map ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc /tmp/off_index.rs
DEBUG 2019-07-11T12:58:22Z: syntax::parse::attr: parse_outer_attributes: self.token=Token { kind: OpenDelim(Paren), spa\
n: Span { lo: BytePos(8), hi: BytePos(9), ctxt: #0 } }
DEBUG 2019-07-11T12:58:22Z: syntax::parse::parser: parse_arg_general parse_pat (is_name_required:true)
DEBUG 2019-07-11T12:58:22Z: syntax::source_map: find_width_of_character_at_span: local_begin=`SourceFileAndBytePos { sf\
: SourceFile(/tmp/off_index.rs), pos: BytePos(10) }`, local_end=`SourceFileAndBytePos { sf: SourceFile(/tmp/off_index.r\
s), pos: BytePos(11) }`
DEBUG 2019-07-11T12:58:22Z: syntax::source_map: find_width_of_character_at_span: start_index=`10`, end_index=`11`
DEBUG 2019-07-11T12:58:22Z: syntax::source_map: find_width_of_character_at_span: source_len=`11`
thread 'rustc' panicked at 'byte index 10 is not a char boundary; it is inside 'ؼ' (bytes 9..11) of `fn main((ؼ`', src/\
libcore/str/mod.rs:2039:5
triage: P-medium. Removing nomination.
This broke between 1.15 and 1.16, but those releases are so old, and the nature of the ICE itself changed between 1.28 and 1.29, so I do not think bisection would be worthwhile.
Offtopic
@pnkfelix: Thanks. Unfortunately, I did guess the name of RUSTC_LOG after RUST_LOG didn't work, and all it showed me was this:
$ RUSTC_LOG=trace rustc +stage1 src/main.rs
INFO 2019-07-11T13:33:39Z: jobserver::imp: created a jobserver: Client { read: File { fd: 3, path: "pipe:[2538076]", read: true, write: false }, write: File { fd: 4, path: "pipe:[2538076]", read: false, write: true } }
INFO 2019-07-11T13:33:39Z: rustc_interface::util: codegen backend candidate: /home/lampam/asd/clone/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends
INFO 2019-07-11T13:33:39Z: rustc_interface::util: probing /home/lampam/asd/clone/rust/build/x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends for a codegen backend
error: this file contains an un-closed delimiter
--> src/main.rs:1:11
|
1 | fn main((ؼ
| -- ^
| ||
| |un-closed delimiter
| un-closed delimiter
thread 'rustc' panicked at 'byte index 10 is not a char boundary; it is inside 'ؼ' (bytes 9..11) of `fn main((ؼ`', src/libcore/str/mod.rs:2039:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: aborting due to previous error
I set debug = true in my config.toml before building, and didn't see anything else in there. (I recall that debug! and trace! used to be explicitly mentioned in the comment for debug-assertions, but they no longer are, so I did not enable it.)
Okay, I did the following evil, evil thing:
Pure evil, do not open
diff --git a/src/libsyntax_pos/span_encoding.rs b/src/libsyntax_pos/span_encoding.rs
index 525ec13623..d7da2206ab 100644
--- a/src/libsyntax_pos/span_encoding.rs
+++ b/src/libsyntax_pos/span_encoding.rs
@@ -74,6 +74,12 @@ pub const DUMMY_SP: Span = Span { base_or_index: 0, len_or_tag: 0, ctxt_or_zero:
 impl Span {
     #[inline]
     pub fn new(mut lo: BytePos, mut hi: BytePos, ctxt: SyntaxContext) -> Self {
+        if lo == BytePos(10) || hi == BytePos(10) {
+            eprintln!("=========================");
+            eprintln!("{:?}", (lo, hi));
+            eprintln!();
+            let _ = std::panic::catch_unwind(|| panic!()); // potentially print backtrace
+        }
         if lo > hi {
             std::mem::swap(&mut lo, &mut hi);
         }
And acquired the following backtrace of when the span was first constructed:
=========================
(BytePos(10), BytePos(11))
15: syntax_pos::span_encoding::Span::new at /home/lampam/asd/clone/rust/src/libsyntax_pos/span_encoding.rs:81
16: syntax_pos::SpanData::with_lo at /home/lampam/asd/clone/rust/src/libsyntax_pos/lib.rs:218
17: syntax_pos::<impl syntax_pos::span_encoding::Span>::with_lo at /home/lampam/asd/clone/rust/src/libsyntax_pos/lib.rs:268
18: syntax::tokenstream::TokenTree::close_tt at src/libsyntax/tokenstream.rs:141
19: syntax::parse::parser::TokenCursor::next at src/libsyntax/parse/parser.rs:315
20: syntax::parse::parser::Parser::next_tok at src/libsyntax/parse/parser.rs:528
21: syntax::parse::parser::Parser::bump at src/libsyntax/parse/parser.rs:1031
22: syntax::parse::parser::Parser::parse_ident_common at src/libsyntax/parse/parser.rs:632
23: syntax::parse::parser::Parser::parse_ident at src/libsyntax/parse/parser.rs:617
24: syntax::parse::parser::Parser::parse_pat_ident at src/libsyntax/parse/parser.rs:4132
25: syntax::parse::parser::Parser::parse_pat_with_range_pat at src/libsyntax/parse/parser.rs:3987
26: syntax::parse::parser::Parser::parse_pat at src/libsyntax/parse/parser.rs:3908
27: syntax::parse::parser::Parser::parse_pat_list at src/libsyntax/parse/parser.rs:3582
28: syntax::parse::parser::Parser::parse_parenthesized_pat_list at src/libsyntax/parse/parser.rs:3547
29: syntax::parse::parser::Parser::parse_pat_with_range_pat at src/libsyntax/parse/parser.rs:3938
30: syntax::parse::parser::Parser::parse_pat at src/libsyntax/parse/parser.rs:3908
31: syntax::parse::parser::Parser::parse_arg_general at src/libsyntax/parse/parser.rs:1510
32: syntax::parse::parser::Parser::parse_fn_args::{{closure}} at src/libsyntax/parse/parser.rs:5380
33: syntax::parse::parser::Parser::parse_seq_to_before_tokens at src/libsyntax/parse/parser.rs:983
34: syntax::parse::parser::Parser::parse_seq_to_before_end at src/libsyntax/parse/parser.rs:916
35: syntax::parse::parser::Parser::parse_fn_args at src/libsyntax/parse/parser.rs:5368
36: syntax::parse::parser::Parser::parse_fn_decl at src/libsyntax/parse/parser.rs:5429
37: syntax::parse::parser::Parser::parse_item_fn at src/libsyntax/parse/parser.rs:5660
It also gets constructed a second time:
Second backtrace
=========================
(BytePos(10), BytePos(11))
15: syntax_pos::span_encoding::Span::new at /home/lampam/asd/clone/rust/src/libsyntax_pos/span_encoding.rs:81
16: syntax_pos::SpanData::with_lo at /home/lampam/asd/clone/rust/src/libsyntax_pos/lib.rs:218
17: syntax_pos::<impl syntax_pos::span_encoding::Span>::with_lo at /home/lampam/asd/clone/rust/src/libsyntax_pos/lib.rs:268
18: syntax::tokenstream::TokenTree::close_tt at src/libsyntax/tokenstream.rs:141
19: syntax::parse::parser::TokenCursor::next at src/libsyntax/parse/parser.rs:315
20: syntax::parse::parser::Parser::next_tok at src/libsyntax/parse/parser.rs:528
21: syntax::parse::parser::Parser::bump at src/libsyntax/parse/parser.rs:1031
22: syntax::parse::parser::Parser::expect_one_of at src/libsyntax/parse/parser.rs:590
23: syntax::parse::parser::Parser::expect at src/libsyntax/parse/parser.rs:577
24: syntax::parse::parser::Parser::parse_parenthesized_pat_list at src/libsyntax/parse/parser.rs:3555
25: syntax::parse::parser::Parser::parse_pat_with_range_pat at src/libsyntax/parse/parser.rs:3938
26: syntax::parse::parser::Parser::parse_pat at src/libsyntax/parse/parser.rs:3908
27: syntax::parse::parser::Parser::parse_arg_general at src/libsyntax/parse/parser.rs:1510
28: syntax::parse::parser::Parser::parse_fn_args::{{closure}} at src/libsyntax/parse/parser.rs:5380
29: syntax::parse::parser::Parser::parse_seq_to_before_tokens at src/libsyntax/parse/parser.rs:983
30: syntax::parse::parser::Parser::parse_seq_to_before_end at src/libsyntax/parse/parser.rs:916
31: syntax::parse::parser::Parser::parse_fn_args at src/libsyntax/parse/parser.rs:5368
32: syntax::parse::parser::Parser::parse_fn_decl at src/libsyntax/parse/parser.rs:5429
33: syntax::parse::parser::Parser::parse_item_fn at src/libsyntax/parse/parser.rs:5660
(the region of the backtrace that differs between the two is items 22-29 in the first one, or 22-25 in the second)
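As a side note on the instrumentation itself: the diff above gets its backtrace by panicking inside catch_unwind, which only prints one when RUST_BACKTRACE is set. On newer toolchains (Rust 1.65 and later) the same effect can be had without panicking, via std::backtrace::Backtrace. The sketch below is a stand-alone illustration, not a patch against the compiler; `new_span` and its plain integer positions are hypothetical stand-ins for `Span::new` and `BytePos`.

```rust
use std::backtrace::Backtrace;

/// Hypothetical stand-in for a constructor such as `Span::new`,
/// using plain integers instead of `BytePos`.
fn new_span(lo: u32, hi: u32) -> (u32, u32) {
    // Instrumentation: report whenever the suspicious position shows up.
    if lo == 10 || hi == 10 {
        eprintln!("=========================");
        eprintln!("{:?}", (lo, hi));
        // `force_capture` ignores RUST_BACKTRACE and always tries to capture.
        eprintln!("{}", Backtrace::force_capture());
    }
    // Same normalization as in the diff: keep lo <= hi.
    if lo > hi {
        (hi, lo)
    } else {
        (lo, hi)
    }
}

fn main() {
    // Triggers the instrumentation and prints a backtrace to stderr.
    let span = new_span(11, 10);
    assert_eq!(span, (10, 11));

    // Does not trigger it.
    assert_eq!(new_span(3, 4), (3, 4));
}
```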
Some interesting spots from here:
This is clearly where the value of 10 is created:
rust/src/libsyntax/tokenstream.rs
Lines 136 to 144 in 4bb6b4a
TokenCursor::next calls this if there are no remaining tokens and it still hasn't seen the closing delimiter (it seems to have as an unstated precondition that the closing delimiter must exist).
Here's the innermost bit that's specific to ident parsing. Doesn't look that odd...
rust/src/libsyntax/parse/parser.rs
Lines 622 to 634 in 4bb6b4a
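To make the first spot concrete, here is a rough model of how a position of 10 could fall out of synthesizing a close delimiter that isn't there. This is a sketch under assumptions, not the code at the permalink: the `Span` struct and `close_span` function are hypothetical, and it assumes that the close-delimiter span is taken to be the last `delim_len` bytes of the group's span, and that an unclosed group's span runs to the end of the file.

```rust
/// Simplified stand-in for a compiler span: a half-open byte range.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    lo: usize,
    hi: usize,
}

/// ASSUMPTION: the close delimiter is taken to occupy the last `delim_len`
/// bytes of the delimited group. This is a guess at the shape of the logic,
/// not a copy of the compiler's function.
fn close_span(group: Span, delim_len: usize) -> Span {
    Span {
        lo: group.hi - delim_len,
        hi: group.hi,
    }
}

fn main() {
    // Same text as the reproducer; U+063C occupies bytes 9..11.
    let src = "fn main((\u{063C}";

    // ASSUMPTION: the unclosed inner group starts at its `(` (byte 8)
    // and extends to the end of the file.
    let group = Span { lo: 8, hi: src.len() };

    // Pretending a one-byte `)` ends the group yields the span 10..11 ...
    let close = close_span(group, 1);
    assert_eq!(close, Span { lo: 10, hi: 11 });

    // ... whose start sits in the middle of the two-byte character.
    assert!(!src.is_char_boundary(close.lo));
}
```

Under those assumptions the result matches the (BytePos(10), BytePos(11)) pair printed in both backtraces: the arithmetic is fine for a real `)` but lands mid-character when the file simply ends with a multi-byte character.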
@ExpHP I'm not familiar with RUSTC_LOG=trace; I tend to list (comma-separated) module paths in my own use of RUSTC_LOG. Did you try that? You can see an example in the transcript I gave in my comment above.
Oh. (I'm so used to $ prompts that I didn't notice you included the command!) Yeah, that was the issue.
Same ICE occurs with:
y![
Ϥ,
Which returns:
error: this file contains an un-closed delimiter
--> crash-298117e3012a17b3e85cddad606b2697232cba40:2:3
|
1 | y![
| - un-closed delimiter
2 | Ϥ,
| ^
error: macros that expand to items must be delimited with braces or followed by a semicolon
--> crash-298117e3012a17b3e85cddad606b2697232cba40:1:3
|
1 | y![
| ___^
2 | | Ϥ,
| |__^
thread 'rustc' panicked at 'byte index 1 is not a char boundary; it is inside 'Ϥ' (bytes 0..2) of `Ϥ,`', src/libcore/str/mod.rs:2039:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: aborting due to previous error
error: internal compiler error: unexpected panic
note: the compiler unexpectedly panicked. this is a bug.
@Chocol4te could you post a base64-encoded version of your crash-298117e3012a17b3e85cddad606b2697232cba40? When I copy/paste what you've posted above, I don't see a compiler crash, presumably because some non-ASCII characters are getting lost.
Hm, the example from #62524 (comment) hasn't been fixed?
Seems so, on the current nightly:
error: this file contains an un-closed delimiter
--> 62524-2.rs:2:3
|
1 | y![
| - un-closed delimiter
2 | Ϥ,
| ^
error: macros that expand to items must be delimited with braces or followed by a semicolon
--> 62524-2.rs:1:3
|
1 | y![
| ___^
2 | | Ϥ,
| |__^
|
thread 'rustc' panicked at 'byte index 1 is not a char boundary; it is inside 'Ϥ' (bytes 0..2) of `Ϥ,`', src\libcore\str\mod.rs:2069:5
stack backtrace:
0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
1: core::fmt::write
2: <std::io::IoSliceMut as core::fmt::Debug>::fmt
3: std::panicking::take_hook
4: std::panicking::take_hook
5: rustc_driver::report_ice
6: std::panicking::rust_panic_with_hook
7: std::panicking::begin_panic_fmt
8: rust_begin_unwind
9: core::panicking::panic_fmt
10: core::str::slice_error_fail
11: <rustc_driver::args::Error as core::fmt::Debug>::fmt
12: <rustc_errors::lock::acquire_global_lock::Handle as core::ops::drop::Drop>::drop
13: rustc_errors::annotate_snippet_emitter_writer::AnnotateSnippetEmitterWriter::ui_testing
14: <rustc_errors::emitter::EmitterWriter as rustc_errors::emitter::Emitter>::emit_diagnostic
15: rustc_errors::HandlerInner::emit_diagnostic
16: rustc_errors::diagnostic_builder::DiagnosticBuilder::emit
17: rustc_parse::parser::item::<impl rustc_parse::parser::Parser>::parse_foreign_item
18: rustc_parse::parser::item::<impl rustc_parse::parser::Parser>::parse_item
19: rustc_parse::parser::item::<impl rustc_parse::parser::Parser>::parse_item
20: rustc_parse::parser::item::<impl rustc_parse::parser::Parser>::parse_item
21: rustc_parse::parser::module::<impl rustc_parse::parser::Parser>::parse_crate_mod
22: rustc_parse::parser::module::<impl rustc_parse::parser::Parser>::parse_crate_mod
23: rustc_parse::parse_crate_from_file
24: <rustc_interface::proc_macro_decls::Finder as rustc::hir::itemlikevisit::ItemLikeVisitor>::visit_item
25: <rustc_interface::proc_macro_decls::Finder as rustc::hir::itemlikevisit::ItemLikeVisitor>::visit_item
26: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::compile
27: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::parse
28: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
29: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
30: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
31: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
32: _rust_maybe_catch_panic
33: <syntax_pos::symbol::SymbolStr as core::fmt::Display>::fmt
34: ZN244_$LT$std..error..$LT$impl$u20$core..convert..From$LT$alloc..string..String$GT$$u20$for$u20$alloc..boxed..Box$LT$dyn$u20$std..error..Error$u2b$core..marker..Send$u2b$core..marker..Sync$GT$$GT$..from..StringError$u20$as$u20$core..fmt..Display$GT$3fmt17
35: std::sys::windows::thread::Thread::new
36: BaseThreadInitThunk
37: RtlUserThreadStart
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
error: internal compiler error: unexpected panic
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
note: rustc 1.40.0-nightly (4f03f4a98 2019-11-12) running on x86_64-pc-windows-msvc
query stack during panic:
end of query stack
error: aborting due to previous error
@Alexendoo I guess my fix #66264 would help. I'll get some time to test it out.