ocaml-community/utop

Invalid_argument("1E5965 is not an Unicode scalar value")

reynir opened this issue · 6 comments

Since utop 2.10 I am no longer able to use ppx_blob with binary files:

─( 13:00:36 )─< command 0 >────────────────────────────────────────────────────────────────{ counter: 0 }─
utop # "\x1E\x59\x65";;
- : string = "\030Ye"
─( 13:00:37 )─< command 1 >────────────────────────────────────────────────────────────────{ counter: 0 }─
utop # #require "ppx_blob";;
─( 13:00:49 )─< command 2 >────────────────────────────────────────────────────────────────{ counter: 0 }─
utop # let s = [%blob "/home/reynir/bin/bob.com"];;
Fatal error: exception Invalid_argument("1E5965 is not an Unicode scalar value")

The file bob.com is fetched from here: https://github.com/dinosaure/bob/actions/runs/3250831340

57839aa3033139ec4a66c23b3f6e4ee14f64dfe270dc1554524013ed1d599ba2  /home/reynir/bin/bob.com

I'm not sure how to make the test case smaller.

With utop-full I could get a more useful backtrace:

Fatal error: exception Invalid_argument("1E5965 is not an Unicode scalar value")
Raised at Stdlib.invalid_arg in file "stdlib.ml", line 30, characters 20-45
Called from Zed_utf8.unsafe_extract_prev in file "src/zed_utf8.ml", line 229, characters 33-189
Called from Zed_string.Zed_string0.prev_ofs in file "src/zed_string.ml", line 139, characters 21-50
Called from Zed_string.Zed_string0.extract_prev in file "src/zed_string.ml", line 210, characters 14-30
Called from Zed_string.Zed_string0.unsafe_explode.aux in file "src/zed_string.ml", line 294, characters 23-43
Called from LTerm_text_impl.Make.of_string in file "src/lTerm_text_impl.ml", line 23, characters 65-96
Called from UTop_main.render_out_phrase in file "src/lib/uTop_main.ml", line 348, characters 17-42
Called from UTop_main.loop in file "src/lib/uTop_main.ml", line 865, characters 30-61
Re-raised at Location.report_exception.loop in file "parsing/location.ml", line 938, characters 14-25
Called from UTop.get_message in file "src/lib/uTop.ml", line 129, characters 2-11
Called from UTop_main.loop in file "src/lib/uTop_main.ml", line 871, characters 21-61
Called from UTop_main.main_aux in file "src/lib/uTop_main.ml", line 1630, characters 8-17
Called from UTop_main.main_internal in file "src/lib/uTop_main.ml", line 1646, characters 4-25

I managed to minimize it to this short snippet:

─( 13:32:56 )─< command 2 >────────────────────────────────────────────────────────────────{ counter: 0 }─
utop # "\247\165\165\165";;
Fatal error: exception Invalid_argument("1E5965 is not an Unicode scalar value")

Thanks for the short repro. This looks like an issue in zed: next_error reports no error because it only checks that the bytes encoding the sequence length are "valid", not that the decoded value fits in a Uchar. I'll try to reproduce it there.
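For reference, the number in the error message can be reproduced by hand. The sketch below decodes the four bytes of the minimized repro using the usual 4-byte UTF-8 bit layout and then checks the result with Stdlib's Uchar.is_valid. The decode4 helper is hypothetical and only for illustration; it is not zed's code.

```ocaml
(* Hypothetical helper: decode a 4-byte UTF-8-style sequence without
   validating the lead byte or the resulting value. *)
let decode4 (s : string) : int =
  let b i = Char.code s.[i] in
  ((b 0 land 0x07) lsl 18)      (* 3 payload bits from the lead byte *)
  lor ((b 1 land 0x3F) lsl 12)  (* 6 payload bits per continuation byte *)
  lor ((b 2 land 0x3F) lsl 6)
  lor (b 3 land 0x3F)

let () =
  let cp = decode4 "\247\165\165\165" in
  (* prints "decoded: 1E5965" *)
  Printf.printf "decoded: %X\n" cp;
  (* prints "valid scalar value: false", since 0x1E5965 > 0x10FFFF *)
  Printf.printf "valid scalar value: %b\n" (Uchar.is_valid cp)
```

The lead byte \247 (0xF7) has the shape of a 4-byte sequence, but the decoded value lies above U+10FFFF, so converting it to a Uchar fails.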

Fixed in ocaml-community/zed#50

We probably can't print anything meaningful for such input, but at least utop no longer crashes. Could you try the PR with the full [%blob] output? Thanks.

utop # "\247\165\165\165";;
- : string = "÷¥¥¥"
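The output above is consistent with each invalid byte being shown as the code point with the same number (0xF7 is ÷ and 0xA5 is ¥). A minimal sketch of that kind of lossy fallback follows, assuming OCaml 4.14 or later for String.get_utf_8_uchar; lossy_decode is a hypothetical name, and this is not necessarily how the zed PR implements it.

```ocaml
(* Hypothetical sketch: decode valid UTF-8 normally and map every byte
   that does not start a valid sequence to the code point of the same
   value. Requires OCaml >= 4.14. *)
let lossy_decode (s : string) : Uchar.t list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let d = String.get_utf_8_uchar s i in
      if Uchar.utf_decode_is_valid d then
        go (i + Uchar.utf_decode_length d) (Uchar.utf_decode_uchar d :: acc)
      else
        (* Invalid byte: keep it as U+0000..U+00FF and advance by one. *)
        go (i + 1) (Uchar.of_int (Char.code s.[i]) :: acc)
  in
  go 0 []

let () =
  lossy_decode "\247\165\165\165"
  |> List.iter (fun u -> Printf.printf "U+%04X " (Uchar.to_int u));
  print_newline ()
```

Advancing one byte at a time after a failure keeps the function total: any string produces a list of valid scalar values, so nothing downstream can raise Invalid_argument.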

Thanks, that fixed it for me:

─( 15:30:43 )─< command 2 >────────────────────────────────────────────────────────────────{ counter: 0 }─
utop # let s = [%blob "/home/reynir/bin/bob.com"];;
val s : string =
  "MZqFpD='\n\000\000\016\000ø\000\000\000\000\000\000\000\001\000\b@\000\000\000\000\000\000\000\000\000\000\000JT\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\011\000\000²@ë\000ë\020ì\b\000\000ë\005éh#\000\000ü\015\031>à¿\000p1ɎÁú׉Ìû\014\031è\000\000^îr\000¸\000\002PP\0071ÿ¹\000\002ó¤\015\031Òÿê \000\000ٹ\000\027¸P\000À1À1ÿóªú@t\019è\021\000\007°\0011É0ö¿p\003èg\000Ouúêì&\000\000SR´\022Í\019s\0251ÀÍ\019rE¸\001\002¹\001\000\000»\000\002Ã1ÛÍ\019r2´\bÍ\019r,πç?áÀÐÁÐÁÍ\030\006\0311öƾ\016\021÷¥¥¥¥¥¤\031«««Xª[ÃZò1ÀÍ\019r÷ë¢PQÍÐÉÐÉ\bÁ1۰\001´\002Í"... (* string length 9936896; truncated *)

I will reopen until ocaml-community/zed#50 is merged.