mit-plv/fiat-crypto

Github CI Anomaly Zoo

andres-erbsen opened this issue · 40 comments

To shed a bit more light on this

If you go to the PR at mit-plv/bedrock2#280, we see
image
and the actions log shows
image
so it looks like the overall status of the commit includes runs of that commit across all branches, but the displayed checks are only for master or something?

I'm thinking this might actually be an OOM kill, especially given that we're running the checker while replacing vm_compute with the lazy reduction machine.

Plausibly related: actions/runner-images#736 (comment) (but this is only about mac)

Maybe worth asking about at https://github.com/orgs/community/discussions/

https://github.com/mit-plv/fiat-crypto/actions/runs/3245312644/jobs/5322679386

Cloning into 'boringssl'...
  error: 44224 bytes of body are still expected
  fetch-pack: unexpected disconnect while reading sideband packet
  fatal: early EOF
  fatal: fetch-pack: invalid index-pack output
  + exit 128

https://github.com/mit-plv/fiat-crypto/actions/runs/3245312653/jobs/5322679307

This is new

# ocamlfind: Package `findlib' not found
# Error while running 'D:\a\fiat-crypto\fiat-crypto\_opam\bin/ocamlfind.exe query findlib -format %v' (exit code 2)
# Configuration script failed!

I've seen a bunch of errors about dynlink failing, but not this one

We should include snippets of the logs in the comments; GH deletes logs after a couple months

https://github.com/mit-plv/fiat-crypto/actions/runs/3242704485/jobs/5316358457

COQFLAGS="-Q src/bedrock2 bedrock2 -Q src/bedrock2Examples bedrock2Examples -Q /github/workspace/rupicola/bedrock2/deps/coqutil/src/coqutil coqutil " ../etc/bytedump.sh bedrock2.PrintListByte.allBytes > special/BytedumpTest.out.tmp
make[2]: *** [Makefile:64: special/BytedumpTest.out] Error 1
make[1]: *** [Makefile:105: bedrock2_ex] Error 2

How do we debug this? Maybe we shouldn't suppress stderr in bytedump?

Command exited with non-zero status 2
/Users/runner/work/fiat-crypto/fiat-crypto/rupicola/bedrock2/deps/coqutil/src/coqutil/Tactics/syntactic_unify.vo (real: 0.35, user: 0.09, sys: 0.09, mem: 57556 ko)
make[3]: *** [/Users/runner/work/fiat-crypto/fiat-crypto/rupicola/bedrock2/deps/coqutil/src/coqutil/Tactics/syntactic_unify.vo] Error 2

This suspiciously started happening when I started not trying to rebuild on Mac OS

Can anyone reproduce the (seemingly consistent?) issue that make -j2 fails on Mac OS the first time it's run after a clean checkout? c.f. #1458

# Usage: coqworker.opt --kind=[proof|query|tactic] $args
# got -main-channel
# File ".\src/Curves/Edwards/XYZT/Basic.v", line 124, characters 6-336:
# Error:
# Anomaly
# "Uncaught exception Failure("The spawned process did not connect back in 10.0s")."
# Please report at [http://coq.inria.fr/bugs/.](http://coq.inria.fr/bugs/)
# 
# make: *** [Makefile.coq:793: src/Curves/Edwards/XYZT/Basic.vo] Error 129

https://github.com/mit-plv/fiat-crypto/actions/runs/3281473086/jobs/5403532417#step:17:1812

Sometimes the version of Coq changes between one step and the next in the docker action
coq-community/docker-coq-action#80
https://github.com/mit-plv/fiat-crypto/actions/runs/3287380197/jobs/5416490558#step:17:501

This results in a bytedump failure because the bedrock2 files are compiled with an outdated version of Coq and so we get inconsistent assumptions

gcc: fatal error: no input files https://github.com/mit-plv/fiat-crypto/actions/runs/3420869283/jobs/5696244304#step:17:453

That's probably just me messing up how to get the version of gcc; I guess it's not gcc -v? The line to be fixed is

The actual failure here is

COQFLAGS="-Q src/bedrock2 bedrock2 -Q src/bedrock2Examples bedrock2Examples -Q /github/workspace/rupicola/bedrock2/deps/coqutil/src/coqutil coqutil " ../etc/bytedump.py bedrock2.PrintListByte.allBytes > special/BytedumpTest.out.tmp

Coq < Coq < Toplevel input, characters 1-55:
> Require bedrock2.PrintListByte bedrock2.PrintListByte.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error:
Compiled library bedrock2.PrintListByte (in file /github/workspace/rupicola/bedrock2/bedrock2/src/bedrock2/PrintListByte.vo) makes inconsistent assumptions over library Coq.Init.Ltac

Coq < 
Coq < 
Unnamed_thm < 
Unnamed_thm < 
Unnamed_thm < Toplevel input, characters 32-63:
>   PrintListByte.print_list_byte bedrock2.PrintListByte.allBytes.
>                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error: The reference bedrock2.PrintListByte.allBytes was not found in the current environment.

Unnamed_thm < 
Coq < 
Error: make[2]: *** [Makefile:64: special/BytedumpTest.out] Error 2
Error: make[1]: *** [Makefile:105: bedrock2_ex] Error 2
make: *** No rule to make target 'bedrock2', needed by '.Makefile.coq.d'.  Stop.

which presumably is #1394 (comment)

https://github.com/mit-plv/fiat-crypto/actions/runs/3446654504/jobs/5751826046 Windows: Retrieve the Cygwin cache
Error: getaddrinfo ENOTFOUND www.cygwin.com

https://github.com/mit-plv/fiat-crypto/actions/runs/3457159795/jobs/5770405148#step:6:116

** Fatal error: Cannot run cygpath -m "libshell32" "libshell32.lib" "libshell32.dll.a" "libshell32.a" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\libshell32" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\libshell32.lib" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\libshell32.dll.a" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\libshell32.a" "lib\libshell32" "lib\libshell32.lib" "lib\libshell32.dll.a" "lib\libshell32.a" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\flexdll\libshell32" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\flexdll\libshell32.lib" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\flexdll\libshell32.dll.a" "D:/a/fiat-crypto/fiat-crypto/_opam/lib/ocaml\flexdll\libshell32.a" "/usr/lib/gcc/x86_64-w64-mingw32/11/libshell32" "/usr/lib/gcc/x86_64-w64-mingw32/11/libshell32.lib" "/usr/lib/gcc/x86_64-w64-mingw32/11/libshell32.dll.a" "/usr/lib/gcc/x86_64-w64-mingw32/11/libshell32.a" "/usr/x86_64-w64-mingw32/lib/x86_64-w64-mingw32/11/libshell32" "/usr/x86_64-w64-mingw32/lib/x86_64-w64-mingw32/11/libshell32.lib" "/usr/x86_64-w64-mingw32/lib/x86_64-w64-mingw32/11/libshell32.dll.a" "/usr/x86_64-w64-mingw32/lib/x86_64-w64-mingw32/11/libshell32.a" "/usr/x86_64-w64-mingw32/lib/libshell32" "/usr/x86_64-w64-mingw32/lib/libshell32.lib" "/usr/x86_64-w64-mingw32/lib/libshell32.dll.a" "/usr/x86_64-w64-mingw32/lib/libshell32.a" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/x86_64-w64-mingw32/11/libshell32" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/x86_64-w64-mingw32/11/libshell32.lib" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/x86_64-w64-mingw32/11/libshell32.dll.a" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/x86_64-w64-mingw32/11/libshell32.a" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/libshell32" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/libshell32.lib" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/libshell32.dll.a" "/usr/x86_64-w64-mingw32/sys-root/mingw/lib/libshell32.a"
# File "caml_startup", line 1:
# Error: Error during linking (exit code 2)
# make[1]: *** [Makefile.common:184: _build/default/coq-core.install] Error 1
# make[1]: *** Waiting for unfinished jobs....
# make[1]: Leaving directory '/cygdrive/d/a/fiat-crypto/fiat-crypto/_opam/.opam-switch/build/coq.8.15.2'
# make: *** [Makefile.make:122: submake] Error 2
   2 |  opam exec -- cat ~/.opam/log/*
     |  ~~~~
     | The term 'opam' is not recognized as a name of a cmdlet, function, script file, or executable program. Check the
     | spelling of the name, or if a path was included, verify that the path is correct and try again.

https://github.com/mit-plv/fiat-crypto/actions/runs/3470364042/jobs/5798475008

   2 |  opam exec -- cat ~/.opam/log/*
     |  ~~~~
     | The term 'opam' is not recognized as a name of a cmdlet, function, script file, or executable program. Check the
     | spelling of the name, or if a path was included, verify that the path is correct and try again.

https://github.com/mit-plv/fiat-crypto/actions/runs/3470364042/jobs/5798475008

This is because I have the cat step set to run even if earlier steps failed. The actual issue here is that opam failed to install in an earlier step with

connect ETIMEDOUT 185.199.110.133:443
  Waiting 20 seconds before trying again
  connect ETIMEDOUT 185.199.110.133:443
  Waiting 14 seconds before trying again
  Error: connect ETIMEDOUT 185.199.110.133:443

validate times out in https://github.com/mit-plv/fiat-crypto/actions/runs/3470357503/jobs/5798461079 and https://github.com/mit-plv/fiat-crypto/actions/runs/3470364029/jobs/5798476434 because changed Coq version between steps results in project rebuild. We should really have a better way of dealing with Coq changing versions under us... (no progress on coq-community/docker-coq-action#80 yet...)

boringssl test failure at https://github.com/mit-plv/fiat-crypto/actions/runs/3524875493/jobs/5910863477#step:7:111 on a PR changing exact _ to refine _ in a Qed'd proof (aka a PR that cannot have changed anything impacting BoringSSL)

 FAILED: crypto_test_data.cc /home/runner/work/fiat-crypto/fiat-crypto/boringssl/build/crypto_test_data.cc 
  cd /home/runner/work/fiat-crypto/fiat-crypto/boringssl && /opt/hostedtoolcache/go/1.13.15/x64/bin/go run util/embed_test_data.go -file-list /home/runner/work/fiat-crypto/fiat-crypto/boringssl/build/embed_test_data_args.txt > /home/runner/work/fiat-crypto/fiat-crypto/boringssl/build/crypto_test_data.cc
  # command-line-arguments
  Error: util/embed_test_data.go:79:16: undefined: os.ReadFile
  Error: util/embed_test_data.go:129:16: undefined: os.ReadFile

Package g++-7 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'g++-7' has no installation candidate

https://github.com/mit-plv/fiat-crypto/actions/runs/3524875493/jobs/5936926857#step:3:67

https://github.com/mit-plv/bedrock2/actions/runs/3653687403/jobs/6176913824 timed out for no reason I can see

Windows does this sometimes, I think: it fails to quit terminal after all scripts complete

https://github.com/mit-plv/fiat-crypto/actions/runs/7700820211/job/20985462543?pr=1811

error: failed retrieving file 'ocaml-compiler-libs-5.1.0-1-x86_64.pkg.tar.zst.sig' from geo.mirror.pkgbuild.com : Operation too slow. Less than 1 bytes/sec transferred the last 10 seconds
 coq-8.18.0-2-x86_64 downloading...
error: failed retrieving file 'ghc-9.0.2-3-x86_64.pkg.tar.zst.sig' from geo.mirror.pkgbuild.com : Operation too slow. Less than 1 bytes/sec transferred the last 10 seconds
warning: failed to retrieve some files
error: failed to commit transaction (unexpected error)
Connecting to gitlab.alpinelinux.org (172.105.69.85:443)
wget: server returned error: HTTP/1.1 404 Not Found
ssl_client: write: Broken pipe
Error: Process completed with exit code 1.

https://github.com/mit-plv/rewriter/actions/runs/7705023867/job/20998315736?pr=148

error: failed retrieving file 'perl-5.38.2-1-x86_64.pkg.tar.zst.sig' from geo.mirror.pkgbuild.com : Operation too slow. Less than 1 bytes/sec transferred the last 10 seconds
https://github.com/mit-plv/fiat-crypto/actions/runs/8750960028/job/24015572653?pr=1885

https://github.com/mit-plv/fiat-crypto/actions/runs/9056914029/job/24884198262#step:7:29

Run jirutka/setup-alpine@v1
Run sudo -E ./setup-alpine.sh
Prepare rootfs directory
Download static apk-tools
  ▷ Downloading https://gitlab.alpinelinux.org/api/v4/projects/5/packages/generic/v2.14.0/x86_64/apk.static
  curl: (7) Failed to connect to gitlab.alpinelinux.org port 443 after 9945 ms: Connection timed out
  
  Error occurred at line 176:
    173 | APK="$RUNNER_TEMP/apk"
    174 | 
    175 | info "Downloading ${INPUT_APK_TOOLS_URL%\#*}"
  > 176 | download_file "$INPUT_APK_TOOLS_URL" "$APK"
    177 | chmod +x "$APK"
    178 | 
    179 | 
  Error: Error occurred at line 176: download_file "$INPUT_APK_TOOLS_URL" "$APK" (see the job log for more information)
  Error: Process completed with exit code 1.
[1 of 2] Compiling Main             ( src/ExtractionHaskell/bedrock2_fiat_crypto.hs, src/ExtractionHaskell/bedrock2_fiat_crypto.o )
panic! (the 'impossible' happened)
  GHC version 9.10.1:
	heap overflow

Please report this as a GHC bug:  https://www.haskell.org/ghc/reportabug

https://github.com/mit-plv/fiat-crypto/actions/runs/9138561025/job/25130899016?pr=1913#step:4:4949

Maybe we need to bump -M7G to -M8G? (And maybe we should upstream a bug report to not call heap overflows impossible?)