ethereum/evmlab

Fuzzer engine, spurious failures

holiman opened this issue · 6 comments

Sometimes, this happens:

2018-10-18 09:33:58,001 - INFO - Starting processes for ['geth', 'parity'] on test DEFAULT-Wed_09_22_38-11214-573185
2018-10-18 09:33:58,179 - INFO - All procs finished for test DEFAULT-Wed_09_22_38-11214-573183
2018-10-18 09:33:58,232 - INFO - Processed 3228 steps for geth on test DEFAULT-Wed_09_22_38-11214-573183, pTime:53.07 ms
Traceback (most recent call last):
  File "utilities/fuzzerweb.py", line 59, in <module>
    main()
  File "utilities/fuzzerweb.py", line 55, in main
    f.startFuzzing()
  File "/datadrive/evmlab/utilities/fuzzer.py", line 593, in startFuzzing
    self.postprocess_test(test)
  File "/datadrive/evmlab/utilities/fuzzer.py", line 524, in postprocess_test
    data = end_processes(test)
  File "/datadrive/evmlab/utilities/fuzzer.py", line 424, in end_processes
    with open(test.tempTraceLocation(client_name)) as output:
FileNotFoundError: [Errno 2] No such file or directory: '/datadrive/evmlabtemp/logs/DEFAULT-Wed_09_22_38-11214-573183-parity.trace.log'

I ran the testcase in question manually inside the parity docker image, and it worked fine. But for some reason, the file didn't exist. Disk was not full. It happened after 573182 testcases, so it doesn't happen often.

It reappeared again, but I have some more log output this time:

2018-11-10 23:54:22,508 - INFO - Starting processes for ['geth', 'parity'] on test DEFAULT-Sat_09_04_09-5401-363088
2018-11-10 23:54:22,519 - INFO - All procs finished for test DEFAULT-Sat_09_04_09-5401-363085
2018-11-10 23:54:22,520 - INFO - Processed 8 steps for geth on test DEFAULT-Sat_09_04_09-5401-363085, pTime:0.30 ms
2018-11-10 23:54:22,520 - WARNING - The file /datadrive/evmlabtemp/logs/DEFAULT-Sat_09_04_09-5401-363085-parity.trace.log could not be found!
2018-11-10 23:54:22,520 - WARNING - Socket event [1][17]
2018-11-10 23:54:22,520 - WARNING - Socket data b'\x01\x00\x00\x00\x00\x00\x01drpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:87: adding pid 10540 to cgroups caused \\"failed to write 10540 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/d5fede65f70d4f14ddddc0449c4a8d27ab4f1fa9ffb6352baef5d700c48f7a58/cgroup.procs: invalid argument\\""\n\r\n'b''
2018-11-10 23:54:22,520 - INFO - Processed 8 steps for parity on test DEFAULT-Sat_09_04_09-5401-363085, pTime:0.22 ms
2018-11-10 23:54:22,520 - INFO - Removing test artefacts ['/datadrive/evmlabtemp/testfiles/DEFAULT-Sat_09_04_09-5401-363085-test.json', '/datadrive/evmlabtemp/logs/DEFAULT-Sat_09_04_09-5401-363085-geth.trace.log', '/datadrive/evmlabtemp/logs/DEFAULT-Sat_09_04_09-5401-363085-parity.trace.log']
Traceback (most recent call last):
  File "utilities/fuzzerweb.py", line 61, in <module>
    main()
  File "utilities/fuzzerweb.py", line 56, in main
    executor.startFuzzing()
  File "/datadrive/evmlab/utilities/fuzzer.py", line 353, in startFuzzing
    self.postprocess_test(test, reporting=self._fuzzer._config.enable_reporting)
  File "/datadrive/evmlab/utilities/fuzzer.py", line 278, in postprocess_test
    failingTestcase = self._fuzzer.processTraces(test, forceSave=self._fuzzer._config.force_save)
  File "/datadrive/evmlab/utilities/fuzzer.py", line 487, in processTraces
    test.removeFiles()
  File "/datadrive/evmlab/utilities/fuzzer.py", line 147, in removeFiles
    os.remove(f)
FileNotFoundError: [Errno 2] No such file or directory: '/datadrive/evmlabtemp/logs/DEFAULT-Sat_09_04_09-5401-363085-parity.trace.log'
2018-11-11 01:02:14,011 - INFO - 127.0.0.1 - - [11/Nov/2018 01:02:14] "GET / HTTP/1.0" 200 -

So, apparently, this is what happens when we try to do the docker_exec

\x01\x00\x00\x00\x00\x00\x01drpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:87: adding pid 10540 to cgroups caused \\"failed to write 10540 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/d5fede65f70d4f14ddddc0449c4a8d27ab4f1fa9ffb6352baef5d700c48f7a58/cgroup.procs: invalid argument\\""\n\r\n

Related: moby/moby#31230
possibly loosely related: opencontainers/runc#1884

Actually, now that we know what it is. we don't really have to fix it. If there is any output from the process, we know the instantiation failed (otherwise the output would have wound up into the file, since we pipe everything to file). And then we can just rerun the testcase, and move on.

Same, on the second machine with more up-to-date code:

2018-11-13 01:28:00,790 - INFO - All procs finished for test DEFAULT-Mon_14_35_06-13647-751926
2018-11-13 01:28:00,791 - WARNING - The file /datadrive/evmlabtemp/logs/DEFAULT-Mon_14_35_06-13647-751926-geth.trace.log could not be found!
2018-11-13 01:28:00,791 - WARNING - Socket event [17][17]
2018-11-13 01:28:00,792 - WARNING - Socket data b'\x01\x00\x00\x00\x00\x00\x01drpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:87: adding pid 11733 to cgroups caused \\"failed to write 11733 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/2981fdfce1455048285fa0f10812256b87a959e572e49633eea95d0041ae936d/cgroup.procs: invalid argument\\""\n\r\n'b''
2018-11-13 01:28:00,792 - INFO - Processed 0 steps for geth on test DEFAULT-Mon_14_35_06-13647-751926, pTime:0.77 ms 
2018-11-13 01:28:00,792 - INFO - Processed 4 steps for parity on test DEFAULT-Mon_14_35_06-13647-751926, pTime:0.35 ms 
2018-11-13 01:28:00,793 - WARNING - CONSENSUS BUG!!!
2018-11-13 01:28:00,793 - INFO - Saving testcase as /root/tmp/evmlab/artefacts/DEFAULT-Mon_14_35_06-13647-751926-test.json
2018-11-13 01:28:00,794 - INFO - Saving trace as /root/tmp/evmlab/artefacts/DEFAULT-Mon_14_35_06-13647-751926-geth.trace.log
Traceback (most recent call last):
  File "/usr/lib/python3.6/shutil.py", line 544, in move
    os.rename(src, real_dst)
OSError: [Errno 18] Invalid cross-device link: '/datadrive/evmlabtemp/logs/DEFAULT-Mon_14_35_06-13647-751926-geth.trace.log' -> '/root/tmp/evmlab/artefacts/DEFAULT-Mon_14_35_06-13647-751926-geth.trace.log'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "utilities/fuzzerweb.py", line 61, in <module>
    main()
  File "utilities/fuzzerweb.py", line 56, in main
    executor.startFuzzing()
  File "/datadrive/evmlab/utilities/fuzzer.py", line 357, in startFuzzing
    self.postprocess_test(test, reporting=self._fuzzer._config.enable_reporting)
  File "/datadrive/evmlab/utilities/fuzzer.py", line 281, in postprocess_test
    failingTestcase = self._fuzzer.processTraces(test, forceSave=self._fuzzer._config.force_save)
  File "/datadrive/evmlab/utilities/fuzzer.py", line 526, in processTraces
    test.saveArtefacts()
  File "/datadrive/evmlab/utilities/fuzzer.py", line 181, in saveArtefacts
    shutil.move(f, newloc)
  File "/usr/lib/python3.6/shutil.py", line 558, in move
    copy_function(src, real_dst)
  File "/usr/lib/python3.6/shutil.py", line 257, in copy2
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.6/shutil.py", line 120, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/datadrive/evmlabtemp/logs/DEFAULT-Mon_14_35_06-13647-751926-geth.trace.log'