Fuzzer engine, spurious failures
holiman opened this issue · 6 comments
Sometimes, this happens:
2018-10-18 09:33:58,001 - INFO - Starting processes for ['geth', 'parity'] on test DEFAULT-Wed_09_22_38-11214-573185
2018-10-18 09:33:58,179 - INFO - All procs finished for test DEFAULT-Wed_09_22_38-11214-573183
2018-10-18 09:33:58,232 - INFO - Processed 3228 steps for geth on test DEFAULT-Wed_09_22_38-11214-573183, pTime:53.07 ms
Traceback (most recent call last):
File "utilities/fuzzerweb.py", line 59, in <module>
main()
File "utilities/fuzzerweb.py", line 55, in main
f.startFuzzing()
File "/datadrive/evmlab/utilities/fuzzer.py", line 593, in startFuzzing
self.postprocess_test(test)
File "/datadrive/evmlab/utilities/fuzzer.py", line 524, in postprocess_test
data = end_processes(test)
File "/datadrive/evmlab/utilities/fuzzer.py", line 424, in end_processes
with open(test.tempTraceLocation(client_name)) as output:
FileNotFoundError: [Errno 2] No such file or directory: '/datadrive/evmlabtemp/logs/DEFAULT-Wed_09_22_38-11214-573183-parity.trace.log'
I ran the testcase in question manually inside the parity docker image, and it worked fine. But for some reason, the file didn't exist. Disk was not full. It happened after 573182
testcases, so it doesn't happen often.
Is this the codeline that stores the trace? https://github.com/ethereum/evmlab/blame/master/utilities/fuzzer.py#L143
It reappeared again, but I have some more log output this time:
2018-11-10 23:54:22,508 - INFO - Starting processes for ['geth', 'parity'] on test DEFAULT-Sat_09_04_09-5401-363088
2018-11-10 23:54:22,519 - INFO - All procs finished for test DEFAULT-Sat_09_04_09-5401-363085
2018-11-10 23:54:22,520 - INFO - Processed 8 steps for geth on test DEFAULT-Sat_09_04_09-5401-363085, pTime:0.30 ms
2018-11-10 23:54:22,520 - WARNING - The file /datadrive/evmlabtemp/logs/DEFAULT-Sat_09_04_09-5401-363085-parity.trace.log could not be found!
2018-11-10 23:54:22,520 - WARNING - Socket event [1][17]
2018-11-10 23:54:22,520 - WARNING - Socket data b'\x01\x00\x00\x00\x00\x00\x01drpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:87: adding pid 10540 to cgroups caused \\"failed to write 10540 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/d5fede65f70d4f14ddddc0449c4a8d27ab4f1fa9ffb6352baef5d700c48f7a58/cgroup.procs: invalid argument\\""\n\r\n'b''
2018-11-10 23:54:22,520 - INFO - Processed 8 steps for parity on test DEFAULT-Sat_09_04_09-5401-363085, pTime:0.22 ms
2018-11-10 23:54:22,520 - INFO - Removing test artefacts ['/datadrive/evmlabtemp/testfiles/DEFAULT-Sat_09_04_09-5401-363085-test.json', '/datadrive/evmlabtemp/logs/DEFAULT-Sat_09_04_09-5401-363085-geth.trace.log', '/datadrive/evmlabtemp/logs/DEFAULT-Sat_09_04_09-5401-363085-parity.trace.log']
Traceback (most recent call last):
File "utilities/fuzzerweb.py", line 61, in <module>
main()
File "utilities/fuzzerweb.py", line 56, in main
executor.startFuzzing()
File "/datadrive/evmlab/utilities/fuzzer.py", line 353, in startFuzzing
self.postprocess_test(test, reporting=self._fuzzer._config.enable_reporting)
File "/datadrive/evmlab/utilities/fuzzer.py", line 278, in postprocess_test
failingTestcase = self._fuzzer.processTraces(test, forceSave=self._fuzzer._config.force_save)
File "/datadrive/evmlab/utilities/fuzzer.py", line 487, in processTraces
test.removeFiles()
File "/datadrive/evmlab/utilities/fuzzer.py", line 147, in removeFiles
os.remove(f)
FileNotFoundError: [Errno 2] No such file or directory: '/datadrive/evmlabtemp/logs/DEFAULT-Sat_09_04_09-5401-363085-parity.trace.log'
2018-11-11 01:02:14,011 - INFO - 127.0.0.1 - - [11/Nov/2018 01:02:14] "GET / HTTP/1.0" 200 -
So, apparently, this is what happens when we try to do the docker_exec
\x01\x00\x00\x00\x00\x00\x01drpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:87: adding pid 10540 to cgroups caused \\"failed to write 10540 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/d5fede65f70d4f14ddddc0449c4a8d27ab4f1fa9ffb6352baef5d700c48f7a58/cgroup.procs: invalid argument\\""\n\r\n
Related: moby/moby#31230
possibly loosely related: opencontainers/runc#1884
Actually, now that we know what it is. we don't really have to fix it. If there is any output from the process, we know the instantiation failed (otherwise the output would have wound up into the file, since we pipe everything to file). And then we can just rerun the testcase, and move on.
Same, on the second machine with more up-to-date code:
2018-11-13 01:28:00,790 - INFO - All procs finished for test DEFAULT-Mon_14_35_06-13647-751926
2018-11-13 01:28:00,791 - WARNING - The file /datadrive/evmlabtemp/logs/DEFAULT-Mon_14_35_06-13647-751926-geth.trace.log could not be found!
2018-11-13 01:28:00,791 - WARNING - Socket event [17][17]
2018-11-13 01:28:00,792 - WARNING - Socket data b'\x01\x00\x00\x00\x00\x00\x01drpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:87: adding pid 11733 to cgroups caused \\"failed to write 11733 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/2981fdfce1455048285fa0f10812256b87a959e572e49633eea95d0041ae936d/cgroup.procs: invalid argument\\""\n\r\n'b''
2018-11-13 01:28:00,792 - INFO - Processed 0 steps for geth on test DEFAULT-Mon_14_35_06-13647-751926, pTime:0.77 ms
2018-11-13 01:28:00,792 - INFO - Processed 4 steps for parity on test DEFAULT-Mon_14_35_06-13647-751926, pTime:0.35 ms
2018-11-13 01:28:00,793 - WARNING - CONSENSUS BUG!!!
2018-11-13 01:28:00,793 - INFO - Saving testcase as /root/tmp/evmlab/artefacts/DEFAULT-Mon_14_35_06-13647-751926-test.json
2018-11-13 01:28:00,794 - INFO - Saving trace as /root/tmp/evmlab/artefacts/DEFAULT-Mon_14_35_06-13647-751926-geth.trace.log
Traceback (most recent call last):
File "/usr/lib/python3.6/shutil.py", line 544, in move
os.rename(src, real_dst)
OSError: [Errno 18] Invalid cross-device link: '/datadrive/evmlabtemp/logs/DEFAULT-Mon_14_35_06-13647-751926-geth.trace.log' -> '/root/tmp/evmlab/artefacts/DEFAULT-Mon_14_35_06-13647-751926-geth.trace.log'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "utilities/fuzzerweb.py", line 61, in <module>
main()
File "utilities/fuzzerweb.py", line 56, in main
executor.startFuzzing()
File "/datadrive/evmlab/utilities/fuzzer.py", line 357, in startFuzzing
self.postprocess_test(test, reporting=self._fuzzer._config.enable_reporting)
File "/datadrive/evmlab/utilities/fuzzer.py", line 281, in postprocess_test
failingTestcase = self._fuzzer.processTraces(test, forceSave=self._fuzzer._config.force_save)
File "/datadrive/evmlab/utilities/fuzzer.py", line 526, in processTraces
test.saveArtefacts()
File "/datadrive/evmlab/utilities/fuzzer.py", line 181, in saveArtefacts
shutil.move(f, newloc)
File "/usr/lib/python3.6/shutil.py", line 558, in move
copy_function(src, real_dst)
File "/usr/lib/python3.6/shutil.py", line 257, in copy2
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/usr/lib/python3.6/shutil.py", line 120, in copyfile
with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: '/datadrive/evmlabtemp/logs/DEFAULT-Mon_14_35_06-13647-751926-geth.trace.log'