ethereum/evmlab

Failure to start/recover after full disk

holiman opened this issue · 3 comments

Posting this here so I don't forget it.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 223, in _raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.6/dist-packages/requests/models.py", line 939, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localunixsocket/v1.35/containers/create?name=geth

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "utilities/fuzzerweb.py", line 59, in <module>
    main()
  File "utilities/fuzzerweb.py", line 50, in main
    fuzzer.startDaemons()
  File "/datadrive/evmlab/utilities/fuzzer.py", line 297, in startDaemons
    procinfo = startDaemon(client_name, cmd)
  File "/datadrive/evmlab/utilities/fuzzer.py", line 262, in startDaemon
    cfg.logfilesPath():{ 'bind':'/logs/', 'mode':"rw"},
  File "/usr/local/lib/python3.6/dist-packages/docker/models/containers.py", line 745, in run
    detach=detach, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/docker/models/containers.py", line 803, in create
    resp = self.client.api.create_container(**create_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/docker/api/container.py", line 403, in create_container
    return self.create_container_from_config(config, name)
  File "/usr/local/lib/python3.6/dist-packages/docker/api/container.py", line 414, in create_container_from_config
    return self._result(res, True)
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 229, in _result
    self._raise_for_status(response)
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 225, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/usr/local/lib/python3.6/dist-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 500 Server Error: Internal Server Error ("mkdir /var/lib/docker/aufs/mnt/e7166fe1252e1c448d1d2fd6cc0118ff893d543b6cde1af4bc135e8f8521c6c1-init: no space left on device")

root@fuzz02:/datadrive# df
Filesystem     1K-blocks    Used Available Use% Mounted on
udev            16457764       0  16457764   0% /dev
tmpfs            3293960  297268   2996692  10% /run
/dev/xvda1       8065444 8049060         0 100% /
tmpfs           16469784       0  16469784   0% /dev/shm
tmpfs               5120       0      5120   0% /run/lock
tmpfs           16469784       0  16469784   0% /sys/fs/cgroup
/dev/loop0         89088   89088         0 100% /snap/core/5145
/dev/loop1         12928   12928         0 100% /snap/amazon-ssm-agent/295
/dev/xvdh      309506048  505944 293255080   1% /datadrive
/dev/loop2         90112   90112         0 100% /snap/core/5328
/dev/loop3         13056   13056         0 100% /snap/amazon-ssm-agent/495
tmpfs            3293956       0   3293956   0% /run/user/1000
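
The `df` output shows the root filesystem at 100% while `/datadrive` is nearly empty, and the `mkdir` that failed was under `/var/lib/docker` on the root filesystem. One way to fail early with a clearer message would be a free-space check before starting the daemons. This is only a sketch: `check_disk_space` and the threshold are hypothetical and not part of evmlab.

```python
import shutil


def check_disk_space(paths, min_free_bytes):
    """Return (path, free_bytes) for every path with less than min_free_bytes free."""
    low = []
    for path in paths:
        # disk_usage reports total, used and free bytes for the filesystem holding `path`
        free = shutil.disk_usage(path).free
        if free < min_free_bytes:
            low.append((path, free))
    return low
```

It could be called before `startDaemons()` as `check_disk_space(["/var/lib/docker", "/datadrive"], 1024**3)`, aborting with a readable error if the returned list is non-empty. The 1 GiB threshold is an arbitrary assumption.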

By the way, when we abort the script it can leave the Docker container running (see the 409 error in https://github.com/ethereum/evmlab/wiki/utils-fuzzer). I could add code to make the script recover from this automatically by stopping the running container, but should it do that by default, or only when a specific command-line switch is given?
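
A rough sketch of the opt-in variant, assuming the Docker SDK for Python that already appears in the traceback. The function name and the `--remove-stale` switch are hypothetical, not existing fuzzer options. The helper only relies on `client.containers.list()` and `Container.remove()`.

```python
import argparse


def remove_stale_container(client, name):
    """Remove a leftover container called `name`; return True if one was removed."""
    # The "name" filter matches substrings, so compare the names exactly.
    for container in client.containers.list(all=True, filters={"name": name}):
        if container.name == name:
            # force=True also kills the container if it is still running
            container.remove(force=True)
            return True
    return False


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical switch: recovery stays off unless explicitly requested.
    parser.add_argument("--remove-stale", action="store_true",
                        help="remove leftover client containers before starting")
    return parser.parse_args(argv)
```

With `client = docker.from_env()`, the script would call `remove_stale_container(client, "geth")` before creating the container, but only if `parse_args().remove_stale` is set. Making it the default instead would just mean dropping the switch.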

Yeah, it often leaves the Docker container running. That doesn't seem to be a problem, though: the next run restarts the container, and there haven't been any problems on prod.

Fixed by cleaning up after each run.
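
For reference, cleanup after a run can be made robust against aborts with a context manager. This is an illustrative sketch only, not the actual fix. `cleanup_on_exit` is a hypothetical name, and the containers are assumed to be Docker SDK `Container` objects (anything with a `remove(force=True)` method works).

```python
from contextlib import contextmanager


@contextmanager
def cleanup_on_exit(containers):
    """Yield the containers and always remove them afterwards, even on Ctrl-C."""
    try:
        yield containers
    finally:
        for container in containers:
            try:
                # force=True removes the container even if it is still running
                container.remove(force=True)
            except Exception as exc:
                # Keep going so one failed removal does not skip the rest
                print(f"cleanup failed for {getattr(container, 'name', container)!r}: {exc}")
```

Wrapping the fuzzing loop in `with cleanup_on_exit(started_containers):` would run the removal whether the loop finishes normally, raises an error, or is interrupted.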