OutOfMemoryError: Task was killed due to the node running low on memory.
zcswdt opened this issue · 0 comments
Traceback (most recent call last):
  File "run_sim.py", line 152, in <module>
    remaining_observations=remaining_observations)
  File "/home/zcs/work/train-my-fling/flingbot/utils.py", line 416, in step_env
    for obs, env_id in ray.get(step_retval):
  File "/home/zcs/miniconda3/envs/flingbot/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/zcs/miniconda3/envs/flingbot/lib/python3.6/site-packages/ray/_private/worker.py", line 2523, in get
    raise value
ray.exceptions.OutOfMemoryError: Task was killed due to the node running low on memory.
Memory on the node (IP: 192.168.0.107, ID: fc2befb2867ce88e73a8a45572c43a640751ae1f2b5e15bd8315f293) where the task (actor ID: f9cc340f5aef7b479d86345001000000, name=SimEnv.init, pid=4331, memory used=2.22GB) was running was 59.49GB / 62.58GB (0.950744), which exceeds the memory usage threshold of 0.95. Ray killed this worker (ID: d98ac96cdd66ea8c0a2604609381c3256c8285b87822896c767f7714) because it was the most recently scheduled task; to see more information about memory usage on this node, use `ray logs raylet.out -ip 192.168.0.107`. To see the logs of the worker, use `ray logs worker-d98ac96cdd66ea8c0a2604609381c3256c8285b87822896c767f7714*out -ip 192.168.0.107`. Top 10 memory users:
PID     MEM(GB)  COMMAND
7904    2.92     /home/zcs/work/software/pycharm-2023.2.5/jbr/bin/java -classpath /home/zcs/work/software/pycharm-202...
4312    2.22     ray::SimEnv
4331    2.22     ray::SimEnv
4253    2.17     ray::SimEnv
4288    2.15     ray::SimEnv
4252    2.15     ray::SimEnv
4268    2.14     ray::SimEnv.step
4302    2.13     ray::SimEnv.step
4279    2.13     ray::SimEnv.step
4296    2.12     ray::SimEnv
Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. Set max_restarts and max_task_retries to enable retry when the task crashes due to OOM. To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray. To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.
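
The message suggests reducing task parallelism and enabling retries. As a rough sketch of what that could look like, the snippet below estimates how many SimEnv workers fit under the kill threshold, using the figures reported in the error (62.58GB total, threshold 0.95, about 2.22GB per SimEnv process). The helper `max_sim_envs`, the `OTHER_GB` value for memory used by non-Ray processes, and the numbers in `actor_options` are all assumptions for illustration, not values taken from flingbot.

```python
import math


def max_sim_envs(total_gb, threshold, other_gb, per_env_gb):
    """Estimate how many workers of size per_env_gb fit below threshold * total_gb.

    other_gb is memory already used by everything that is not a worker
    (IDE, desktop, the driver script, and so on).
    """
    budget_gb = total_gb * threshold - other_gb
    if budget_gb <= 0 or per_env_gb <= 0:
        return 0
    return math.floor(budget_gb / per_env_gb)


# Figures reported in the error message above.
TOTAL_GB = 62.58
THRESHOLD = 0.95
PER_ENV_GB = 2.22

# Assumption: a guess at memory used outside the SimEnv workers.
# Measure this on the actual machine before relying on it.
OTHER_GB = 10.0

num_envs = max_sim_envs(TOTAL_GB, THRESHOLD, OTHER_GB, PER_ENV_GB)
print("SimEnv workers that fit under the threshold:", num_envs)

# Hypothetical actor options following the hints in the message:
# more CPUs per actor lowers parallelism, and the retry settings let
# Ray restart an actor that was killed for OOM.
actor_options = {
    "num_cpus": 2,
    "max_restarts": 2,
    "max_task_retries": 2,
}
# Assuming SimEnv is a @ray.remote actor class, these could be applied as:
#   env = SimEnv.options(**actor_options).remote(...)
```

This estimate only accounts for steady-state usage; if each SimEnv grows over time, the per-worker figure should be taken at its peak, and the worker count lowered accordingly.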