trackmania-rl/tmrl

wandb connection error when starting the trainer


Hello. I am running the example code without changing the config, so the wandb settings are still the defaults. When I run the trainer, I get the following traceback:

wandb: View run jolly-violet-855 at: https://wandb.ai/tmrl/tmrl/runs/test_123
wandb: View project at: https://wandb.ai/tmrl/tmrl
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: C:\Users\Rajiv\AppData\Local\Temp\tmp12lymgnh\wandb\run-20240508_163952-test_123\logs
Exception in thread NetStatThr:
Traceback (most recent call last):
  File "E:\Python\Python311\Lib\threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "E:\Python\Python311\Lib\threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "D:\Project\MGAIA\MGAIA_A3\venv\Lib\site-packages\wandb\sdk\wandb_run.py", line 278, in check_network_status
    self._loop_check_status(
  File "D:\Project\MGAIA\MGAIA_A3\venv\Lib\site-packages\wandb\sdk\wandb_run.py", line 233, in _loop_check_status
    local_handle = request()
                   ^^^^^^^^^
  File "D:\Project\MGAIA\MGAIA_A3\venv\Lib\site-packages\wandb\sdk\interface\interface.py", line 884, in deliver_network_status
    return self._deliver_network_status(status)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Project\MGAIA\MGAIA_A3\venv\Lib\site-packages\wandb\sdk\interface\interface_shared.py", line 504, in _deliver_network_status
    return self._deliver_record(record)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Project\MGAIA\MGAIA_A3\venv\Lib\site-packages\wandb\sdk\interface\interface_shared.py", line 453, in _deliver_record
    handle = mailbox._deliver_record(record, interface=self)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Project\MGAIA\MGAIA_A3\venv\Lib\site-packages\wandb\sdk\lib\mailbox.py", line 455, in _deliver_record
    interface._publish(record)
  File "D:\Project\MGAIA\MGAIA_A3\venv\Lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "D:\Project\MGAIA\MGAIA_A3\venv\Lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "D:\Project\MGAIA\MGAIA_A3\venv\Lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "D:\Project\MGAIA\MGAIA_A3\venv\Lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "D:\Project\MGAIA\MGAIA_A3\venv\Lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
           ^^^^^^^^^^^^^^^^^^^^^
ConnectionAbortedError: [WinError 10053] An established connection was aborted by the software in your host machine

The traceback does not appear when I remove the wandb key from the config and select "don't visualize...". However, I would like to have the visualizations. I don't know whether this is a TMRL issue or something on my end, but I hope someone can shed some light on it.
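One way to narrow down whether the error comes from TMRL or from the local wandb setup is to start a tiny W&B run completely outside TMRL. The sketch below assumes only the public `wandb.init`, `Run.log` and `Run.finish` calls; the function name `wandb_smoke_test` and the project name `tmrl-smoke-test` are hypothetical. If the same `ConnectionAbortedError` also shows up here, that would suggest the problem lies in the environment rather than in TMRL, though it would not prove it.

```python
import importlib.util


def wandb_smoke_test(project="tmrl-smoke-test", mode="online"):
    """Start a minimal W&B run outside TMRL, log one value, then close it.

    Hypothetical helper. Returns None if wandb is not installed,
    otherwise True once the run has been finished.
    """
    if importlib.util.find_spec("wandb") is None:
        return None  # wandb is not installed in this environment
    import wandb

    # mode can be "online", "offline" or "disabled"
    run = wandb.init(project=project, mode=mode)
    run.log({"smoke_test": 1.0})  # a single dummy metric
    run.finish()  # flush and shut the run down cleanly
    return True


if __name__ == "__main__":
    print(wandb_smoke_test())
```

Running it with the default `mode="online"` exercises the same connection path the trainer uses; `mode="disabled"` turns every wandb call into a no-op, which is only useful for checking that the script itself runs.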

I fixed this by reinstalling torch with the following command:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --no-cache-dir

Now I don't get this huge stack trace.
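To confirm which PyTorch build the reinstall actually placed in the venv, a quick check like the following may help. It is a sketch, and the helper name `describe_torch_install` is hypothetical. Wheels from the `cu121` index in the command above normally carry a `+cu121` version suffix and report CUDA 12.1 in `torch.version.cuda`. The thread does not establish why reinstalling torch made the wandb error disappear, so this only verifies the install itself.

```python
import importlib.util


def describe_torch_install():
    """Summarize the installed PyTorch build, or return None if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None  # torch is not installed in this environment
    import torch

    return {
        # e.g. a "+cu121" suffix marks a wheel built against CUDA 12.1
        "torch_version": str(torch.__version__),
        # CUDA version the wheel was built with; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        # True only if a compatible driver and GPU are visible at runtime
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch_install())
```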