alex-petrenko/sample-factory

version_diff keeps increasing

Closed this issue · 6 comments

Thanks for building this awesome library. I'm having trouble getting any example to work, and it would be great if you had any suggestions for what I could try.

Using the example:
python -m sample_factory.algorithms.appo.train_appo --env=doom_basic --algo=APPO --train_for_env_steps=3000000 --num_workers=20 --num_envs_per_worker=20 --experiment=doom_basic

The policy lag seems to keep increasing linearly, which I assume is not expected? It's as if the model version isn't being updated.

[2022-03-02 23:05:46,460][18482] Fps is (10 sec: 20404.3, 60 sec: 20404.3, 300 sec: 20404.3). Total num frames: 241664. Throughput: 0: 3098.6. Samples: 54300. Policy #0 lag: (min: 52.0, avg: 52.0, max: 52.0)
[2022-03-02 23:05:46,460][18482] Avg episode reward: [(0, '-1.416')]
[2022-03-02 23:05:51,461][18482] Fps is (10 sec: 19999.7, 60 sec: 19883.6, 300 sec: 19883.6). Total num frames: 335872. Throughput: 0: 4499.1. Samples: 83850. Policy #0 lag: (min: 77.0, avg: 77.0, max: 77.0)
[2022-03-02 23:05:51,461][18482] Avg episode reward: [(0, '-1.509')]
[2022-03-02 23:05:56,480][18482] Fps is (10 sec: 19622.1, 60 sec: 20013.5, 300 sec: 20013.5). Total num frames: 438272. Throughput: 0: 4965.4. Samples: 113450. Policy #0 lag: (min: 104.0, avg: 104.0, max: 104.0)
[2022-03-02 23:05:56,480][18482] Avg episode reward: [(0, '-1.825')]
[2022-03-02 23:06:01,488][18482] Fps is (10 sec: 19606.6, 60 sec: 19772.8, 300 sec: 19772.8). Total num frames: 532480. Throughput: 0: 4454.0. Samples: 128060. Policy #0 lag: (min: 104.0, avg: 104.0, max: 104.0)
[2022-03-02 23:06:01,489][18482] Avg episode reward: [(0, '-1.599')]
[2022-03-02 23:06:06,514][18482] Fps is (10 sec: 19593.5, 60 sec: 19873.5, 300 sec: 19873.5). Total num frames: 634880. Throughput: 0: 5167.0. Samples: 157920. Policy #0 lag: (min: 131.0, avg: 131.0, max: 131.0)
[2022-03-02 23:06:06,514][18482] Avg episode reward: [(0, '-1.209')]
[2022-03-02 23:06:11,515][18482] Fps is (10 sec: 19609.2, 60 sec: 19726.1, 300 sec: 19726.1). Total num frames: 729088. Throughput: 0: 5162.6. Samples: 187380. Policy #0 lag: (min: 157.0, avg: 157.0, max: 157.0)

(Screenshot attached: Screen Shot 2022-03-02 at 11 16 36 pm)

Environment:
Running Ubuntu 20.04 in WSL 2 (maybe that's the problem).
sample-factory==1.120.0
torch==1.7.1+cu110 (I have tried 1.10 as well)

Hi! Yes, this is very weird, I've never seen this before! It could be related to WSL: policy weight updates rely on shared GPU-side tensors, which were known not to work properly in Windows builds of PyTorch (although I assumed WSL would fix that).

This exact effect can happen if the policy_id is somehow not updating on the inference workers, which would likely mean the policy weights are not updating either.
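For context, here's a rough sketch of the version-sharing pattern this relies on (a simplified illustration, not the actual sample-factory code; in the library the shared buffers are CUDA tensors, which is why the Windows/WSL limitation matters):

import torch

# A shared version counter: the learner bumps it after every weight update,
# and each policy worker compares it to its cached version to decide when
# to reload weights. It starts at -1, i.e. "no trained version yet".
policy_versions = torch.full((1,), -1, dtype=torch.int64)
policy_versions.share_memory_()  # make it visible to other processes

def learner_publish_version(train_step):
    # learner process: advertise that new weights are available
    policy_versions[0] = train_step

def policy_worker_maybe_reload(cached_version, load_weights_fn):
    # inference process: reload weights only if the learner has moved ahead
    latest = policy_versions[0].item()
    if latest > cached_version:
        load_weights_fn()
        cached_version = latest
    return cached_version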

I am not near my workstation right now to take a look, but I can hopefully investigate in the next couple of days.

If you could add some logging here to see what the policy_worker thinks the learner policy version is, it'd help!

learner_policy_version = self.shared_buffers.policy_versions[self.policy_id].item()

And maybe also in learner.py to see if it's actually being updated.
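For illustration, the logging could look roughly like this in policy_worker.py (a sketch, assuming the module-level log helper from sample_factory.utils.utils):

from sample_factory.utils.utils import log

learner_policy_version = self.shared_buffers.policy_versions[self.policy_id].item()
# sketch of a debug line to confirm what version the policy worker actually sees
log.debug('Policy worker sees learner policy version %d for policy %d', learner_policy_version, self.policy_id)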

Sorry you encountered this! Hope we will be able to figure this out.

Thanks for your help! I tried adding a log message and the version is always -1 in the policy_worker.

So I started looking at the learner and noticed an error that I guess I missed before:

THCudaCheck FAIL file=/pytorch/torch/csrc/generic/StorageSharing.cpp line=247 error=801 : operation not supported
[2022-03-03 23:00:20,340][12222] Learner 0 initialized
[2022-03-03 23:00:20,341][12181] Initializing policy workers...
Traceback (most recent call last):
  File "/home/ngoodger/anaconda3/envs/sample_factory/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/ngoodger/anaconda3/envs/sample_factory/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "/home/ngoodger/anaconda3/envs/sample_factory/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 240, in reduce_tensor
    event_sync_required) = storage._share_cuda_()
RuntimeError: cuda runtime error (801) : operation not supported at /pytorch/torch/csrc/generic/StorageSharing.cpp:247

It seems like this is just not supported on Windows, and I guess that applies to WSL as well. There might be some workarounds; otherwise I probably just need to install Ubuntu natively.
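For reference, a minimal standalone script (not from the issue) that should trigger the same error 801 on a system without CUDA IPC support, since sending a CUDA tensor to a child process goes through storage._share_cuda_():

import torch
import torch.multiprocessing as mp

def child(shared_tensor):
    # the child process only needs to read the shared CUDA tensor
    print(shared_tensor.sum().item())

if __name__ == '__main__':
    mp.set_start_method('spawn')
    t = torch.zeros(8, device='cuda')
    # passing a CUDA tensor between processes relies on CUDA IPC
    # (cudaIpcGetMemHandle); this is what fails with error 801 on
    # Windows and, at the time of this issue, on WSL2
    p = mp.Process(target=child, args=(t,))
    p.start()
    p.join()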

Thanks a lot for looking into this. A Linux installation has always been my recommendation when it comes to reinforcement learning - your access to tools and libraries increases dramatically. It'd be nice to have SF working on Windows, but there are some major difficulties.

Maybe you can make it work with --device='cpu' - that will let you run and debug some code, but it's not suitable for any sort of large-scale training. macOS is known to work too; some of my colleagues use it for research and development and then run experiments on clusters. On a Mac you obviously don't have GPU/CUDA support at all.
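For example, the command from the original post with the CPU device flag appended (only the last argument changes):

python -m sample_factory.algorithms.appo.train_appo --env=doom_basic --algo=APPO --train_for_env_steps=3000000 --num_workers=20 --num_envs_per_worker=20 --experiment=doom_basic --device=cpu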

Makes sense - I really hope they eventually add support for the required features on Windows; WSL has worked really well for me until now. After installing Ubuntu natively on the same machine, I'm happy to report that everything works great.

Thanks for building this amazing framework. It's really fantastic that we are able to run these kinds of experiments without breaking the bank or dealing with a distributed system.

I'm glad you found it useful!
The next version (SF 2.0) is currently in development; hopefully it will be a little more flexible and friendly!

Does that mean Windows-friendly?
Is there any chance of adding MuZero?

Looking forward to SF2!