[HANDS-ON BUG]
Describe the bug
Problem with Unit 3: Deep Q-Learning with Atari Games 👾 using RL Baselines3 Zoo
Hello, I have an issue pushing the model to the Hub.
I run the line:
!python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 --repo-name dqn-SpaceInvadersNoFrameskip-v4 -orga execbat -f logs/
Getting this:
Loading latest experiment, id=1
Loading logs/dqn/SpaceInvadersNoFrameskip-v4_1/SpaceInvadersNoFrameskip-v4.zip
A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
[Powered by Stella]
Stacking 4 frames
Wrapping the env in a VecTransposeImage.
Uploading to execbat/dqn-SpaceInvadersNoFrameskip-v4, make sure to have the rights
ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to some
minutes if video generation is activated. This is a work in progress: if you
encounter a bug, please open an issue.
/home/evgenii/anaconda3/envs/huggingface/lib/python3.9/site-packages/huggingface_hub/utils/_deprecation.py:131: FutureWarning: 'Repository' (from 'huggingface_hub.repository') is deprecated and will be removed from version '1.0'. Please prefer the http-based alternatives instead. Given its large adoption in legacy code, the complete removal is only planned on next major release.
For more details, please read https://huggingface.co/docs/huggingface_hub/concepts/git_vs_http.
warnings.warn(warning_message, FutureWarning)
Cloning https://huggingface.co/execbat/dqn-SpaceInvadersNoFrameskip-v4 into local empty directory.
WARNING:huggingface_hub.repository:Cloning https://huggingface.co/execbat/dqn-SpaceInvadersNoFrameskip-v4 into local empty directory.
Saving model to: hub/dqn-SpaceInvadersNoFrameskip-v4/dqn-SpaceInvadersNoFrameskip-v4
Traceback (most recent call last):
File "/home/evgenii/anaconda3/envs/huggingface/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/evgenii/anaconda3/envs/huggingface/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/evgenii/anaconda3/envs/huggingface/lib/python3.9/site-packages/rl_zoo3/push_to_hub.py", line 398, in <module>
package_to_hub(
File "/home/evgenii/anaconda3/envs/huggingface/lib/python3.9/site-packages/rl_zoo3/push_to_hub.py", line 247, in package_to_hub
_generate_replay(model, eval_env, video_length, is_deterministic, repo_local_path)
File "/home/evgenii/anaconda3/envs/huggingface/lib/python3.9/site-packages/huggingface_sb3/push_to_hub.py", line 133, in _generate_replay
env = VecVideoRecorder(
File "/home/evgenii/anaconda3/envs/huggingface/lib/python3.9/site-packages/stable_baselines3/common/vec_env/vec_video_recorder.py", line 52, in __init__
assert self.env.render_mode == "rgb_array", f"The render_mode must be 'rgb_array', not {self.env.render_mode}"
AssertionError: The render_mode must be 'rgb_array', not human
Exception ignored in: <function VecVideoRecorder.__del__ at 0x73db523815e0>
Traceback (most recent call last):
File "/home/evgenii/anaconda3/envs/huggingface/lib/python3.9/site-packages/stable_baselines3/common/vec_env/vec_video_recorder.py", line 113, in __del__
self.close_video_recorder()
File "/home/evgenii/anaconda3/envs/huggingface/lib/python3.9/site-packages/stable_baselines3/common/vec_env/vec_video_recorder.py", line 103, in close_video_recorder
if self.recording:
File "/home/evgenii/anaconda3/envs/huggingface/lib/python3.9/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 420, in __getattr__
return self.getattr_recursive(name)
File "/home/evgenii/anaconda3/envs/huggingface/lib/python3.9/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 443, in getattr_recursive
attr = self.venv.getattr_recursive(name)
File "/home/evgenii/anaconda3/envs/huggingface/lib/python3.9/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 445, in getattr_recursive
attr = getattr(self.venv, name)
AttributeError: 'DummyVecEnv' object has no attribute 'recording'
So I don't know how to generate the replay video and push everything to my Hugging Face repository.
Please help.
unit3.zip
Material
I ran it in a conda environment on my laptop, built with Python 3.9.18.
- OS: Ubuntu 22.04.3 LTS
I'm having the very same problem, but I'm running the notebook on Google Colab with the suggested GPU.
Also having the same problem, running the notebook on Google Colab with the T4 GPU. Training works fine, but after pushing to the Hub my model card is blank too.
input
!python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 --repo-name dqn-SpaceInvadersNoFrameskip-v4-2 -orga cgwell -f logs/
output
2024-06-30 01:40:27.852357: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-30 01:40:27.852414: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-30 01:40:27.948812: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-30 01:40:27.956169: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-30 01:40:29.011445: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Loading latest experiment, id=4
Loading logs/dqn/SpaceInvadersNoFrameskip-v4_4/SpaceInvadersNoFrameskip-v4.zip
A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
[Powered by Stella]
Stacking 4 frames
Wrapping the env in a VecTransposeImage.
Uploading to cgwell/dqn-SpaceInvadersNoFrameskip-v4-2, make sure to have the rights
ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to some
minutes if video generation is activated. This is a work in progress: if you
encounter a bug, please open an issue.
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_deprecation.py:131: FutureWarning: 'Repository' (from 'huggingface_hub.repository') is deprecated and will be removed from version '1.0'. Please prefer the http-based alternatives instead. Given its large adoption in legacy code, the complete removal is only planned on next major release.
For more details, please read https://huggingface.co/docs/huggingface_hub/concepts/git_vs_http.
warnings.warn(warning_message, FutureWarning)
Cloning https://huggingface.co/cgwell/dqn-SpaceInvadersNoFrameskip-v4-2 into local empty directory.
WARNING:huggingface_hub.repository:Cloning https://huggingface.co/cgwell/dqn-SpaceInvadersNoFrameskip-v4-2 into local empty directory.
Saving model to: hub/dqn-SpaceInvadersNoFrameskip-v4-2/dqn-SpaceInvadersNoFrameskip-v4
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/rl_zoo3/push_to_hub.py", line 398, in <module>
package_to_hub(
File "/usr/local/lib/python3.10/dist-packages/rl_zoo3/push_to_hub.py", line 247, in package_to_hub
_generate_replay(model, eval_env, video_length, is_deterministic, repo_local_path)
File "/usr/local/lib/python3.10/dist-packages/huggingface_sb3/push_to_hub.py", line 133, in _generate_replay
env = VecVideoRecorder(
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/vec_env/vec_video_recorder.py", line 52, in __init__
assert self.env.render_mode == "rgb_array", f"The render_mode must be 'rgb_array', not {self.env.render_mode}"
AssertionError: The render_mode must be 'rgb_array', not human
Exception ignored in: <function VecVideoRecorder.__del__ at 0x7c8d835f37f0>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/vec_env/vec_video_recorder.py", line 113, in __del__
self.close_video_recorder()
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/vec_env/vec_video_recorder.py", line 103, in close_video_recorder
if self.recording:
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 420, in __getattr__
return self.getattr_recursive(name)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 443, in getattr_recursive
attr = self.venv.getattr_recursive(name)
File "/usr/local/lib/python3.10/dist-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 445, in getattr_recursive
attr = getattr(self.venv, name)
AttributeError: 'DummyVecEnv' object has no attribute 'recording'
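Both tracebacks end with a second, confusing error: AttributeError: 'DummyVecEnv' object has no attribute 'recording'. Reading the frames, it looks like a side effect of the first failure. The render_mode assertion fires inside VecVideoRecorder.__init__ before the recorder sets its own recording attribute, Python still calls __del__ on the half-built object, and the missing attribute is forwarded through __getattr__ to the wrapped DummyVecEnv. The stdlib-only sketch below reproduces that pattern with hypothetical stand-in classes (Inner, Recorder); they are not the Stable-Baselines3 classes.

```python
class Inner:
    """Hypothetical stand-in for the innermost vectorized env (like DummyVecEnv)."""

    def __init__(self, render_mode):
        self.render_mode = render_mode


class Recorder:
    """Hypothetical stand-in for a video-recording wrapper.

    Like the wrapper in the traceback, it forwards unknown attributes to the
    wrapped env and checks render_mode before setting its own state.
    """

    def __init__(self, venv):
        self.venv = venv
        # Raises here for render_mode="human", so the next line never runs.
        assert venv.render_mode == "rgb_array", (
            f"The render_mode must be 'rgb_array', not {venv.render_mode}"
        )
        self.recording = False

    def __getattr__(self, name):
        # Only called when normal lookup fails: forward to the wrapped env.
        return getattr(self.venv, name)

    def __del__(self):
        # Still runs for the half-built object. self.recording was never set,
        # so the lookup is forwarded to Inner and raises AttributeError, which
        # Python prints as "Exception ignored in: ..." instead of propagating.
        if self.recording:
            print("closing recorder")


try:
    Recorder(Inner("human"))
except AssertionError as exc:
    print("AssertionError:", exc)  # this is the error that actually needs fixing
```

In other words, the AssertionError is the one to fix; once the env renders in rgb_array mode, the AttributeError should go away with it, which matches the successful run further down.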
Solved, in case it might be useful.
The cause is the error AssertionError: The render_mode must be 'rgb_array', not human. So pass the 'rgb_array' render mode through --env-kwargs when calling rl_zoo3.push_to_hub.
Something like:
!python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 --verbose 1 --repo-name dqn-SpaceInvadersNoFrameskip-v4 -orga {your_huggingface_nick} -f logs/ --env-kwargs 'render_mode:"rgb_array"'
Hope it helps.
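The quoting in that command matters. My understanding (an assumption, not checked against the current rl_zoo3 source) is that each --env-kwargs entry is split on the first colon into key and value, and the value is evaluated as a Python expression, so the inner double quotes are what make rgb_array a string. The helper below, parse_env_kwargs, is a hypothetical, simplified re-implementation for illustration only, and it uses ast.literal_eval in place of a plain eval.

```python
import ast


def parse_env_kwargs(items):
    """Hypothetical helper: turn ['key:value', ...] CLI items into a dict.

    The part after the first colon is parsed as a Python literal, so
    'render_mode:"rgb_array"' yields the string 'rgb_array'.
    """
    kwargs = {}
    for item in items:
        key, _, value = item.partition(":")
        # literal_eval accepts '"rgb_array"' but rejects a bare rgb_array name
        kwargs[key] = ast.literal_eval(value)
    return kwargs


print(parse_env_kwargs(['render_mode:"rgb_array"']))
```

In this sketch, render_mode:rgb_array without the inner double quotes is a bare name rather than a string literal, so literal_eval raises ValueError. The outer single quotes in the shell command are what keep the double quotes from being stripped by the shell.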
With that flag, the script runs and creates a "hub" folder, which can be saved to a drive. However, push_to_hub currently hangs during the final git push (I had to interrupt it, hence the KeyboardInterrupt at the end of the log) and never finishes uploading the files to Hugging Face, so the model card stays empty.
To work around this issue, manually upload the files from the "hub" folder to Hugging Face. Here is an example output when running the command:
!python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 --verbose 1 --repo-name dqn-SpaceInvadersNoFrameskip-v4 -orga {your_huggingface_nick} -f logs/ --env-kwargs 'render_mode:"rgb_array"'
2024-07-01 19:40:29.029621: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-01 19:40:29.029671: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-01 19:40:29.031362: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-07-01 19:40:29.038380: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-07-01 19:40:30.133283: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Loading latest experiment, id=1
Loading logs/dqn/SpaceInvadersNoFrameskip-v4_1/SpaceInvadersNoFrameskip-v4.zip
A.L.E: Arcade Learning Environment (version 0.8.1+53f58b7)
[Powered by Stella]
Stacking 4 frames
Wrapping the env in a VecTransposeImage.
Uploading to cgwell/dqn-SpaceInvadersNoFrameskip-v4, make sure to have the rights
ℹ This function will save, evaluate, generate a video of your agent,
create a model card and push everything to the hub. It might take up to some
minutes if video generation is activated. This is a work in progress: if you
encounter a bug, please open an issue.
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_deprecation.py:131: FutureWarning: 'Repository' (from 'huggingface_hub.repository') is deprecated and will be removed from version '1.0'. Please prefer the http-based alternatives instead. Given its large adoption in legacy code, the complete removal is only planned on next major release.
For more details, please read https://huggingface.co/docs/huggingface_hub/concepts/git_vs_http.
warnings.warn(warning_message, FutureWarning)
Cloning https://huggingface.co/cgwell/dqn-SpaceInvadersNoFrameskip-v4 into local empty directory.
WARNING:huggingface_hub.repository:Cloning https://huggingface.co/cgwell/dqn-SpaceInvadersNoFrameskip-v4 into local empty directory.
Saving model to: hub/dqn-SpaceInvadersNoFrameskip-v4/dqn-SpaceInvadersNoFrameskip-v4
/usr/local/lib/python3.10/dist-packages/gymnasium/utils/passive_env_checker.py:335: UserWarning: WARN: No render fps was declared in the environment (env.metadata['render_fps'] is None or not defined), rendering may occur at inconsistent fps.
logger.warn(
Saving video to /tmp/tmp89m9ips1/-step-0-to-step-1000.mp4
Moviepy - Building video /tmp/tmp89m9ips1/-step-0-to-step-1000.mp4.
Moviepy - Writing video /tmp/tmp89m9ips1/-step-0-to-step-1000.mp4
Moviepy - Done !
Moviepy - video ready /tmp/tmp89m9ips1/-step-0-to-step-1000.mp4
ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/tmp89m9ips1/-step-0-to-step-1000.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.29.100
Duration: 00:00:33.40, start: 0.000000, bitrate: 49 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 160x210, 46 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[libx264 @ 0x5a4a004fa640] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x5a4a004fa640] profile High, level 1.2, 4:2:0, 8-bit
[libx264 @ 0x5a4a004fa640] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'hub/dqn-SpaceInvadersNoFrameskip-v4/replay.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.76.100
Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 160x210, q=2-31, 30 fps, 15360 tbn (default)
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
encoder : Lavc58.134.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame= 1002 fps=0.0 q=-1.0 Lsize= 193kB time=00:00:33.30 bitrate= 47.4kbits/s speed=42.7x
video:182kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 5.924415%
[libx264 @ 0x5a4a004fa640] frame I:5 Avg QP:16.02 size: 2571
[libx264 @ 0x5a4a004fa640] frame P:521 Avg QP:23.15 size: 283
[libx264 @ 0x5a4a004fa640] frame B:476 Avg QP:29.96 size: 53
[libx264 @ 0x5a4a004fa640] consecutive B-frames: 30.2% 16.2% 9.3% 44.3%
[libx264 @ 0x5a4a004fa640] mb I I16..4: 23.6% 40.3% 36.1%
[libx264 @ 0x5a4a004fa640] mb P I16..4: 0.5% 1.0% 0.7% P16..4: 6.0% 2.5% 1.4% 0.0% 0.0% skip:88.0%
[libx264 @ 0x5a4a004fa640] mb B I16..4: 0.2% 0.1% 0.1% B16..8: 8.9% 1.0% 0.1% direct: 0.1% skip:89.6% L0:51.3% L1:48.2% BI: 0.5%
[libx264 @ 0x5a4a004fa640] 8x8 transform intra:42.9% inter:6.2%
[libx264 @ 0x5a4a004fa640] coded y,uvDC,uvAC intra: 20.1% 35.7% 32.0% inter: 1.3% 1.6% 1.3%
[libx264 @ 0x5a4a004fa640] i16 v,h,dc,p: 43% 52% 5% 0%
[libx264 @ 0x5a4a004fa640] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 24% 8% 67% 1% 0% 0% 0% 0% 0%
[libx264 @ 0x5a4a004fa640] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 33% 13% 43% 2% 1% 3% 2% 2% 1%
[libx264 @ 0x5a4a004fa640] i8c dc,h,v,p: 57% 29% 13% 1%
[libx264 @ 0x5a4a004fa640] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x5a4a004fa640] ref P L0: 75.9% 5.2% 10.8% 8.1%
[libx264 @ 0x5a4a004fa640] ref B L0: 82.1% 15.7% 2.2%
[libx264 @ 0x5a4a004fa640] ref B L1: 96.3% 3.7%
[libx264 @ 0x5a4a004fa640] kb/s:44.45
ℹ Pushing repo dqn-SpaceInvadersNoFrameskip-v4 to the Hugging Face
Hub
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/repository.py", line 418, in _lfs_log_progress
yield
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/repository.py", line 1109, in git_push
stdout, stderr = process.communicate()
File "/usr/lib/python3.10/subprocess.py", line 1154, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/usr/lib/python3.10/subprocess.py", line 2021, in _communicate
ready = selector.select(timeout)
File "/usr/lib/python3.10/selectors.py", line 416, in select
fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/rl_zoo3/push_to_hub.py", line 398, in <module>
package_to_hub(
File "/usr/local/lib/python3.10/dist-packages/rl_zoo3/push_to_hub.py", line 272, in package_to_hub
repo.push_to_hub(commit_message=commit_message)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/repository.py", line 1325, in push_to_hub
return self.git_push(
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/repository.py", line 1099, in git_push
with _lfs_log_progress():
File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/repository.py", line 421, in _lfs_log_progress
x.join()
File "/usr/lib/python3.10/threading.py", line 1096, in join
self._wait_for_tstate_lock()
File "/usr/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
if lock.acquire(block, timeout):
KeyboardInterrupt
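The manual upload can also be scripted. This is a minimal sketch, assuming huggingface_hub is installed and you are already logged in (for example with notebook_login() on Colab, or by passing a token). It uses HfApi.upload_folder, one of the HTTP-based alternatives the FutureWarning above points to, instead of the git-based Repository push that hangs. The helper names list_upload_files and upload_hub_folder are hypothetical, and the folder path comes from the "Output #0, mp4, to 'hub/dqn-SpaceInvadersNoFrameskip-v4/replay.mp4'" line of the log.

```python
from pathlib import Path
from typing import List, Optional


def list_upload_files(local_dir: str) -> List[str]:
    """Hypothetical helper: files under local_dir that would be uploaded, minus .git/."""
    root = Path(local_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and ".git" not in p.relative_to(root).parts
    )


def upload_hub_folder(local_dir: str, repo_id: str, token: Optional[str] = None) -> None:
    """Hypothetical helper: upload the generated folder over HTTP instead of git push."""
    # Imported lazily so list_upload_files works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    # The repo normally exists already (the script cloned it); exist_ok avoids an error.
    api.create_repo(repo_id=repo_id, repo_type="model", exist_ok=True)
    api.upload_folder(
        folder_path=local_dir,
        repo_id=repo_id,
        repo_type="model",
        commit_message="Upload DQN SpaceInvaders agent",
        # Skip the local git metadata left behind by the clone.
        ignore_patterns=[".git", ".git/*"],
    )
```

For example: upload_hub_folder("hub/dqn-SpaceInvadersNoFrameskip-v4", "{your_huggingface_nick}/dqn-SpaceInvadersNoFrameskip-v4"). Calling list_upload_files on the same folder first is a cheap way to check that replay.mp4 and the other generated files are there before uploading.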
Hi, which version of RL Baselines Zoo did you install? I'm checking, but it should work with the main version.
I'm investigating it today and will keep you updated 🤔
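To answer the version question quickly, the installed versions of the packages that appear in the tracebacks can be printed with the standard library. A small sketch using importlib.metadata (Python 3.8+); package_versions is a hypothetical helper name, and the package list is simply the libraries visible in the traceback paths above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, ver in package_versions(
    ["rl_zoo3", "stable_baselines3", "huggingface_sb3", "huggingface_hub"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```

Pasting that output into the issue makes it easy to spot a pre-release pin like the one mentioned at the end of this thread.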
Hi, it was indeed a problem with the RL Zoo version. I just updated the notebook; you should install it from the main branch instead:
!pip install git+https://github.com/DLR-RM/rl-baselines3-zoo
Thanks again for pointing this out 🤗 I updated it.
Thanks @simoninithomas! In fact, I was using version 2.0.0a9, but with the main branch it works now.