Basic Reinforcement Learning Algorithms

From-scratch implementations of DQN, DDPG, AC, PPO, SAC, and TD3. All of them run and train.

Package versions (output of `conda list`):

Name                      Version                   Build  Channel
absl-py                   2.0.0                    pypi_0    pypi
ale-py                    0.8.1                    pypi_0    pypi
anyio                     4.0.0                    pypi_0    pypi
astor                     0.8.1                    pypi_0    pypi
asttokens                 2.4.1              pyhd8ed1ab_0    conda-forge
astunparse                1.6.3                    pypi_0    pypi
atari-py                  0.2.6                    pypi_0    pypi
attrs                     23.1.0                   pypi_0    pypi
autorom                   0.4.2                    pypi_0    pypi
autorom-accept-rom-license 0.6.1                    pypi_0    pypi
backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
bleach                    6.1.0              pyhd8ed1ab_0    conda-forge
blinker                   1.6.3                    pypi_0    pypi
box2d-py                  2.3.5                    pypi_0    pypi
ca-certificates           2024.2.2             h56e8100_0    conda-forge
cached-property           1.5.2                    pypi_0    pypi
cachetools                4.2.4                    pypi_0    pypi
cattrs                    1.5.0                    pypi_0    pypi
certifi                   2023.7.22                pypi_0    pypi
charset-normalizer        3.3.0                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
cloudpickle               1.6.0                    pypi_0    pypi
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
comm                      0.2.2              pyhd8ed1ab_0    conda-forge
contourpy                 1.1.1                    pypi_0    pypi
cycler                    0.12.0                   pypi_0    pypi
d2l                       1.0.3                    pypi_0    pypi
debugpy                   1.6.7            py38hd77b12b_0    defaults
decorator                 5.1.1              pyhd8ed1ab_0    conda-forge
defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
dill                      0.3.8              pyhd8ed1ab_0    conda-forge
entrypoints               0.4                pyhd8ed1ab_0    conda-forge
etils                     1.3.0                    pypi_0    pypi
exceptiongroup            1.1.3                    pypi_0    pypi
executing                 2.0.1              pyhd8ed1ab_0    conda-forge
filelock                  3.13.1                   pypi_0    pypi
flask                     3.0.0                    pypi_0    pypi
flask-cors                4.0.0                    pypi_0    pypi
flatbuffers               23.5.26                  pypi_0    pypi
fonttools                 4.43.0                   pypi_0    pypi
funcsigs                  1.0.2                    pypi_0    pypi
future                    0.18.3                   pypi_0    pypi
gast                      0.3.3                    pypi_0    pypi
glfw                      2.7.0                    pypi_0    pypi
google-auth               2.23.4                   pypi_0    pypi
google-auth-oauthlib      1.0.0                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.59.3                   pypi_0    pypi
gym                       0.19.0                   pypi_0    pypi
gym-notices               0.0.8                    pypi_0    pypi
h11                       0.14.0                   pypi_0    pypi
h5py                      3.9.0                    pypi_0    pypi
httpcore                  0.18.0                   pypi_0    pypi
httpx                     0.25.0                   pypi_0    pypi
idna                      3.4                      pypi_0    pypi
imageio                   2.32.0                   pypi_0    pypi
importlib-metadata        6.8.0                    pypi_0    pypi
importlib-resources       6.1.0                    pypi_0    pypi
importlib_resources       6.4.0              pyhd8ed1ab_0    conda-forge
intel-openmp              2023.2.0         h57928b3_50496    conda-forge
ipykernel                 6.21.1                   pypi_0    pypi
ipython                   8.12.0             pyh08f2357_0    conda-forge
ipython_genutils          0.2.0              pyhd8ed1ab_1    conda-forge
ipywidgets                8.1.2                    pypi_0    pypi
itsdangerous              2.1.2                    pypi_0    pypi
jedi                      0.19.1             pyhd8ed1ab_0    conda-forge
jinja2                    3.1.2                    pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
jsonschema                4.22.0             pyhd8ed1ab_0    conda-forge
jsonschema-specifications 2023.12.1          pyhd8ed1ab_0    conda-forge
jupyter                   1.0.0                    pypi_0    pypi
jupyter-client            7.2.0                    pypi_0    pypi
jupyter-console           6.6.3                    pypi_0    pypi
jupyter_core              5.7.2            py38haa244fe_0    conda-forge
jupyterlab-widgets        3.0.10                   pypi_0    pypi
keras                     2.13.1                   pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
lasagne                   0.1                      pypi_0    pypi
libblas                   3.9.0                     8_mkl    conda-forge
libcblas                  3.9.0                     8_mkl    conda-forge
libclang                  16.0.6                   pypi_0    pypi
libffi                    3.4.4                hd77b12b_0    defaults
libgpuarray               0.7.6             h8ffe710_1003    conda-forge
liblapack                 3.9.0                     8_mkl    conda-forge
libsodium                 1.0.18               h8d14728_1    conda-forge
m2w64-gcc-libgfortran     5.3.0                         6    conda-forge
m2w64-gcc-libs            5.3.0                         7    conda-forge
m2w64-gcc-libs-core       5.3.0                         7    conda-forge
m2w64-gmp                 6.1.0                         2    conda-forge
m2w64-libwinpthread-git   5.0.0.4634.697f757               2    conda-forge
mako                      1.3.2              pyhd8ed1ab_0    conda-forge
markdown                  3.5                      pypi_0    pypi
markupsafe                2.1.3                    pypi_0    pypi
matplotlib                3.7.2                    pypi_0    pypi
matplotlib-inline         0.1.6                    pypi_0    pypi
mistune                   0.8.4                    pypi_0    pypi
mkl                       2020.4             hb70f87d_311    conda-forge
mlagents                  0.30.0                   pypi_0    pypi
mlagents-envs             0.28.0                   pypi_0    pypi
msys2-conda-epoch         20160418                      1    conda-forge
mujoco                    3.1.3                    pypi_0    pypi
nb_conda                  2.2.1                     win_7    conda-forge
nb_conda_kernels          2.3.1            py38haa244fe_2    conda-forge
nbconvert                 5.5.0                      py_0    defaults
nbformat                  5.10.4             pyhd8ed1ab_0    conda-forge
nest-asyncio              1.6.0              pyhd8ed1ab_0    conda-forge
nltk                      3.8.1                    pypi_0    pypi
notebook                  5.7.11           py38haa244fe_0    conda-forge
numpy                     1.23.5                   pypi_0    pypi
oauthlib                  3.2.2                    pypi_0    pypi
objgraph                  3.6.0                    pypi_0    pypi
opencv-python             4.2.0.32                 pypi_0    pypi
openssl                   3.0.13               h2bbff1b_2    defaults
opt-einsum                3.3.0                    pypi_0    pypi
packaging                 23.2                     pypi_0    pypi
paddle-bfloat             0.1.7                    pypi_0    pypi
paddlepaddle              2.5.1                    pypi_0    pypi
pandas                    2.0.3                    pypi_0    pypi
pandoc                    3.2                  h57928b3_0    conda-forge
pandocfilters             1.5.0              pyhd8ed1ab_0    conda-forge
parl                      2.2.1                    pypi_0    pypi
parso                     0.8.4              pyhd8ed1ab_0    conda-forge
path                      16.10.0                  pypi_0    pypi
pathlib                   1.0.1                    pypi_0    pypi
pettingzoo                1.15.0                   pypi_0    pypi
pfrl                      0.4.0                    pypi_0    pypi
pickleshare               0.7.5                   py_1003    conda-forge
pillow                    10.0.1                   pypi_0    pypi
pip                       24.0                     pypi_0    pypi
pkgutil-resolve-name      1.3.10             pyhd8ed1ab_1    conda-forge
platformdirs              4.2.2              pyhd8ed1ab_0    conda-forge
prettytable               3.9.0                    pypi_0    pypi
prometheus_client         0.20.0             pyhd8ed1ab_0    conda-forge
prompt-toolkit            3.0.42             pyha770c72_0    conda-forge
prompt_toolkit            3.0.42               hd8ed1ab_0    conda-forge
protobuf                  3.20.0                   pypi_0    pypi
psutil                    5.9.6                    pypi_0    pypi
pure_eval                 0.2.2              pyhd8ed1ab_0    conda-forge
pyasn1                    0.5.0                    pypi_0    pypi
pyasn1-modules            0.3.0                    pypi_0    pypi
pyfunctional              1.4.3              pyhd8ed1ab_0    conda-forge
pygame                    2.1.0                    pypi_0    pypi
pyglet                    1.5.0                    pypi_0    pypi
pygments                  2.18.0             pyhd8ed1ab_0    conda-forge
pygpu                     0.7.6           py38h6f4d8f0_1003    conda-forge
pynput                    1.7.6                    pypi_0    pypi
pynvml                    11.5.0                   pypi_0    pypi
pyopengl                  3.1.7                    pypi_0    pypi
pyparsing                 3.0.9                    pypi_0    pypi
pypiwin32                 223                      pypi_0    pypi
python                    3.8.18               h1aa4202_0    defaults
python-dateutil           2.8.2                    pypi_0    pypi
python-fastjsonschema     2.19.1             pyhd8ed1ab_0    conda-forge
python-graphviz           0.20.1                   pypi_0    pypi
python_abi                3.8                      2_cp38    conda-forge
pytz                      2023.3.post1             pypi_0    pypi
pywin32                   306                      pypi_0    pypi
pywinpty                  2.0.10           py38h5da7b33_0    defaults
pyyaml                    6.0.1                    pypi_0    pypi
pyzmq                     18.1.1                   pypi_0    pypi
qtconsole                 5.5.2                    pypi_0    pypi
qtpy                      2.4.1                    pypi_0    pypi
rarfile                   4.1                      pypi_0    pypi
referencing               0.35.1             pyhd8ed1ab_0    conda-forge
regex                     2023.10.3                pypi_0    pypi
requests                  2.31.0                   pypi_0    pypi
requests-oauthlib         1.3.1                    pypi_0    pypi
rpds-py                   0.10.6           py38h062c2fa_0    defaults
rsa                       4.9                      pypi_0    pypi
scipy                     1.10.1                   pypi_0    pypi
seaborn                   0.13.0                   pypi_0    pypi
send2trash                1.8.3              pyh5737063_0    conda-forge
setuptools                57.5.0                   pypi_0    pypi
six                       1.16.0             pyh6c4a22f_0    conda-forge
sniffio                   1.3.0                    pypi_0    pypi
sqlite                    3.41.2               h2bbff1b_0    defaults
stack_data                0.6.2              pyhd8ed1ab_0    conda-forge
swig                      4.1.1.post0              pypi_0    pypi
tabulate                  0.9.0              pyhd8ed1ab_1    conda-forge
tensorboard               2.14.0                   pypi_0    pypi
tensorboard-data-server   0.7.2                    pypi_0    pypi
tensorboard-plugin-wit    1.8.1                    pypi_0    pypi
tensorboardx              2.5                      pypi_0    pypi
termcolor                 2.3.0                    pypi_0    pypi
terminado                 0.18.1             pyh5737063_0    conda-forge
testpath                  0.6.0              pyhd8ed1ab_0    conda-forge
theano                    1.0.5            py38h885f38d_3    conda-forge
torch                     1.13.1+cu117             pypi_0    pypi
torchaudio                0.13.1+cu117             pypi_0    pypi
torchvision               0.14.1+cu117             pypi_0    pypi
tornado                   6.2              py38h294d835_0    conda-forge
tqdm                      4.66.1                   pypi_0    pypi
traitlets                 5.7.1            py38haa95532_0    defaults
typing-extensions         4.5.0                    pypi_0    pypi
typing_extensions         4.11.0             pyha770c72_0    conda-forge
tzdata                    2023.3                   pypi_0    pypi
urllib3                   2.0.6                    pypi_0    pypi
vc                        14.2                 h21ff451_1    defaults
vs2015_runtime            14.27.29016          h5e58377_2    defaults
vs2017_win-64             19.16.27033         hddac466_18    conda-forge
vswhere                   3.1.4                h57928b3_0    conda-forge
wcwidth                   0.2.8                    pypi_0    pypi
webencodings              0.5.1              pyhd8ed1ab_2    conda-forge
werkzeug                  3.0.0                    pypi_0    pypi
wheel                     0.41.2           py38haa95532_0    defaults
widgetsnbextension        4.0.10                   pypi_0    pypi
winpty                    0.4.3                         4    conda-forge
wrapt                     1.15.0                   pypi_0    pypi
zeromq                    4.3.5                hd77b12b_0    defaults
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge

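If the code does not run, check your gym and torch versions first; a mismatch with the versions listed above (gym 0.19.0, torch 1.13.1+cu117) is the most likely cause.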
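
A quick way to compare your environment against the list is sketched below. It is not part of this repository: the helper names `installed_version` and `check_versions` are made up for illustration, and the expected versions are simply copied from the gym and torch rows of the package list. It uses only the standard-library `importlib.metadata`, which is available on the Python 3.8 listed above.

```python
from importlib import metadata

# Versions copied from the package list in this README.
EXPECTED = {"gym": "0.19.0", "torch": "1.13.1+cu117"}


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected=EXPECTED, lookup=installed_version):
    """Return {package: (expected, found)} for every package that differs."""
    mismatches = {}
    for name, want in expected.items():
        found = lookup(name)
        if found != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    # Print one line per package whose version differs from the README.
    for name, (want, found) in check_versions().items():
        print(f"{name}: expected {want}, found {found or 'not installed'}")
```

An empty result means both packages match the list. A different torch build (for example a CPU-only wheel instead of `+cu117`) is reported as a mismatch too, even though it may still work; the gym version tends to matter more, because later gym releases changed the return values of `reset()` and `step()`.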