# rs.cpp

A simple C++ implementation of random search for locomotion tasks using MuJoCo. Should work with Ubuntu and macOS.
Operating-system-specific dependencies:

macOS: install Xcode, then install ninja:

```sh
brew install ninja
```

Ubuntu:

```sh
sudo apt-get update && sudo apt-get install cmake libgl1-mesa-dev libxinerama-dev libxcursor-dev libxrandr-dev libxi-dev ninja-build clang-12
```
- Clone the repository:

  ```sh
  git clone https://github.com/thowell/rs.cpp
  ```

- Change directory:

  ```sh
  cd rs.cpp
  ```

- Create and change to the build directory:

  ```sh
  mkdir build
  cd build
  ```

- Configure (macOS):

  ```sh
  cmake .. -DCMAKE_BUILD_TYPE:STRING=Release -G Ninja
  ```

  Configure (Ubuntu):

  ```sh
  cmake .. -DCMAKE_BUILD_TYPE:STRING=Release -G Ninja -DCMAKE_C_COMPILER:STRING=clang-12 -DCMAKE_CXX_COMPILER:STRING=clang++-12
  ```

- Build:

  ```sh
  cmake --build . --config=Release
  ```
VSCode and two of its extensions (CMake Tools and C/C++) can simplify the build process:

- Open the cloned `rs.cpp` directory.
- Configure the project with CMake (a pop-up should appear in VSCode).
- Set the compiler to `clang-12`.
- Build and run the `rs` target in "release" mode (VSCode defaults to "debug").
Train cheetah in ~10 seconds using 20 threads on an Intel Core i9-14900K CPU with Ubuntu 22.04.4 LTS (~20 seconds using 10 threads on an Apple M1 Pro).

From the `build/` directory, run:

```sh
./rs --env cheetah --search --visualize --checkpoint cheetah
```

The saved policy can be visualized:

```sh
./rs --env cheetah --load cheetah --visualize
```
Environments available:

- Ant
  - based on ant_v5
  - modified solver settings
  - only contact between feet and floor
  - no rewards or observations dependent on contact forces
- Cheetah
  - based on half_cheetah_v5
  - modified solver settings
- Humanoid
  - based on humanoid_v5
  - modified solver settings
  - only contact between feet and floor
  - no rewards or observations dependent on contact forces
- Walker
  - based on walker2d_v5
  - modified solver settings
  - only contact between feet and floor

Note: run the search multiple times to find good policies.
Run from `build/`:

Ant search:

```sh
./rs --env ant --search
```

Visualize ant policy checkpoint:

```sh
./rs --env ant --load pretrained/ant_2063_239 --visualize
```

Cheetah search:

```sh
./rs --env cheetah --search
```

Visualize cheetah policy checkpoint:

```sh
./rs --env cheetah --load pretrained/cheetah_3750_239 --visualize
```

Humanoid search:

```sh
./rs --env humanoid --search
```

Visualize humanoid policy checkpoint:

```sh
./rs --env humanoid --load pretrained/humanoid_2291_1931 --visualize
```

Walker search:

```sh
./rs --env walker --search
```

Visualize walker policy checkpoint:

```sh
./rs --env walker --load pretrained/walker_5619_41 --visualize
```
Setup:

- `--env`: `ant`, `cheetah`, `humanoid`, `walker`
- `--search`: run random search to improve policy
- `--checkpoint`: filename in `checkpoint/` to save policy
- `--load`: provide string in `checkpoint/` directory to load policy from checkpoint
- `--visualize`: visualize policy
- `--num_threads`: number of threads / parallel workers
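The `--num_threads` option controls how many parallel workers share the rollout workload. As an illustrative sketch only (the function and parameter names below are hypothetical, not this repository's API), independent rollout evaluations can be statically partitioned across threads:

```cpp
#include <functional>
#include <thread>
#include <vector>

// Distribute nsample independent rollout evaluations across num_threads
// workers. Each worker handles a disjoint set of sample indices, so no
// synchronization is needed on the shared results vector.
std::vector<double> evaluate_parallel(const std::function<double(int)>& rollout,
                                      int nsample, int num_threads) {
  std::vector<double> rewards(nsample);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&rewards, &rollout, nsample, num_threads, t] {
      // Static partition: worker t handles samples t, t + num_threads, ...
      for (int i = t; i < nsample; i += num_threads) {
        rewards[i] = rollout(i);
      }
    });
  }
  for (auto& w : workers) w.join();
  return rewards;
}
```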
Search settings:

- `--nsample`: number of random directions to sample
- `--ntop`: number of random directions to use for policy update
- `--niter`: number of policy updates
- `--neval`: number of policy evaluations during search
- `--nhorizon_search`: number of environment steps during policy improvement
- `--nhorizon_eval`: number of environment steps during policy evaluation
- `--random_step`: step size for random direction during policy perturbation
- `--update_step`: step size for policy update during policy improvement
- `--nenveval`: number of environments for policy evaluation
- `--reward_shift`: subtract baseline from per-timestep reward
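These settings map onto the steps of one Augmented Random Search update. The following is a minimal sketch under stated assumptions, not this repository's actual code: `ars_update` and `evaluate` are hypothetical names, `evaluate` stands in for rolling the policy out for `--nhorizon_search` steps, and the reward-standard-deviation scaling follows the ARS paper's "t" variants.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <numeric>
#include <random>
#include <vector>

// One augmented-random-search update for a flattened policy parameter theta.
std::vector<double> ars_update(std::vector<double> theta,
                               const std::function<double(const std::vector<double>&)>& evaluate,
                               int nsample, int ntop,
                               double random_step, double update_step,
                               std::mt19937& rng) {
  std::normal_distribution<double> gauss(0.0, 1.0);
  const int n = static_cast<int>(theta.size());

  struct Direction { std::vector<double> delta; double r_plus, r_minus; };
  std::vector<Direction> dirs(nsample);

  // Sample --nsample random directions; evaluate theta +/- --random_step * delta.
  for (auto& d : dirs) {
    d.delta.resize(n);
    for (double& x : d.delta) x = gauss(rng);
    std::vector<double> plus = theta, minus = theta;
    for (int i = 0; i < n; ++i) {
      plus[i] += random_step * d.delta[i];
      minus[i] -= random_step * d.delta[i];
    }
    d.r_plus = evaluate(plus);
    d.r_minus = evaluate(minus);
  }

  // Keep the --ntop directions with the largest max(r_plus, r_minus).
  std::sort(dirs.begin(), dirs.end(), [](const Direction& a, const Direction& b) {
    return std::max(a.r_plus, a.r_minus) > std::max(b.r_plus, b.r_minus);
  });

  // Standard deviation of the rewards used in the update (scales the step).
  std::vector<double> rewards;
  for (int k = 0; k < ntop; ++k) {
    rewards.push_back(dirs[k].r_plus);
    rewards.push_back(dirs[k].r_minus);
  }
  double mean = std::accumulate(rewards.begin(), rewards.end(), 0.0) / rewards.size();
  double var = 0.0;
  for (double r : rewards) var += (r - mean) * (r - mean);
  double sigma = std::sqrt(var / rewards.size());
  if (sigma < 1e-8) sigma = 1.0;  // guard against division by ~zero

  // theta += --update_step / (ntop * sigma) * sum_k (r_plus - r_minus) * delta_k.
  for (int k = 0; k < ntop; ++k) {
    double scale = update_step / (ntop * sigma) * (dirs[k].r_plus - dirs[k].r_minus);
    for (int i = 0; i < n; ++i) theta[i] += scale * dirs[k].delta[i];
  }
  return theta;
}
```

Repeating this update `--niter` times, with periodic evaluations (`--neval`, `--nenveval`, `--nhorizon_eval`), is the overall search loop.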
- This implementation is not deterministic; you may need to run the search multiple times to find a good policy.
- The environments are based on the v5 MuJoCo Gym environments but may not match them in every detail.
- The search settings are based on Table 9 of *Simple random search provides a competitive approach to reinforcement learning*, but may not match it in every detail either.
This repository was developed to:

- understand the Augmented Random Search algorithm
- understand how to compute numerically stable running statistics
- understand the details of Gym environments
- experiment with code-generation tools that can improve development time, including ChatGPT and Claude
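Numerically stable running statistics are typically computed with Welford's online algorithm, which updates the mean and the sum of squared deviations in a single pass and avoids the catastrophic cancellation of the naive sum-of-squares method. A minimal sketch (illustrative, not this repository's actual implementation):

```cpp
#include <cmath>
#include <cstdint>

// Welford's online algorithm for running mean and variance.
class RunningStats {
 public:
  void update(double x) {
    ++count_;
    double delta = x - mean_;
    mean_ += delta / static_cast<double>(count_);
    m2_ += delta * (x - mean_);  // note: uses the *updated* mean
  }
  std::int64_t count() const { return count_; }
  double mean() const { return mean_; }
  double variance() const {  // population variance
    return count_ > 1 ? m2_ / static_cast<double>(count_) : 0.0;
  }
  double stddev() const { return std::sqrt(variance()); }

 private:
  std::int64_t count_ = 0;
  double mean_ = 0.0;
  double m2_ = 0.0;  // running sum of squared deviations from the mean
};
```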
MuJoCo models use resources from Gymnasium and dm_control.