A multi-threaded C++ implementation of random search for locomotion tasks using MuJoCo.

Random Search

A simple C++ implementation of random search for locomotion tasks using MuJoCo.

Installation

rs.cpp should work with Ubuntu and macOS.

Prerequisites

Operating system specific dependencies:

macOS

Install Xcode.

Install ninja:

brew install ninja

Ubuntu

sudo apt-get update && sudo apt-get install cmake libgl1-mesa-dev libxinerama-dev libxcursor-dev libxrandr-dev libxi-dev ninja-build clang-12

Clone rs.cpp

git clone https://github.com/thowell/rs.cpp

Build and Run

  1. Change directory:
cd rs.cpp
  2. Create and change to build directory:
mkdir build
cd build
  3. Configure:

macOS

cmake .. -DCMAKE_BUILD_TYPE:STRING=Release -G Ninja

Ubuntu

cmake .. -DCMAKE_BUILD_TYPE:STRING=Release -G Ninja -DCMAKE_C_COMPILER:STRING=clang-12 -DCMAKE_CXX_COMPILER:STRING=clang++-12
  4. Build:
cmake --build . --config=Release

Build and Run rs.cpp using VSCode

VSCode and two of its extensions (CMake Tools and C/C++) can simplify the build process.

  1. Open the cloned directory rs.cpp.
  2. Configure the project with CMake (a pop-up should appear in VSCode).
  3. Set compiler to clang-12.
  4. Build and run the rs target in "release" mode (VSCode defaults to "debug").

Train cheetah

Training cheetah takes ~10 seconds using 20 threads on an Intel Core i9-14900K CPU running Ubuntu 22.04.4 LTS (~20 seconds using 10 threads on an Apple M1 Pro).

From the build/ directory, run:

./rs --env cheetah --search --visualize --checkpoint cheetah

The saved policy can be visualized:

./rs --env cheetah --load cheetah --visualize

Environments

Environments available:

  • Ant
    • based on ant_v5
    • modified solver settings
    • only contact between feet and floor
    • no rewards or observations dependent on contact forces
  • Cheetah
  • Humanoid
    • based on humanoid_v5
    • modified solver settings
    • only contact between feet and floor
    • no rewards or observations dependent on contact forces
  • Walker
    • based on walker2d_v5
    • modified solver settings
    • only contact between feet and floor

Usage

Note: random search is stochastic, so run the search multiple times to find good policies.

Run from build/:

Ant

Search:

./rs --env ant --search

Visualize policy checkpoint:

./rs --env ant --load pretrained/ant_2063_239 --visualize

Cheetah

Search:

./rs --env cheetah --search

Visualize policy checkpoint:

./rs --env cheetah --load pretrained/cheetah_3750_239 --visualize

Humanoid

Search:

./rs --env humanoid --search

Visualize policy checkpoint:

./rs --env humanoid --load pretrained/humanoid_2291_1931 --visualize

Walker

Search:

./rs --env walker --search

Visualize policy checkpoint:

./rs --env walker --load pretrained/walker_5619_41 --visualize

Command-line

Setup:

  • --env: ant, cheetah, humanoid, walker
  • --search: run random search to improve policy
  • --checkpoint: filename in checkpoint/ to save the policy to
  • --load: filename in checkpoint/ to load the policy from
  • --visualize: visualize policy
  • --num_threads: number of threads / parallel workers
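
To illustrate what --num_threads controls, here is a minimal sketch of spreading independent rollouts across worker threads with std::thread. The function name ParallelRollouts and the rollout callback are hypothetical and chosen for illustration; rs.cpp's actual threading code may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper (not taken from rs.cpp): run `num_tasks` independent
// rollouts on `num_threads` workers. `rollout(i)` is assumed to return the
// total reward of task i and to be safe to call concurrently.
std::vector<double> ParallelRollouts(
    std::size_t num_tasks, std::size_t num_threads,
    const std::function<double(std::size_t)>& rollout) {
  assert(num_threads > 0);
  std::vector<double> rewards(num_tasks, 0.0);
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (std::size_t t = 0; t < num_threads; ++t) {
    // Worker t handles tasks t, t + num_threads, t + 2 * num_threads, ...
    // so no two workers ever write to the same slot of `rewards`.
    workers.emplace_back([&, t]() {
      for (std::size_t i = t; i < num_tasks; i += num_threads) {
        rewards[i] = rollout(i);
      }
    });
  }
  // Wait for every worker before reading the results.
  for (std::thread& worker : workers) {
    worker.join();
  }
  return rewards;
}
```

With MuJoCo, each worker would typically own its own mjData while sharing a read-only mjModel, so that simulation state is never shared between threads.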

Search settings:

  • --nsample: number of random directions to sample
  • --ntop: number of random directions to use for policy update
  • --niter: number of policy updates
  • --neval: number of policy evaluations during search
  • --nhorizon_search: number of environment steps during policy improvement
  • --nhorizon_eval: number of environment steps during policy evaluation
  • --random_step: step size for random direction during policy perturbation
  • --update_step: step size for policy update during policy improvement
  • --nenveval: number of environments for policy evaluation
  • --reward_shift: subtract baseline from per-timestep reward
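
The search settings above map onto one iteration of Augmented Random Search (the algorithm named in the Notes). The following is a minimal sketch, assuming the commonly described update: keep the --ntop best of the --nsample directions and scale the step by the standard deviation of their returns. The Direction struct and UpdatePolicy function are hypothetical names for illustration and are not rs.cpp's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for one sampled direction and its two rollout returns.
struct Direction {
  std::vector<double> delta;  // random perturbation of the policy parameters
  double reward_plus;         // return at theta + random_step * delta
  double reward_minus;        // return at theta - random_step * delta
};

// One ARS-style policy update (a sketch, not the repository's implementation).
// `directions` holds the nsample evaluated directions; it is taken by value
// because it is reordered internally.
void UpdatePolicy(std::vector<double>& theta, std::vector<Direction> directions,
                  std::size_t ntop, double update_step) {
  assert(ntop > 0 && ntop <= directions.size());

  // Rank directions by the better of their two returns; keep the top ntop.
  std::partial_sort(
      directions.begin(),
      directions.begin() + static_cast<std::ptrdiff_t>(ntop), directions.end(),
      [](const Direction& a, const Direction& b) {
        return std::max(a.reward_plus, a.reward_minus) >
               std::max(b.reward_plus, b.reward_minus);
      });

  // Standard deviation of the 2 * ntop returns that enter the update.
  const double n = 2.0 * static_cast<double>(ntop);
  double mean = 0.0;
  for (std::size_t k = 0; k < ntop; ++k) {
    mean += directions[k].reward_plus + directions[k].reward_minus;
  }
  mean /= n;
  double variance = 0.0;
  for (std::size_t k = 0; k < ntop; ++k) {
    const double dp = directions[k].reward_plus - mean;
    const double dm = directions[k].reward_minus - mean;
    variance += dp * dp + dm * dm;
  }
  double sigma = std::sqrt(variance / n);
  // Assumption: skip the scaling when all returns are (nearly) identical.
  if (sigma < 1.0e-8) sigma = 1.0;

  // theta += update_step / (ntop * sigma) * sum_k (r_plus - r_minus) * delta_k
  const double scale = update_step / (static_cast<double>(ntop) * sigma);
  for (std::size_t k = 0; k < ntop; ++k) {
    const Direction& d = directions[k];
    assert(d.delta.size() == theta.size());
    const double diff = d.reward_plus - d.reward_minus;
    for (std::size_t j = 0; j < theta.size(); ++j) {
      theta[j] += scale * diff * d.delta[j];
    }
  }
}
```

In this sketch, --niter would correspond to how many times UpdatePolicy is called, and --random_step to the perturbation size used when collecting reward_plus and reward_minus.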

Notes

This repository was developed to:

  • understand the Augmented Random Search algorithm
  • understand how to compute numerically stable running statistics
  • understand the details of Gym environments
  • experiment with code generation tools, including ChatGPT and Claude, that can reduce development time
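
As an illustration of the running-statistics point above, here is a minimal sketch using Welford's online algorithm, one well-known numerically stable way to track a mean and variance (for example, for observation normalization). The RunningStatistics class is a hypothetical name; whether rs.cpp uses exactly this formulation is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from rs.cpp): per-dimension running mean and
// variance using Welford's online update, which avoids the cancellation errors
// of accumulating sum(x) and sum(x * x) separately.
class RunningStatistics {
 public:
  explicit RunningStatistics(std::size_t dim)
      : count_(0), mean_(dim, 0.0), m2_(dim, 0.0) {}

  // Fold one observation into the statistics.
  void Update(const std::vector<double>& x) {
    assert(x.size() == mean_.size());
    ++count_;
    for (std::size_t i = 0; i < x.size(); ++i) {
      const double delta = x[i] - mean_[i];
      mean_[i] += delta / static_cast<double>(count_);
      // Uses the already-updated mean: m2 accumulates squared deviations.
      m2_[i] += delta * (x[i] - mean_[i]);
    }
  }

  double Mean(std::size_t i) const { return mean_[i]; }

  // Sample variance (divides by count - 1); 0 until two samples are seen.
  double Variance(std::size_t i) const {
    return count_ > 1 ? m2_[i] / static_cast<double>(count_ - 1) : 0.0;
  }

  std::size_t Count() const { return count_; }

 private:
  std::size_t count_;
  std::vector<double> mean_;
  std::vector<double> m2_;
};
```

An observation could then be normalized as (x[i] - Mean(i)) / sqrt(Variance(i) + epsilon), where epsilon is a small constant chosen by the user.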

MuJoCo models use resources from Gymnasium and dm_control.