A repo for running DeepRacer locally. The rl_coach code comes from https://github.com/awslabs/amazon-sagemaker-examples/tree/master/reinforcement_learning/rl_deepracer_robomaker_coach_gazebo
The bundle underlying the DeepRacer console can change without warning, so this repo won't always be up to date with the console.
If you can't get this working, please open an issue. It helps me see problems I might need to fix, and it helps everyone else find fixes for issues they might be having. There is a FAQ in the wiki for common issues.
For additional help with OSX setup, please refer to the supplemental guide provided by joezen777 in issue #11.
I have been able to improve this process so it's easy for everyone to use. What you will need to run this is:
- Docker
- Python3
- Minio the S3 emulator
- Preferably a Linux host, as Docker works a lot better there
- A copy of this repo
You may not need to do all these steps as they pertain to general setup of the host.
- Ensure you have root access to docker through the docker group. See the post-installation steps for Docker.
- Please open an issue if you have problems cloning the repo. Make sure to use
git clone --recurse-submodules https://github.com/crr0004/deepracer.git
to get all the submodules. You will get an error about benchmarks missing in sagemaker-tensorflow-container; you can safely ignore it.
- You may get firewall issues with the docker containers trying to access the minio running outside the sagemaker-local network. You will see "no route to host" errors from the containers. To fix this, either disable your firewall or allow the docker adapters as trusted adapters.
- More notes to come. If you want anything added here, please open an issue.
- Minio
- Robomaker
- Sagemaker
Download the binary from Minio and put it somewhere you're okay with having large files.
Then run source rl_coach/env.sh
to get some reasonable defaults for your environment. Then run ./minio server data
to start the server, which creates a folder named data.
You will need to create a bucket named bucket
through the web GUI that minio provides. Just open http://127.0.0.1:9000 in your browser.
Then copy the folder custom_files into your new bucket as that's where the defaults expect them to be.
You should source that env.sh
for every terminal you open when interacting with the deepracer instances because it helps keep everything consistent.
I suggest you cat rl_coach/env.sh
to see what is being set.
I'd suggest you make a Python virtual environment for this, as it installs a fair number of packages, some at older versions.
To create one, run python3 -m venv sagemaker_venv
which puts the virtual environment in the directory sagemaker_venv. To activate the venv, run source sagemaker_venv/bin/activate
on Linux.
To install sagemaker run pip install -U sagemaker-python-sdk/ awscli pandas
.
Now you need to get the docker images that sagemaker is expecting. Run docker pull crr0004/sagemaker-rl-tensorflow:console
. I have fixed the python script so it uses this image directly now, no more tagging needed.
You will need to copy the config.yaml
file to ~/.sagemaker
to configure where the temp directories for the sagemaker docker containers are put. I suggest you edit it to point where you want. The path is relative to where you run
rl_deepracer_coach_robomaker.py
from, so make sure that folder exists, or change the contents of ~/.sagemaker/config.yaml
to something that does exist. I have it set to a folder a couple of directories up.
E.g. mkdir -p ~/.sagemaker && cp config.yaml ~/.sagemaker
.
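If you would rather do the copy from Python (for example inside a setup script), here is a minimal sketch of the same mkdir and cp steps. The function name install_sagemaker_config is made up for illustration and is not part of this repo.

```python
import shutil
from pathlib import Path


def install_sagemaker_config(source: Path, home: Path) -> Path:
    """Copy config.yaml into <home>/.sagemaker, creating the folder if needed."""
    target_dir = home / ".sagemaker"
    # Same effect as `mkdir -p ~/.sagemaker`
    target_dir.mkdir(parents=True, exist_ok=True)
    # Same effect as `cp config.yaml ~/.sagemaker`
    return Path(shutil.copy(source, target_dir / "config.yaml"))


# Example call, assuming it is run from the folder that holds config.yaml:
# install_sagemaker_config(Path("config.yaml"), Path.home())
```

The home directory is a parameter so you can try it against a scratch folder before touching your real ~/.sagemaker.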
To set some extra environment variables in the Sagemaker SDK, until I figure out a
better way, set the environment variable LOCAL_ENV_VAR_JSON_PATH
to the path of an
env_vars.json
file. E.g. export LOCAL_ENV_VAR_JSON_PATH=$(readlink -f ./env_vars.json).
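On hosts where readlink -f is not available, a rough Python equivalent is sketched below. It resolves the absolute path, checks that the file parses as JSON, and returns the export line to paste into your shell. The helper name export_line is hypothetical, and nothing here checks which keys the file should contain.

```python
import json
import os
import shlex


def export_line(json_path: str) -> str:
    """Return the shell export line for LOCAL_ENV_VAR_JSON_PATH.

    Raises if the file is missing or is not valid JSON.
    """
    # os.path.realpath resolves symlinks and relative parts, much like `readlink -f`
    absolute = os.path.realpath(json_path)
    with open(absolute, encoding="utf-8") as handle:
        json.load(handle)  # only checks that the file parses, not its contents
    return "export LOCAL_ENV_VAR_JSON_PATH=" + shlex.quote(absolute)


# Example call, assuming env_vars.json is in the current folder:
# print(export_line("./env_vars.json"))
```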
Now you can run (cd rl_coach; python rl_deepracer_coach_robomaker.py)
to start sagemaker.
You can change the image name in rl_deepracer_coach_robomaker.py
to match your GPU type and do the setup needed for that type; see the section for each type.
You can change the image name in rl_deepracer_coach_robomaker.py
to "crr0004/sagemaker-rl-tensorflow:nvidia" to use GPU acceleration. You will also need to set up docker to use the GPU by following https://github.com/NVIDIA/nvidia-docker.
In this file, update instance_type
to local_gpu
to run sagemaker in the nvidia runtime.
You can also uncomment the line in env.sh
that is #export LOCAL_EXTRA_DOCKER_COMPOSE_PATH=$(readlink -f ./docker_compose_extra.json)
to
cause privileged to be passed to the docker compose command. You can also use
the file docker_compose_extra.json
to modify the docker compose file that is
used to launch the sagemaker container.
You can change the image name in rl_deepracer_coach_robomaker.py
to "crr0004/sagemaker-rl-tensorflow:amd".
You will need to install ROCm and then ensure the kfd device exists on your system.
If you're using an upstream kernel, there is a section in the ROCm Readme on enabling a kfd device.
Then you will need to uncomment the line in env.sh
to enable LOCAL_EXTRA_DOCKER_COMPOSE_PATH
and add
"devices": [
"/dev/kfd:/dev/kfd",
"/dev/dri:/dev/dri"
]
to docker_compose_extra.json
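If you would rather not edit the JSON by hand, the snippet below is a small sketch that merges the two device mappings into docker_compose_extra.json while leaving any other keys alone. It assumes the file holds a single JSON object at the top level, which I have not verified against every version of the file. The function name add_amd_devices is made up for this example.

```python
import json
from pathlib import Path

# Device mappings taken from the AMD instructions above
AMD_DEVICES = ["/dev/kfd:/dev/kfd", "/dev/dri:/dev/dri"]


def add_amd_devices(compose_extra: Path) -> dict:
    """Add the ROCm device mappings to the given JSON file, keeping other keys."""
    # Assumption: the file holds one JSON object; start empty if it does not exist
    settings = json.loads(compose_extra.read_text()) if compose_extra.exists() else {}
    devices = settings.setdefault("devices", [])
    for device in AMD_DEVICES:
        if device not in devices:  # avoid duplicates when run more than once
            devices.append(device)
    compose_extra.write_text(json.dumps(settings, indent=4) + "\n")
    return settings


# Example call, assuming it is run from the folder holding the file:
# add_amd_devices(Path("docker_compose_extra.json"))
```

Running it twice is safe, since existing entries are not added again.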
First, to get the images I have built, run docker pull crr0004/deepracer_robomaker:console
. There is no need to alter the tag unless you want to. This image is built from docker/Robomaker-kinetic-debug.docker
, and the crr0004/deepracer_robomaker:1.0b
image is built from docker/Robomaker-kinetic.docker
, but you shouldn't need to use those docker files unless you want to build from scratch or run without docker.
You will need to alter the robomaker.env
file to set WORLD_NAME
to the track you want, along with anything else you want to change.
You can run the docker image with docker run --rm --name dr --env-file ./robomaker.env --network sagemaker-local -p 8080:5900 -it crr0004/deepracer_robomaker:console
For the more advanced startup that I generally use to see everything, you can
use docker run --rm --name dr --env-file ./robomaker.env --network sagemaker-local -p 8080:5900 -v $(pwd)/aws-robomaker-sample-application-deepracer/simulation_ws/src:/app/robomaker-deepracer/simulation_ws/src -v $(readlink -f ../robo/checkpoint):/root/.ros/ -it crr0004/deepracer_robomaker "./run.sh build distributed_training.launch"
.
This
command mounts the container directories to local directories so you can see all the
files. You can replace the "./run.sh"
part with bash
and you will get a
shell in the container.
You can run vncviewer localhost:8080
to get a VNC view of the running container.
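If you start the container from a script, the basic docker run command from this section can be assembled as an argument list. This is only a sketch: the function name robomaker_command and its parameters are made up for illustration, and the defaults simply mirror the command shown earlier.

```python
import shlex


def robomaker_command(image="crr0004/deepracer_robomaker:console",
                      env_file="./robomaker.env", vnc_port=8080):
    """Build the argument list for the basic docker run command."""
    return [
        "docker", "run", "--rm", "--name", "dr",
        "--env-file", env_file,
        "--network", "sagemaker-local",
        # Host port vnc_port forwards to port 5900 inside the container
        "-p", f"{vnc_port}:5900",
        "-it", image,
    ]


# Print the command for inspection; pass the list to subprocess.run to launch it
print(shlex.join(robomaker_command()))
```

Changing vnc_port only changes the host side of the mapping, so vncviewer would then connect to localhost on that port instead of 8080.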
You now specify your action space in the JSON file you pass in through
MODEL_METADATA_FILE_S3_KEY
, which defaults to
bucket/custom_files/model_metadata.json
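As a rough illustration, an action space can be generated rather than typed by hand. The sketch below assumes the file uses an action_space list whose entries have steering_angle, speed and index keys; check those names against the model_metadata.json shipped in custom_files before relying on them. The angle and speed values are placeholders, not recommendations.

```python
import json
from itertools import product


def build_action_space(steering_angles, speeds):
    """Return one action per (steering angle, speed) pair, indexed from 0."""
    return [
        {"steering_angle": angle, "speed": speed, "index": index}
        for index, (angle, speed) in enumerate(product(steering_angles, speeds))
    ]


# Illustrative values only: 3 steering angles x 2 speeds gives 6 actions
metadata = {"action_space": build_action_space([-30, 0, 30], [0.5, 1.0])}
print(json.dumps(metadata, indent=4))
```

You would save the printed JSON as model_metadata.json and copy it into custom_files in your bucket.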
WARNING THIS SCRIPT MODIFIES FILES IN YOUR AWS S3 BUCKET (DELETES AND UPLOADS). WHILE THE AUTHORS HAVE TAKEN CARE TO NOT MAKE IT HARMFUL, THEY TAKE NO RESPONSIBILITY FOR ANY DAMAGES IT MAY CAUSE, ESPECIALLY (BUT NOT ONLY) IF MISCONFIGURED. USE AT YOUR OWN RISK.
This script's aim is to make it easier to create snapshots of models to upload for submission. It has been written to work in sagemaker_venv so make sure to activate it before using. It also assumes that you have awscli installed and configured with access to DeepRacer's S3 bucket.
To use it, first run ./dr_util.py init
, then set the values in the newly created dr_util_config.json
. That file has been added to .gitignore.
To learn more about usage, run ./dr_util.py -h