HandyRL is a handy and simple framework for distributed reinforcement learning that is applicable to your own environments.
- Prepare your own environment
- Let’s start large-scale distributed training
- Get your great model!
HandyRL supports Python 3.7+. You need to install additional libraries (e.g. NumPy, PyTorch).
```
pip3 install -r requirements.txt
```
To use games from Kaggle environments (e.g. Hungry Geese), also install the additional dependencies.
```
pip3 install -r handyrl/envs/kaggle/requirements.txt
```
Set `config.yaml` for your training configuration. To run training on TicTacToe with a batch size of 64, set it as follows:
```yaml
env_args:
    env: 'TicTacToe'
    source: 'handyrl.envs.tictactoe'

train_args:
    ...
    batch_size: 64
    ...
```
NOTE: TicTacToe is used as the default game. Here is the list of games. When you use your own environment, set the name of the environment in `env` and the script path in `source`.
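For example, if your environment class lives in a module `my_envs.my_game` (a hypothetical path used here only for illustration) and your environment is named `MyGame`, the configuration might look like this:

```yaml
env_args:
    env: 'MyGame'              # hypothetical name of your environment
    source: 'my_envs.my_game'  # hypothetical module path to your script
```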
```
python main.py --train
```
NOTE: Trained models are saved in the `models` folder.
After training, you can evaluate the model against any other models. The command below runs the evaluation for 100 games with 4 processes:
```
python main.py --eval models/1.pth 100 4
```
NOTE: The default opponent AI is the random agent implemented in `evaluation.py`. You can replace it with any of your own agents.
HandyRL allows you to train a model remotely on a large scale.
If you use remote machines as worker clients (actors), you need to set the training server (learner) address in each client:
```yaml
worker_args:
    server_address: '127.0.0.1'  # Set the training server address to be connected to from the workers
    ...
```
NOTE: When you train a model in the cloud (e.g. GCP, AWS), the internal/external IP of the virtual machine can be set here.
```
python main.py --train-server
```
NOTE: The server listens for connections from the workers. The trained models are saved in the `models` folder.
After starting the training server, you can start the workers for data generation and evaluation. In HandyRL, multiple workers (across multiple nodes) can connect to the server.
```
python main.py --worker
```
After training, you can evaluate the model just as in the local case, e.g. for 100 games with 4 processes:
```
python main.py --eval models/1.pth 100 4
```
Write a wrapper class named `Environment` following the format of the sample environments; a minimal sketch is shown after the list below.
The kinds of games are:
- turn-based game: see `tictactoe.py`, `geister.py`
- simultaneous game: see `hungry_geese.py`
To see all interfaces of the environment, check `environment.py`.
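Below is a minimal sketch of a two-player, turn-based `Environment`. The method names mirror the sample environments, but the bodies are placeholders and the exact signatures should be treated as assumptions; `environment.py` and `tictactoe.py` are the authoritative references.

```python
# A hedged sketch of a turn-based Environment wrapper.
# Method names follow the sample environments; bodies are placeholders.
import numpy as np


class Environment:
    def reset(self, args={}):
        # (Re)initialize the game state at the start of an episode.
        self.board = np.zeros((3, 3), dtype=np.float32)  # placeholder state
        self.current_player = 0
        self.done = False

    def players(self):
        # All player indices participating in the game.
        return [0, 1]

    def turn(self):
        # Index of the player to move (turn-based games).
        return self.current_player

    def play(self, action, player):
        # Apply `action` for `player` and advance the game state.
        ...

    def terminal(self):
        # Whether the game has ended.
        return self.done

    def outcome(self):
        # Final reward for each player, e.g. {0: 1, 1: -1} for a win/loss.
        return {p: 0 for p in self.players()}

    def legal_actions(self, player):
        # Indices of the actions currently legal for `player`.
        return list(range(9))

    def observation(self, player=None):
        # Feature array fed to the neural network, from `player`'s point of view.
        return self.board[np.newaxis, ...]
```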
- How to use rule-based AI as an opponent?
  - You can easily use it by creating a rule-based AI method `rule_based_action()` in your `Environment` class (see the sketch below).
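Only the method name `rule_based_action()` comes from the samples; the body below is a hypothetical illustration:

```python
import random


class Environment:
    # ... the rest of the environment interface ...

    def rule_based_action(self):
        # Hypothetical body: pick a random legal action for the player to move.
        # Replace this with your own heuristic.
        return random.choice(self.legal_actions(self.turn()))
```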
- How to change the opponent in evaluation?
  - Set your agent in `evaluation.py`, like `agents = [agent1, YourOpponentAgent()]`.
- `too many open files` error
  - This error happens in large-scale training. You should increase the maximum file limit by running `ulimit -n 65536`. The value 65536 depends on your training setting. Note that the effect of `ulimit` is session-based, so you will have to either change the limit permanently (OS- and version-dependent) or run this command in your shell startup script (see the sketch after this list).
  - On macOS, you may need to change the system limit with `launchctl` before running `ulimit -n` (e.g. How to Change Open Files Limit on OS X and macOS)
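For example, to raise the limit in every new session (a sketch assuming bash; the startup file depends on your shell):

```sh
# Append the ulimit command to the bash startup script so that
# every new session raises the open-file limit automatically.
echo 'ulimit -n 65536' >> ~/.bashrc
```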