This repository contains an adaptation of the Yawning Titan cyber simulator, as released by DSTL, to the Entity Gym interface. This adaptation enables entity-based reinforcement learning, which is designed to improve the generalisability of reinforcement learning agents across varying network topologies. Developed at the AI for Cyberdefence Research Centre (AICD).
The repository provides training and evaluation scripts that can reproduce the experiments described in the workshop paper Entity-based Reinforcement Learning for Autonomous Cyber Defence. This paper introduces and motivates the use of entity-based reinforcement learning in the context of autonomous cyber defence.
The training process leverages the Entity-Neural-Net Trainer package and its default Proximal Policy Optimization (PPO) implementation. It also uses the RogueNet Transformer policy, specifically designed for Entity Gym environments, applied to a modified entity-based Yawning Titan environment.
For comparison, baseline scripts utilize the Stable Baselines 3 PPO trainer with Multilayer Perceptron policy parameterisation. These baselines are trained on an equivalent Yawning Titan environment that retains the OpenAI Gym interface.
If you use this repository in your research, please consider citing our companion paper:
@article{SymesThompson2024EntityACD,
title={Entity-based Reinforcement Learning for Autonomous Cyber Defence},
author={Symes Thompson, Isaac and Caron, Alberto and Hicks, Chris and Mavroudis, Vasilios},
journal={arXiv preprint arXiv:2410.17647v2},
year={2024}
}
Please also cite the original Yawning Titan work, using the citation provided in that repository. If you use the Entity Gym package, consider citing its repository or the original blog post by the authors, as we do in the paper.
This repository contains a Dockerfile and docker-compose.yaml file for use with Docker Compose. This is likely the simplest way to get the scripts working.
- First, clone the repository, then navigate to the root directory.
- Make sure Docker is running, then, depending on your Docker/Docker Compose version, run:
docker-compose run --rm entity-yt
Or:
docker compose run --rm -it entity-yt
This should build the image and open a terminal in a Docker container, with a volume mounted to the project directory and all relevant packages installed.
NB: The Entity-Neural-Net Trainer depends on the pyron library, for which wheels are provided only for x86 architectures. It is therefore necessary to use emulation when running on non-x86 hardware (e.g. Arm Apple Silicon); this is specified in the docker-compose file.
This has been tested with Python 3.9 on Ubuntu 22.04, CPU only. With models of the default sizes, initial experiments found that GPU acceleration hampered performance.
Installing Gym 0.21.0 with pip requires specific versions of setuptools and wheel; otherwise installation is likely to fail.
pip install setuptools==66 wheel==0.38.4
pip install gym==0.21.0
Install typing_extensions, hyperstate, and enn_trainer with the following command:
pip install typing_extensions hyperstate enn_trainer
For a CPU-only version of PyTorch, run:
pip install torch==1.12.0+cpu --extra-index-url https://download.pytorch.org/whl/cpu
pip install --no-cache-dir --no-index torch-scatter -f https://data.pyg.org/whl/torch-1.12.0+cpu.html
pip install stable_baselines3==1.6.2 wandb platformdirs networkx tinydb tabulate
Make sure you are in the repository root (the parent directory of yawning_titan), so that the Yawning Titan version included in this repository is the one installed.
pip install -e yawning_titan
pip install -e .
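As a quick, optional sanity check that the main dependencies installed above are importable:
import gym
import torch
import stable_baselines3
import enn_trainer  # should import without error if the trainer installed correctly

print("gym", gym.__version__)
print("torch", torch.__version__)
print("stable-baselines3", stable_baselines3.__version__)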
Logging with Weights & Biases is supported. When training an entity-based agent, this involves modifying the training config file (in src/ebg/configs/yawning_titan/train_config/EntityYT ... .ron) to add a wandb user and project, and to enable tracking. For example:
TrainConfig(
...
track: true,
wandb_project_name: "EntityYT",
wandb_entity: "John_Smith",
...
)
For logging the training of the Stable Baselines 3 agents, you must modify the arguments of the wandb.init call inside the Sb3_PPO_YT_train.py script:
wandb.init(
...
project='EntityYT',
entity='John_Smith',
sync_tensorboard=True,
)
For logging the evaluation of either kind of agent, you must modify the wandb.init call in the relevant evaluation script:
eval_config.capture_videos = False
wandb.init(
project='EntityYT',
entity='John_Smith',
...
)
The EntityYT environment class, inside src/ebg/envs/entity_cyber/entity_yawning_titan_generic.py, contains the entity-based version of the Yawning Titan environment. It is derived from the GenericNetworkEnv environment, which can be found in yawning_titan/src/yawning_titan/envs/generic/generic_env.py for reference.
Broadly speaking, the main difference between an 'entity-based' approach and the approach used in the Yawning Titan environment is that the observation and action spaces are treated as collections of distinct objects or entities (nodes), whereas in the default Gym environment all features are concatenated into a unified observation vector.
For example, the observation space is defined as:
def obs_space(self) -> ObsSpace:
return ObsSpace(
entities={
'Generic_Node': Entity(features=self.feature_list),
'Defender': Entity(features=[]),
}
)
This specifies the different possible 'entity types' in the environment, along with a list of named features that entities of each type may have. In our case, we have a single generic entity type for all the nodes in the environment, with the feature list built from the features enabled in the blue agent observation space in the game mode config file. We also have a dummy 'Defender' entity type with no specified features - this allows for a composite action space where the blue agent first chooses an action type, and then a node to act on. As shown:
def action_space(self) -> Dict[ActionName, ActionSpace]:
return {
'High_Level': GlobalCategoricalActionSpace(self.action_list),
'Target': SelectEntityActionSpace(),
}
Here a 'GlobalCategoricalActionSpace' is defined, containing all of the types of action the blue agent may perform; this list is built from the game mode configuration file, and is similar to the standard discrete action space found in Gym. A 'SelectEntityActionSpace' is also specified, which is used to select the particular node on which to execute the action chosen in the 'Global' action space.
Note that the observation and action spaces specify only the types of possible observations and actions; the specific compositions may vary with each timestep or between episodes. This allows a policy network to be initialised with the environment whilst remaining flexible to environment variation.
The actual observations and actions are made available to an agent in an Observation object at a particular timestep, upon calling the observe function.
def observe(self, done=False, reward=0) -> Observation:
return Observation(
entities={
'Generic_Node': (
self.get_node_features(),
[('Generic_Node', i) for i in range(len(self.nodes))],
),
'Defender': (np.zeros([1, 0], dtype=np.float32), [('Defender', 0)]),
},
actions={
'High_Level': GlobalCategoricalActionMask(),
'Target': SelectEntityActionMask(
actor_types=['Defender'], actee_types=['Generic_Node']
),
},
done=done,
reward=reward,
)
Here, a potentially variable-length list of node features, returned by the get_node_features function, is provided to the agent. The length of this list depends on the number of nodes in the environment. The Defender entity is given a dummy feature vector.
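As an illustration only (the feature names and per-node helper calls below are hypothetical, not the repository's exact implementation), the feature matrix could be assembled along these lines:
import numpy as np

def get_node_features(self) -> np.ndarray:
    # One row per node, one column per feature enabled in the blue agent
    # observation space; the number of rows varies with the network size.
    rows = []
    for node in self.nodes:
        rows.append([
            float(self.node_compromised(node)),    # hypothetical per-node query
            float(self.node_vulnerability(node)),  # hypothetical per-node query
        ])
    return np.asarray(rows, dtype=np.float32)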
The actions available to the agent are defined by action masks over the previously specified action spaces. There is no action mask on the high level action type space. The action mask on the SelectEntityActionSpace determines that the 'Defender' entity is to act on the 'Generic_Node' entities. This prompts the policy network to use information in the embedding of the Defender entity to decide which node to execute the defensive action on. If there were multiple Defender entities then an action would be chosen for each Defender, since the mask is based on entity types rather than specific entities.
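To make the composite action concrete, here is a minimal sketch of how the two sampled sub-actions could be combined into a single blue action (this is not the repository's actual act implementation; apply_blue_action is a hypothetical helper, and the action classes are assumed to follow Entity Gym's public API):
from entity_gym.env import GlobalCategoricalAction, SelectEntityAction

def act(self, actions):
    high_level: GlobalCategoricalAction = actions['High_Level']
    target: SelectEntityAction = actions['Target']

    action_type = high_level.label     # one of the labels in self.action_list
    _, node_index = target.actees[0]   # entity IDs were built as ('Generic_Node', i)

    # Apply the chosen defensive action to the chosen node and advance the
    # underlying Yawning Titan simulation (hypothetical helper).
    reward, done = self.apply_blue_action(action_type, node_index)
    return self.observe(done=done, reward=reward)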
All scripts are present in the src/ebg/train_eval/ directory.
This script (EntityYT_train.py) trains an agent in the EntityYT entity-based Yawning Titan environment, using the enn_trainer PPO implementation and the RogueNet Transformer policy.
- "--n-nodes" Specifies the number of nodes to be used in the Yawning Titan network environment. Default 10
- "--edge-prob" Specifies the edge probability parameter when generating the network (using Erdős-Renyi model). Default 0.1
- "--episode-length" Specifies the length of each episode. Default 100
- "--red-skill" Specifies the 'skill' of the red agent in Yawning Titan. Between 0.0 and 1.0. Default 0.7
- "--random" Specifies whether the network environment is re-generated between episode resets. If absent then the same network is used throughout the training run.
- "--render" Specifies whether the environment is rendered on logging and evaluation.
- "--seed" Random seed, default 0
- "--total-timesteps" Total number of environment timesteps to run the training for.
- "--eval-freq" Interval (in environment timesteps) between evaluations and logging of the policy.
- "--game-mode" Specifies which Yawning Titan gamemode config file to use in the game_modes config directory. Default "fixed_episode_base.json"
- "--config" Specifies the training configuration file, in the train_config directory. "EntityYT_base.ron" by default.
- "--checkpoint-dir" Specifies a directory to save checkpoints. By default this is generated automatically from other arguments and placed in a "checkpoints" parent directory.
python EntityYT_train.py --n-nodes 10 --edge-prob 0.1 --episode-length 100 \
--total-timesteps 1000000 --eval-freq 10000 --random --config EntityYT_base.ron
This script (EntityYT_evaluate.py) evaluates a trained agent in the EntityYT environment, given a checkpoint.
- "--n-nodes" Specifies the number of nodes to be used in the evaluation environment. Default 10
- "--edge-prob" Specifies the edge probability parameter when generating networks (using Erdős-Renyi model). Default 0.1
- "--episode-length" Specifies the length of each episode. Default 100
- "--red-skill" Specifies the 'skill' of the red agent in Yawning Titan. Between 0.0 and 1.0. Default 0.7
- "--random" Specifies whether the network environment is re-generated between episode resets. If absent then the same network is used throughout evaluation.
- "--render" Specifies whether the environment is rendered during evaluation.
- "--seed" Randoom seed, default 0
- "--game-mode" Specifies which Yawning Titan gamemode config file to use in the game_modes config directory. Default "fixed_episode_base.json"
- "--num-evals" Specifies the number of episodes to evaluate the agent over, default 30. In the paper, evaluations over 1000 episodes are used.
- "--checkpoint-dir" Specifies a directory to load checkpoints from. By default, the most recent checkpoint beginning with "EntityYT" in the checkpoints directory will be loaded.
python EntityYT_evaluate.py --n-nodes 10 --edge-prob 0.1 --episode-length 100 --num-evals 1000
This script (Sb3_PPO_YT_train.py) trains an agent using Stable-Baselines3 PPO in an OpenAI Gym version of the Yawning Titan environment. This version uses fully 'concatenated' observation and action spaces, and the policy is parameterised by a multilayer perceptron.
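For orientation, the core of this baseline is standard Stable Baselines 3 usage, roughly as sketched below; the environment constructor shown is hypothetical, and the real script builds the Gym-interface environment from the CLI arguments listed below.
from stable_baselines3 import PPO

# Hypothetical helper standing in for the script's environment construction
env = make_gym_yawning_titan_env(n_nodes=10, edge_prob=0.1, episode_length=100)

model = PPO("MlpPolicy", env, verbose=1)   # MLP policy over the concatenated observation vector
model.learn(total_timesteps=1_000_000)
model.save("checkpoints/sb3_ppo_yt/model")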
This script has the same arguments as the entity-based Yawning Titan training script, apart from the rendering and logging options, which proved harder to implement correctly here.
- "--n-nodes" Specifies the number of nodes to be used in the Yawning Titan network environment. Default 10
- "--edge-prob" Specifies the edge probability parameter when generating the network (using Erdős-Renyi model). Default 0.1
- "--episode-length" Specifies the length of each episode. Default 100
- "--red-skill" Specifies the 'skill' of the red agent in Yawning Titan. Between 0.0 and 1.0. Default 0.7
- "--random" Specifies whether the network environment is re-generated between episode resets. If absent then the same network is used throughout the training run.
- "--seed" Random seed, default 0
- "--total-timesteps" Total number of environment timesteps to run the training for.
- "--eval-freq" Interval (in environment timesteps) between evaluations and logging of the policy.
- "--game-mode" Specifies which Yawning Titan gamemode config file to use in the game_modes config directory. Default "fixed_episode_base.json"
- "--config" Specifies the training configuration file, in the train_config directory. "Sb3_PPO_base.ron" by default.
- "--checkpoint-dir" Specifies a directory to save checkpoints. By default this is generated automatically from other arguments and placed in a "checkpoints" parent directory.
python Sb3_PPO_YT_train.py --n-nodes 10 --edge-prob 0.1 --episode-length 100 \
--total-timesteps 1000000 --eval-freq 10000 --random --config Sb3_PPO_base.ron
This script (Sb3_PPO_YT_evaluate.py) evaluates a checkpointed Stable Baselines 3 PPO agent on a Yawning Titan environment. It uses the same arguments as the entity-based Yawning Titan evaluation script.
- "--n-nodes" Specifies the number of nodes to be used in the evaluation environment. Default 10
- "--edge-prob" Specifies the edge probability parameter when generating networks (using Erdős-Renyi model). Default 0.1
- "--episode-length" Specifies the length of each episode. Default 100
- "--red-skill" Specifies the 'skill' of the red agent in Yawning Titan. Between 0.0 and 1.0. Default 0.7
- "--random" Specifies whether the network environment is re-generated between episode resets. If absent then the same network is used throughout evaluation.
- "--render" Specifies whether the environment is rendered during evaluation.
- "--seed" Randoom seed, default 0
- "--game-mode" Specifies which Yawning Titan gamemode config file to use in the game_modes config directory. Default "fixed_episode_base.json"
- "--num-evals" Specifies the number of episodes to evaluate the agent over, default 30. In the paper, evaluations over 1000 episodes are used.
- "--checkpoint-dir" Specifies a directory to load checkpoints from. By default, the most recent checkpoint beginning with "EntityYT" in the checkpoints directory will be loaded.
python Sb3_PPO_YT_evaluate.py --n-nodes 10 --edge-prob 0.1 --episode-length 100 \
--checkpoint-dir checkpoints/Sb3_PPO_YT_checkpoint_dir --num-evals 30
Game modes are defined in JSON files located in configs/yawning_titan/game_modes/. Modify these files to adjust environment parameters, blue and red capabilities, and the reward function used by the environment.
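To see which options a game mode exposes before editing it, one simple approach (using the default file and path given above) is to load the JSON and list its top-level sections:
import json

with open("configs/yawning_titan/game_modes/fixed_episode_base.json") as f:
    game_mode = json.load(f)
print(list(game_mode.keys()))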
🚨 Important: The entity-based Yawning Titan environment does not currently support all of the game mode options provided in Yawning Titan, so experimentation outside of the provided configurations may be unreliable. In particular, episodes with early termination conditions and deceptive node functionality are not supported.
The training script arguments '--episode-length' and '--red-skill' override the corresponding values in the game_mode config, to allow for quicker experimentation.
Training settings are specified in .ron files within configs/yawning_titan/train_config/.
- The EntityYT ... .ron configs follow the structure provided in the Entity Neural Network project. They specify the configuration of the RogueNet Transformer policy, the optimiser and PPO hyperparameters, and the evaluation configuration, as well as higher-level options such as wandb settings and overall training length. EntityYT_base.ron contains the parameters used for the experiments in the paper, which are largely the defaults for the Entity-Neural-Network Trainer package. EntityYT_alt.ron contains PPO and optimiser parameters closer to the Stable Baselines 3 defaults. The training script arguments --total-timesteps and --eval-freq override the 'total_timesteps' and 'eval.interval' values present in these files.
- The Sb3_PPO ... .ron configs specify the hyperparameters for Stable Baselines 3 PPO. Sb3_PPO_default.ron contains the default hyperparameters as specified in the Stable Baselines 3 documentation, and as used for the experiments in the paper. Sb3_PPO_alt.ron contains hyperparameters closer to the defaults used for the PPO optimiser in the Entity-Neural-Network Trainer.