The purpose of this experiment report is to investigate and evaluate the application of Deep Q-Networks (DQN) to the BitFlipping problem, a classical problem in computer science. The conventional approach typically involves enumerating over a space of size $2^n$, where $n$ is the length of the bit sequence.
State space: all binary sequences of length $n$, i.e. $\{0, 1\}^n$.
Action space: $\{0, 1, \dots, n-1\}$, where action $i$ flips the $i$-th bit of the current sequence.
Reward: The reward is 0 if the final sequence generated is equal to the target sequence, and is -1 otherwise.
For each episode, a target sequence is generated randomly.
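For reference, here is a minimal sketch of the environment described above (the class name, method signatures, and the episode-length limit are assumptions for illustration, not necessarily the repository's implementation):

```python
import numpy as np

class BitFlippingEnv:
    """Minimal bit-flipping environment: flip one bit per step, try to reach the goal."""

    def __init__(self, length=8):
        self.length = length

    def reset(self):
        # Sample a random initial state and a random target sequence for this episode.
        self.state = np.random.randint(0, 2, size=self.length)
        self.goal = np.random.randint(0, 2, size=self.length)
        self.steps = 0
        return self.state.copy(), self.goal.copy()

    def step(self, action):
        # Action i flips the i-th bit of the current sequence.
        self.state[action] = 1 - self.state[action]
        self.steps += 1
        success = np.array_equal(self.state, self.goal)
        reward = 0.0 if success else -1.0            # sparse (Binary) reward
        done = success or self.steps >= self.length  # assumed step limit
        return self.state.copy(), reward, done
```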
- python >= 3.9
- torch >= 2.0.0
Other dependencies can be installed using the following commands:
conda create -n ling python=3.9
pip install -r requirements.txt
- Run the training script for DQN
cd DQN
python main.py length=8 epsilon=0.9 delta_epsilon=1e-5 target_update=50 use_wandb=false reward_success=0 reward_fail=-1 exp_name=dqn
- Run the training script for DQN_with_GOAL
cd DQNwithGOAL
python main.py length=8 epsilon=0.9 delta_epsilon=1e-5 target_update=50 use_wandb=false reward_success=0 reward_fail=-1 exp_name=dqn_g
- Run the training script for DQN_with_HER (a sketch of the HER relabeling idea follows these commands)
cd DQNwithGOAL
python main.py length=8 epsilon=0.9 delta_epsilon=1e-5 target_update=50 use_wandb=false reward_success=0 reward_fail=-1 exp_name=dqn_with_her
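DQN_with_HER augments the replay buffer with hindsight goals. Below is a minimal sketch of the "final" relabeling strategy, assuming transitions are stored as (state, action, reward, next_state, goal) tuples; the function name and buffer interface are assumptions:

```python
import numpy as np

def relabel_episode(episode, buffer, reward_success=0.0, reward_fail=-1.0):
    """Hindsight Experience Replay ("final" strategy): store each transition twice,
    once with the original goal and once with the state actually reached at the
    end of the episode treated as the goal."""
    achieved_goal = episode[-1][3]  # next_state of the last transition
    for state, action, reward, next_state, goal in episode:
        # Original transition with the real goal.
        buffer.append((state, action, reward, next_state, goal))
        # Relabeled transition: pretend the achieved final state was the goal all along.
        her_reward = reward_success if np.array_equal(next_state, achieved_goal) else reward_fail
        buffer.append((state, action, her_reward, next_state, achieved_goal))
```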
The BitFlipping environment supports three reward settings: Binary, Euclidean, and Step by Step.
Binary means that the reward takes only two values, success_reward and fail_reward: the reward is success_reward if the final sequence generated is equal to the target sequence, and fail_reward otherwise. success_reward and fail_reward are two hyperparameters.
Euclidean means that the reward is the negative Euclidean distance between the current state and the goal sequence.
Step by Step means that $$ \text{reward} = \begin{cases} 0 & \text{if } \text{state[action]} = \text{goal[action]} \\ -1 & \text{otherwise} \end{cases} $$
If you want to use Binary, then run:
python main.py env_reward_type=default
If you want to use Euclidean, then run:
python main.py env_reward_type=euclidean
If you want to use Step by Step, then run:
python main.py env_reward_type=idx
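For clarity, here is a minimal sketch of how these three reward types could be computed; the function name compute_reward and its signature are assumptions, while "default", "euclidean", and "idx" mirror the env_reward_type values above:

```python
import numpy as np

def compute_reward(state, goal, action, reward_type="default",
                   reward_success=0.0, reward_fail=-1.0):
    """Sketch of the three reward settings described above."""
    if reward_type == "default":      # Binary: sparse success/fail signal
        return reward_success if np.array_equal(state, goal) else reward_fail
    elif reward_type == "euclidean":  # Euclidean: negative distance to the goal
        return -float(np.linalg.norm(state - goal))
    elif reward_type == "idx":        # Step by Step: is the flipped bit correct?
        return 0.0 if state[action] == goal[action] else -1.0
    else:
        raise ValueError(f"unknown reward_type: {reward_type}")
```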