# Scheduling-algo-DQN-AI

The agent's goal is to maximize the total expected reward over all possible trajectories. Even though we defined finite state and action spaces, the number of possible trajectories is still huge, which motivates the use of reinforcement learning [1]. The problem can be cast as the iterative Q-learning update proposed by Watkins [2]:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ r_{t+1} + \gamma \max_{A_{t+1}} Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right] \tag{2}$$

where $Q(S_t, A_t)$ on the left-hand side is the Q-value (expected reward) being updated for executing action $A_t$ in state $S_t$; $r_{t+1} + \gamma \max_{A_{t+1}} Q(S_{t+1}, A_{t+1})$ is the predicted target Q-value, in which $r_{t+1}$ is the reward received when action $A_t$ moves the agent from state $S_t$ into state $S_{t+1}$; $\alpha$ is the learning rate; and $\max_{A_{t+1}} Q(S_{t+1}, A_{t+1})$ is the maximum Q-value over all possible next actions $A_{t+1}$. In DQN, a deep neural network is adopted to predict the Q-values.
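To make the update concrete, below is a minimal PyTorch sketch of one DQN-style gradient step toward the target in equation (2). The network sizes, learning rate, and discount factor are illustrative assumptions, not values taken from this repository's notebooks; the table lookup of tabular Q-learning is replaced by a neural network trained on the squared TD error.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: the real state/action spaces come from the
# scheduling environment, which is not shown here.
STATE_DIM, N_ACTIONS, GAMMA = 8, 4, 0.99

# Small feed-forward network mapping a state to one Q-value per action.
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())  # frozen copy, refreshed periodically

# The optimizer's learning rate plays the role of alpha in equation (2).
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(states, actions, rewards, next_states, dones):
    """One gradient step toward the target r_{t+1} + gamma * max_a Q(S_{t+1}, a)."""
    # Q(S_t, A_t): select the Q-value of the action actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # max over A_{t+1} of the frozen target network's Q-values.
        max_next_q = target_net(next_states).max(dim=1).values
        # No bootstrapping past terminal states (dones is 1.0 when the episode ended).
        target = rewards + GAMMA * max_next_q * (1.0 - dones)
    loss = nn.functional.mse_loss(q_sa, target)  # squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call with a dummy batch of 32 transitions:
states = torch.randn(32, STATE_DIM)
actions = torch.randint(0, N_ACTIONS, (32,))
rewards = torch.randn(32)
next_states = torch.randn(32, STATE_DIM)
dones = torch.zeros(32)
print(dqn_update(states, actions, rewards, next_states, dones))
```

Keeping a separate, periodically synchronized target network (rather than bootstrapping from the online network itself) is the standard DQN stabilization trick: it keeps the regression target fixed between syncs so the update does not chase a moving target.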
