MBaranPeker/Pursuit-Evasion-Game-with-Deep-Reinforcement-Learning-in-an-environment-with-an-obstacle
In this study, a multi-agent pursuit-evasion (chase-escape) problem is solved using Deep Q-learning. The actors of the problem are a smart evader and smart pursuers with opposing goals. At the beginning of the game the agents have homogeneous properties, and neither the evader nor the pursuers have any knowledge of the map. The goal of the pursuer robots is to catch the evader as quickly as possible, while the goal of the evader robot is to escape for as long as possible. Games like this, in which one player's gain is balanced by the other players' losses, are called zero-sum games. The end condition, which may differ according to the approach applied, is in our study either "any pursuer or the evader occupies the same or a neighboring pixel as an obstacle or the map border" or "a pursuer and the evader occupy the same or a neighboring pixel"; in other words, the episode ends when the evader is caught by any pursuer, or when the evader or any pursuer hits an obstacle. A new episode of the game starts after each collision or catch. In this respect, pursuit-evasion problems also belong to the class of repeated games. The question examined in this study is what any pursuer or the evader can do to improve its performance across repeated episodes of the game. The method used is Deep Reinforcement Learning: agents receive rewards or penalties based on their moves within an episode and use this feedback to update a neural network.
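The end condition and reward structure described above can be summarised in a short sketch. The Python snippet below is only illustrative: the function names (`is_adjacent`, `step_outcome`, `dqn_target`), the grid representation, and the reward values are assumptions made for the example and are not taken from the repository's code.

```python
import numpy as np

def is_adjacent(a, b):
    """True if two grid cells are the same pixel or neighbouring pixels
    (Chebyshev distance <= 1, so diagonal neighbours count too)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1])) <= 1

def step_outcome(evader, pursuers, obstacles, grid_size):
    """Check the end condition for one time step and assign zero-sum rewards.

    evader is a (row, col) cell; pursuers and obstacles are lists of cells.
    Returns (done, rewards), where rewards holds one value per side.
    """
    def collides(pos):
        # Same or neighbouring pixel as the map border ...
        near_border = (pos[0] <= 0 or pos[0] >= grid_size - 1 or
                       pos[1] <= 0 or pos[1] >= grid_size - 1)
        # ... or as any obstacle.
        return near_border or any(is_adjacent(pos, o) for o in obstacles)

    if collides(evader):
        # Evader hits an obstacle or the border: pursuers win the episode.
        return True, {"evader": -1.0, "pursuers": +1.0}
    if any(collides(p) for p in pursuers):
        # A pursuer hits an obstacle or the border: evader wins the episode.
        return True, {"evader": +1.0, "pursuers": -1.0}
    if any(is_adjacent(p, evader) for p in pursuers):
        # A pursuer reaches the same or a neighbouring pixel as the evader: caught.
        return True, {"evader": -1.0, "pursuers": +1.0}

    # Episode continues; small shaping rewards keep the evader fleeing
    # and the pursuers chasing (illustrative values only).
    return False, {"evader": +0.01, "pursuers": -0.01}

def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Standard Deep Q-learning TD target, r + gamma * max_a' Q(s', a'),
    with the bootstrap term dropped on terminal transitions."""
    return reward if done else reward + gamma * float(np.max(next_q_values))
```

In training, each agent would store the transition it just experienced and regress its Q-network toward `dqn_target`, which is how the rewards and penalties collected during an episode are written back into the neural network.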