RobustFieldAutonomyLab/DRL_graph_exploration

A problem about action space of reinforcement learning in paper

zshzdt opened this issue · 4 comments

Hi, I have a question about the paper.
In the paper, choosing a "frontier" is the action in RL, but as the robot moves the frontiers keep changing, which means the action space keeps changing. My understanding is that the action space is fixed in standard reinforcement learning (though I may be wrong). How do you deal with this problem?
Thank you!

Yes, in this paper, the size of the action space is dynamic.

In standard reinforcement learning, the size of the action space is usually fixed because the neural network models are CNN-like, and such conventional architectures require a fixed output size.

With graph neural networks (GNNs), the output size can vary with the number of nodes in the graph. Hence we can estimate actions over an action space whose size is dynamic. For more information about GNNs, please read this paper https://arxiv.org/pdf/1901.00596.pdf and take a look at PyTorch Geometric.
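To illustrate why the number of outputs can vary, here is a minimal sketch in plain Python. It is not the paper's network and not a real GNN (there is no message passing); it only shows the idea of applying one shared scoring head to every node, so the number of scores follows the number of candidate nodes. The names `score_nodes` and `greedy_action`, the weights, and the feature dimension of 3 are all hypothetical.

```python
import random


def score_nodes(node_features, weights, bias):
    """Apply the same linear scoring head to every node: one score per node."""
    return [sum(w * x for w, x in zip(weights, feats)) + bias
            for feats in node_features]


def greedy_action(node_features, weights, bias):
    """Return the index of the highest-scoring candidate node."""
    scores = score_nodes(node_features, weights, bias)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical shared parameters: their size depends only on the
# per-node feature dimension, not on how many nodes there are.
WEIGHTS = [0.5, -1.0, 0.25]
BIAS = 0.1

# Made-up node features: three candidate frontiers at one step,
# five candidates at a later step.
rng = random.Random(0)
features_t1 = [[rng.random() for _ in range(3)] for _ in range(3)]
features_t2 = [[rng.random() for _ in range(3)] for _ in range(5)]

scores_t1 = score_nodes(features_t1, WEIGHTS, BIAS)
scores_t2 = score_nodes(features_t2, WEIGHTS, BIAS)
print(len(scores_t1), len(scores_t2))  # one score per candidate at each step
```

The same parameters handle both steps because they act on each node separately; a CNN with a fixed-size output layer would instead need a fixed number of actions.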

Thanks for your reply, but what I mean is not the dimension of the action space but its "content", that is, the range of the action space.
For example, at t1 the candidate frontiers are (0,1), (1,5), (6,2), but at t2 they change to (1,2), (3,5), (8,9). So the range of the action space is constantly changing. I am puzzled about this.
Thank you!

In the global frame, the range of the point coordinates is indeed constantly changing. However, in our paper we only give the robot the local relative information between each frontier node and the current pose node, because this relative information is bounded. It is encoded in a feature vector in Eqs. (6) to (10). You can find more details in Section II-B (Exploration Graph) of the paper (page 3). Hope this helps!
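As a rough illustration of why relative information stays bounded, here is a small Python sketch. It does not reproduce Eqs. (6) to (10); the specific features (normalized distance and bearing), the function name `relative_feature`, and the `max_range` constant are assumptions made only for this example.

```python
import math


def relative_feature(robot_pose, frontier_xy, max_range=10.0):
    """Return (normalized distance, bearing) of a frontier in the robot's frame.

    robot_pose: (x, y, heading in radians) in the global frame.
    max_range: hypothetical normalization constant, not taken from the paper.
    """
    rx, ry, heading = robot_pose
    dx, dy = frontier_xy[0] - rx, frontier_xy[1] - ry
    # Clip and normalize the distance so it always lies in [0, 1].
    dist = min(math.hypot(dx, dy), max_range) / max_range
    # Bearing relative to the robot's heading, wrapped to [-pi, pi].
    bearing = math.atan2(dy, dx) - heading
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return dist, bearing


# Same relative geometry at two different places in the global frame.
f_a = relative_feature((0.0, 0.0, 0.0), (1.0, 1.0))
f_b = relative_feature((7.0, 8.0, 0.0), (8.0, 9.0))
print(f_a, f_b)
```

The two calls use very different global coordinates but produce the same feature values, and both values stay within fixed bounds no matter how far the robot has travelled.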

I get it, thank you!