ICLRreb

Dear Area Chair:

We apologize for interrupting your review work. We would like to raise two concerns about the review process.

Our reply addressed every question raised by Reviewer XM6T in detail, with supporting experiments, and we also provided a visual description of the interaction pattern. However, the reviewer has not responded, even after the AC sent a reminder to reply to our answer. We also addressed the reviewer's main concern: "The experiment part may not fully match with the motivation of the paper. Generally speaking, simulation environments such as Mujoco do not require the use of fragmentary control". The experimental section describes in detail that our environments do not use MuJoCo directly, but instead construct multiple simulation scenarios according to realistic settings. We explained this again in our reply and highlighted the relevant section in the new version.
Reviewer cdnx's main concern is the lack of detailed videos for the supplementary real-world experiments.
- Because the snake robot used in our additional experiment is painted with our institution's logo, most of the snake-robot videos do not meet the requirements of double-blind review. To demonstrate the effectiveness of our method in the real world, we switched to a different real-world task for this experiment, and as a result missed the first-round response window. We provide a link to the experiment videos here and hope that you can share it with the four reviewers to strengthen their support for our work. The new version includes these experiments.
- We provide a detailed set of robotic arm 6-DoF grasping experiments (including video).
- Task description: a remotely controlled robot arm grasps objects in a 30 × 30 × 30 cm³ tabletop workspace. We run 15 rounds of experiments with our method and with the baseline. In each round, 5 objects are randomly selected and placed on the table. Because of communication limits, the decision end and the robot arm interact fragmentarily (consistent with our FIMDP setting). Multi-step decision making based on the current observation is therefore required to complete the task efficiently.
- Our method (MARS) is combined with imitation learning (IL, a grasping baseline) to improve grasping performance (see the table below). Performance is measured with the following metrics, averaged over 15 rounds: 1) grasp success rate (GSR), the ratio of successful grasp executions; and 2) declutter rate (DR), the average ratio of objects removed.
- We also share two videos comparing our method with the baseline on the same task. The videos show that our method runs more smoothly and achieves a higher task completion rate under the FIMDP setting.

Method	GSR(%)	DR(%)
Ours + IL	76.2	82.9
IL	68.6	72.7
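
For readers who want to see how the two metrics above could be computed from per-round logs, the following is a minimal Python sketch. The `RoundLog` record and its field names are hypothetical, and the choice of denominators (grasp attempts pooled over all rounds for GSR, per-round removal ratios averaged for DR) is an assumption based on the metric definitions in this letter, not a confirmed detail of the evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RoundLog:
    """Hypothetical per-round record; field names are illustrative only."""
    grasp_attempts: int    # grasp executions attempted in this round
    grasp_successes: int   # grasp executions that succeeded
    objects_placed: int    # objects placed on the table at the start
    objects_removed: int   # objects removed by the end of the round


def grasp_success_rate(rounds: Sequence[RoundLog]) -> float:
    """GSR in percent: successful grasps over all attempted grasps (assumed pooling)."""
    attempts = sum(r.grasp_attempts for r in rounds)
    if attempts == 0:
        raise ValueError("no grasp attempts recorded")
    successes = sum(r.grasp_successes for r in rounds)
    return 100.0 * successes / attempts


def declutter_rate(rounds: Sequence[RoundLog]) -> float:
    """DR in percent: mean over rounds of (objects removed / objects placed)."""
    if not rounds:
        raise ValueError("no rounds recorded")
    ratios = [r.objects_removed / r.objects_placed for r in rounds]
    return 100.0 * sum(ratios) / len(ratios)
```

Pooling attempts for GSR weights every grasp equally, while averaging per-round ratios for DR weights every round equally; a different convention would give slightly different numbers.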

This paper focuses on providing the first general algorithm that addresses FIMDP for the deep reinforcement learning (DRL) community. To increase the visibility and recognition of our algorithm in that community, we chose its mainstream simulation environments as our main experimental environments. Besides robot scenarios, fragmented interaction also arises in many virtual scenarios. One example is remote control of NPCs: in a game, the cloud server needs to control the terminal NPC in real time, and NPC stalling can significantly degrade the user experience. Following prior work with related evaluation scenarios (e.g., RTAC [1] and RDAC [2]), we use mainstream virtual scenarios (MuJoCo, D4RL) to simulate remote NPC control tasks, and our method outperforms the baselines in these environments. Building on these tasks, we also provide extensive ablation and analysis experiments, which support the effectiveness of our method.

