
Interactive Speech2Pickup Network for Speech-based Human-Robot Interaction

We propose the Interactive Speech2Pickup Network for speech-based Human-Robot Collaboration. The model takes a person's speech as input and directly predicts the desired task-specific output. We evaluated the model on a multi-object detection task.

Our method addresses two problems that the baseline methods struggle with (baseline: an Automatic Speech Recognition (ASR) system followed by a text-input model):

  • Error accumulation, because the ASR system and the text-input model are optimized separately.
  • Time delay, caused by the network-based ASR system.
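The error-accumulation problem can be illustrated with a toy two-stage pipeline. This is a minimal sketch under stated assumptions, not the Speech2Pickup or baseline code: the `SCENE` dictionary, the `asr_stage` confusion of "red" with "bed", and the lookup-style `text_stage` are all hypothetical stand-ins chosen only to show how a transcription error propagates when each stage is optimized on its own.

```python
"""Toy illustration (hypothetical, not the Speech2Pickup code) of error
accumulation in a separately optimized ASR + text-model pipeline."""

# Hypothetical scene: object name -> (x, y) pixel location of that object.
SCENE = {"red_cup": (40, 120), "blue_cup": (200, 115)}


def asr_stage(spoken_words):
    """Hypothetical stand-in ASR: we assume it mis-recognizes 'red' as 'bed'."""
    return ["bed" if w == "red" else w for w in spoken_words]


def text_stage(words):
    """Hypothetical stand-in text model, fit only on clean transcripts.

    It looks for a '<color>_<object>' pair that names an object in the scene
    and has no mechanism for undoing ASR mistakes.
    """
    for color, obj in zip(words, words[1:]):
        key = f"{color}_{obj}"
        if key in SCENE:
            return SCENE[key]
    return None  # no target object recognized


def pipeline(spoken_words):
    # Stage 2 only ever sees stage 1's output, so stage-1 errors propagate.
    return text_stage(asr_stage(spoken_words))


if __name__ == "__main__":
    command = ["pick", "up", "the", "red", "cup"]
    print(text_stage(command))  # clean transcript: the target is found
    print(pipeline(command))    # ASR error: the target is lost
```

With a clean transcript, `text_stage` returns the red cup's location, but the full pipeline returns `None` because the text stage never saw "bed" during its separate training. An end-to-end model trained directly from speech to output is, in contrast, optimized under a single objective, so no intermediate transcript is fixed in place before the task loss is computed.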

Extra material:

Proposed model




1. Prediction accuracy & Time efficiency

2. Model prediction example

1) Speech2Pickup (word-unit embedding)

2) Speech2Pickup (sentence-unit embedding)


