/huric

HuRIC 2.0 - the Human Robot Interaction Corpus

Apache License 2.0Apache-2.0

Human Robot Interaction Corpus (HuRIC 2.1)

*** UPDATE November 17th, 2022: Semantic Heads in HuRIC ***

The linguistic interpretation of commands are extended to make the Semantic Head of each argument explicit.
In a command, e.g. "take the red mug next to the keyboard", where "the red mug" corresponds to the argument Theme, the Semantic Head is useful in recognizing only the semantic category of a phrase, i.e. "mug", which is the main carier of the meaning, instead of the entire span. This allows to define an additional evaluation type, in which only the Semantic Heads are considered, enabling a wider usage of robotic action primitives. For a robotic function only the Semantic Head ("mug") may be required to execute an action. The example below was updated accordingly.

Moreover, additional entities were added to the Semantic Map, enabling a full grounded interpretation.

Introduction

HuRIC (Human Robot Interaction Corpus) is a resource that has been gathered as a collaboration between the Semantic Analytics Group (SAG) from the University of Roma, Tor Vergata, and the Laboratory of Cognitive Cooperating Robots (Lab.Ro.Co.Co.) at Sapienza, University of Rome. The basic idea of this project is to build a corpus for Human Robot Interaction in Natural Language containing information that are yet oriented to a specific application domain, e.g. the house service robotics, but at the same time inspired by sound linguistic theories, that are by definition decoupled from such a domain.

HuRIC is designed to enable the Grounded Language Interpretation of robotic commands, i.e., make the interpretation process of a robotic command dependent from the specific environment where the utterance is expressed. Without any contestual information, a command such as "take the mug next to the keyboard" is ambiguous: it may in fact express the need of picking up the mug that is near the keyboard or to bring the mug whose position is not expressed toward a new position near the keyboard. Whithout knowing the actual placement of the mug and the keyboardin the environment, it is not possible to decide the suitable interpretation, i.e. correctly assign the intended meaning to the command.

HuRIC is based on the theory of Frame Semantics and captures cognitive information about the real-world situations and events expressed in sentences. The most interesting feature is that HuRIC is not system or robot dependent and these regards the type of accepted sentences and the adopted formalism for representing and extracting their interpretation.

In order to enable the learning of Grounded Language Interpretation processes, each command in HuRIC is paired with a Semantic Map, reflecting the naming and disposition of entities in the environment that are referred by the interpretation.

HuRIC is released as an open source resource, under the Apache 2.0 license.

Corpus Definition

HuRIC exploits different situations representing possible commands given to a robot in a house environment. The corpus is composed of different subsets, characterized by different order of complexity and designed to differently stress the language recognition architecture. Each sentence is annotated linguistically as well as conceptually. In linguistic terms lemmas, POS tags, dependency trees, and Frame Semantics are annotated over the sentence. Semantic frames and frame elements are associated to sentence fragments (e.g. verbs and their syntactic arguments) and correspoind to the adopted meaning representation formalisms for the underlying command: they also conceptually reflect the actions requested to a robot, that are usually the actions it can carry out in a home environment.

HuRIC provides commands in two different languages: English and Italian. While the English subset contains 656 sentences, 241 commands are available in Italian. Almost all Italian sentences are translations of the original commands in English and the corpus keeps an alignment between them.

The number of annotated sentences, number of frames, and further statistics are reported in Table 1.

English Italian
Number of examples 656 241
Number of frames 18 14
Number of predicates 762 272
Number of roles 34 28
Predicates per sentence 1.16 1.13
Sentences per frame 36.44 17.21
Roles per sentence 2.02 1.90
Entities per sentence 6.59 6.97
Table 1: HuRIC: some statistics

Detailed statistics about the number of sentences for each frame and frame elements are reported in Table 2 and Table 3 for the English and Italian subsets, respectively.

Frame Ex Frame Ex Frame Ex
Motion 143 Bringing 153 Cotheme 39
Goal 129 Theme 153 Cotheme 39
Theme 23 Goal 95 Manner 9
Direction 9 Beneficiary 56 Goal 8
Path 9 Agent 39 Theme 4
Manner 4 Source 18 Speed 1
Area 2 Manner 1 Path 1
Distance 1 Area 1 Area 1
Source 1
Locating 90 Inspecting 29 Taking 80
Phenomenon 89 Ground 28 Theme 80
Ground 34 Desired_state 9 Source 16
Cognizer 10 Inspector 5 Agent 8
Purpose 5 Unwanted_entity 2 Purpose 2
Manner 2
Change_direction 11 Arriving 12 Giving 10
Direction 11 Goal 11 Recipient 10
Angle 3 Path 5 Theme 10
Theme 1 Manner 1 Donor 4
Speed 1 Theme 1 Reason 1
Placing 52 Closure 19 Change_operational_state 49
Theme 52 Containing_object 11 Device 49
Goal 51 Container_portal 8 Operational_state 43
Agent 7 Agent 7 Agent 17
Area 1 Degree 2
Being_located 38 Attaching 11 Releasing 9
Theme 38 Goal 11 Theme 9
Location 34 Item 6 Goal 5
Place 1 Items 1
Perception_active 6 Being_in_category 11 Manipulation 5
Phenomenon 6 Item 11 Entity 5
Manner 1 Category 11

Table 2: Distribution of frames and frame elements in the English dataset

Frame Ex Frame Ex Frame Ex
Motion 51 Locating 27 Inspecting 4
Goal 28 Phenomenon 27 Ground 2
Direction 20 Ground 6 Unwanted_entity 2
Distance 13 Manner 2 Desired_state 2
Speed 8 Purpose 1 Instrument 1
Theme 3
Path 2
Manner 1
Source 1
Bringing 59 Cotheme 13 Placing 18
Theme 60 Cotheme 13 Theme 18
Beneficiary 31 Manner 6 Goal 17
Goal 26 Goal 5 Area 1
Source 8
Closure 10 Giving 7 Change_direction 21
Container_portal 6 Theme 7 Direction 21
Containing_object 5 Recipient 6 Angle 9
Degree 1 Donor 1 Speed 9
Taking 22 Being_located 14 Being_in_category 4
Theme 22 Location 14 Item 4
Source 8 Theme 12 Category 4
Releasing 8 Change_operational_state 14
Theme 8 Device 14
Place 3

Table 3: Distribution of frames and frame elements in the Italian dataset

Corpus Release

This repository contains the whole HuRIC corpus, a collection of robotics commands.

It is composed of 2 versions, one for each language:

  • en: the English version of HuRIC
  • it: the Italian verison of HuRIC

The English version is further decomposed in 7 subsets, characterized by different order of complexity and designed to differently stress a labeling architecture.

Release format

The current release of HuRIC is made available through an XML-based format, whose extension is .hrc. An example is provided below. The targeted command is "take the mug next to the keyboard"

<?xml version="1.0" encoding="UTF-8"?>
<huricExample id="2650">
  <commands>
    <command>
      <sentence>take the mug next to the keyboard</sentence>
      <tokens>
        <token id="1" lemma="take" pos="VB" surface="take" />
        <token id="2" lemma="the" pos="DT" surface="the" />
        <token id="3" lemma="mug" pos="NN" surface="mug" />
        <token id="4" lemma="next" pos="JJ" surface="next" />
        <token id="5" lemma="to" pos="TO" surface="to" />
        <token id="6" lemma="the" pos="DT" surface="the" />
        <token id="7" lemma="keyboard" pos="NN" surface="keyboard" />
      </tokens>
      <dependencies>
        <dep from="0" to="1" type="root" />
        <dep from="1" to="3" type="dobj" />
        <dep from="3" to="2" type="det" />
        <dep from="1" to="4" type="advmod" />
        <dep from="4" to="7" type="nmod" />
        <dep from="7" to="5" type="case" />
        <dep from="7" to="6" type="det" />
      </dependencies>
      <semantics>
        <frames>
          <frame name="Bringing">
            <lexicalUnit>
              <token id="1" />
            </lexicalUnit>
            <frameElements>
              <frameElement>
                <type name="Theme" semanticHead="3" />
                <span startId="2" endId="3" />
              </frameElement>
              <frameElement>
                <type name="Goal" semanticHead="7" />
                <span startId="4" endId="7" />
              </frameElement>
            </frameElements>
          </frame>
        </frames>
      </semantics>
    </command>
  </commands>
  <semanticMap>
    <entities>
      <entity atom="p1" type="Cup">
        <attributes>
          <attribute name="contain_ability">
            <value>true</value>
          </attribute>
          <attribute name="preferred_lexical_reference">
            <value>cup</value>
          </attribute>
          <attribute name="lexical_references">
            <value>cup</value>
            <value>mug</value>
            <value>coffee cup</value>
            <value>bowl</value>
          </attribute>
        </attributes>
        <coordinate angle="0.0" x="2.0" y="5.0" z="0.0" />
      </entity>
      ...
      <entity atom="k1" type="Keyboard">
        <attributes>
          <attribute name="contain_ability">
            <value>false</value>
          </attribute>
          <attribute name="lexical_references">
            <value>keyboard</value>
            <value>console</value>
          </attribute>
        </attributes>
        <coordinate angle="0.0" x="4.0" y="1.0" z="0.0" />
      </entity>
    </entities>
  </semanticMap>
  <lexicalGroundings>
    <lexicalGrounding atom="p1" tokenId="3" />
    <lexicalGrounding atom="k1" tokenId="7" />
  </lexicalGroundings>
</huricExample>

Hence, for each command, the following information are provided:

  1. the whole sentence (i.e., <sentence/> tag), like the command above take the mug next to the keyboard.
  2. the list of tokens composing the command, along with the corresponding lemma and POS tags (i.e., the <tokens/> XML tag)
    • notice that each token is referred with an idwhich is used in the rest of the file to refer to it.
  3. the syntactic information, in terms of dependency relations among tokens (i.e., the <dependencies/> tag)
    • in the example above a row like <dep from="1" to="3" type="dobj" /> means that the third word referred by <token id="3" lemma="mug" pos="NN" surface="mug" /> expresses the direct object (i.e., the dobj) of the main verb <token id="1" lemma="take" pos="VB" surface="take" />;
    • dependency relations exist only for the English dataset and their tag is consistent with the Stanford Dependency Tagset.
  4. the semantics, based on the Frame Semantics Theory and expressed by Frames (i.e., the <frames/> tag) and Frame elements (i.e., the <frameElements/> tag):
    • even though a sentence may express an arbitrary number of frames, in the example above only the frame Bringing is expressed with two frame elements, i.e., the Theme role spanning between the second and the third token (the mug) and the Goal role, instead spanning between the forth and the seventh token (next to the keyboard);
    • for each frame element, the semantic head was marked through an attribute semanticHead: the main carier of the semantic meaning for Theme is mug, thus ID semanticHead="3" is appointed, while keyboard is main carrier for the Goal role, i.e. semanticHead="7";
  5. the configuration of the environment, in terms of entities populating the Semantic Map (SM), along with their semantic attributes (i.e., semanticMap tag):
    • each entity is identified by a unique id (atom) and characterized by a type; in the example above, two objects are in the Semantic Map, such as the object p1 which is an instance of the class Cup;
    • entities are extended through semantic or lexical <attributes/>; in the example above an instance of the class Cup may contain other entities, so that the containability property is true; these attributes also encode the multiple lexical references that can be used to refer to the entitie, such as cup, mug or bowl.
    • entities are localized within the environment through <coordinate/> which refer to an ideal gridmap.
  6. the gold groundings, providing gold mapping between linguistic symbols (namely, words of the sentence) and entities of the semantic map (i.e., lexicalGroundings tag). In the example, the token with id 3 (mug) refers to the entity p1 (Cup), while token 7 (keyboard) to entity k1 (Keyboard).

Version

This repository contains the HuRIC 2.1. The previous version of Huric is available at the following link: http://sag.art.uniroma2.it/demo-software/huric/

Main changelogs with respect to HuRIC 2.0:

  • Added Semantic Head attribute to each frame element with the corresponding ID.
  • Updated the Semantic Map with new Entities.

Main changelogs with respect to HuRIC 1.0:

  • Added additioanl annotated examples for English.
  • Added brand new examples for Italian.
  • Each sentence is now paired with a corresponding Semantic Map.

Where can I found a processing chain trained over HuRIC?

Together with the corpus, we developed a Spoken Language Understanding system called LU4R, based on a cascade of sequential labelers, whose models have been trained over HuRIC. It has been designed also for a context-aware interpretation of spoken commands, consistently with the corpus.

More details on LU4R can be found at the following link:: http://sag.art.uniroma2.it/lu4r.html

How to cite HuRIC

If you use HuRIC for your research, please cite the following paper:

Andrea Vanzo, Danilo Croce, Emanuele Bastianelli, Roberto Basili, Daniele Nardi (2020): Grounded language interpretation of robotic commands through structured learning. In: Artificial Intelligence Volume 278, January 2020, 103181, 278, 2020.

@article{DBLP:journals/ai/VanzoCBBN20,
  author    = {Andrea Vanzo and
               Danilo Croce and
               Emanuele Bastianelli and
               Roberto Basili and
               Daniele Nardi},
  title     = {Grounded language interpretation of robotic commands through structured
               learning},
  journal   = {Artificial Intelligence},
  volume    = {278},
  year      = {2020},
  url       = {https://doi.org/10.1016/j.artint.2019.103181},
  doi       = {10.1016/j.artint.2019.103181},
  biburl    = {https://dblp.org/rec/bib/journals/ai/VanzoCBBN20},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

References

Andrea Vanzo, Danilo Croce, Emanuele Bastianelli, Roberto Basili, Daniele Nardi (2020): Grounded language interpretation of robotic commands through structured learning. In: Artificial Intelligence, Volume 278, January 2020, 103181, 278, 2020.

Emanuele Bastianelli and Giuseppe Castellucci and Danilo Croce and Roberto Basili and Daniele Nardi (2017): Structured learning for spoken language understanding in human-robot interaction. In: International Journal of Robotics Research, 36 (5-7), pp. 660–683, 2017.

Emanuele Bastianelli, Danilo Croce, Andrea Vanzo, Roberto Basili, Daniele Nardi (2016) A Discriminative Approach to Grounded Spoken Language Understanding in Interactive Robotics. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, {IJCAI} 2016, New York, NY, USA, 9-15 July

Emanuele Bastianelli, Giuseppe Castellucci, Danilo Croce, Luca Iocchi, Roberto Basili, Daniele Nardi (2014): HuRIC: a Human Robot Interaction Corpus. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), European Language Resources Association (ELRA), Reykjavik, Iceland, 2014, ISBN: 978-2-9517408-8-4.

Emanuele Bastianelli, Giuseppe Castellucci, Danilo Croce, Roberto Basili, Daniele Nardi (2014): Effective and Robust Natural Language Understanding for Human Robot Interaction. In: Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 2014), pp. 57 - 62, Prague, Czech Republic, 2014.