RasaHQ/rasa-nlu-examples

EntityOrchestrator

koaning opened this issue · 3 comments

Let's say that we have a RegexEE and DIET in our pipeline. For the RegexEE to work, we need a few labelled entity examples in our NLU data. Those examples will also prompt DIET to learn to pick up the same entities, which might lead to conflicting predictions later.

We could add a component to our pipeline, say an EntityOrchestrator, that can pick up entity predictions in the NLU message, like:

{   "text": "hello vincent",
    "entities": [
        {
            "entity": "name",
            "start": 6,
            "end": 13,
            "confidence_entity": 1, 
            "value": "vincent",
            "extractor": "DIETEntityExtractor"
        },{
            "entity": "name",
            "start": 6,
            "end": 13,
            "confidence_entity": 1, 
            "value": "vincent",
            "extractor": "RegexEntityExtractor"
        }
    ]
}

and turn it into one of the following:

  Option 1: The entity extractor of choice.

The idea would be that you can configure a hierarchy/priority among entity extractors, so that each entity type is owned by one extractor.

{   "text": "hello vincent",
    "entities": [{
            "entity": "name",
            "start": 6,
            "end": 13,
            "confidence_entity": 1, 
            "value": "vincent",
            "extractor": "RegexEntityExtractor"
        }
    ]
}

To get this to work, you'd likely need to configure something like:

pipeline:
    - name: EntityOrchestrator
      priority:
          name: RegexEntityExtractor #all names can only be detected by RegexEntityExtractor
          date: DucklingEntityExtractor  #all dates can only be detected by DucklingEntityExtractor
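To make the ownership rule concrete, here is a minimal sketch in plain Python. It works on entity dicts shaped like the JSON above and is not Rasa's component API; `filter_by_owner` is a hypothetical helper, and letting entity types without an owner pass through untouched is an assumption the issue does not settle.

```python
def filter_by_owner(entities, priority):
    """Keep an entity only if it came from the extractor that owns its type.

    `priority` maps entity type -> owning extractor, mirroring the `priority:`
    block in the config above. Entity types with no owner are passed through
    unchanged; that behaviour is an assumption, not something the issue fixes.
    """
    kept = []
    for ent in entities:
        owner = priority.get(ent["entity"])
        if owner is None or ent["extractor"] == owner:
            kept.append(ent)
    return kept


# Entity dicts shaped like the JSON example above.
priority = {"name": "RegexEntityExtractor", "date": "DucklingEntityExtractor"}
entities = [
    {"entity": "name", "start": 6, "end": 13, "value": "vincent",
     "extractor": "DIETEntityExtractor"},
    {"entity": "name", "start": 6, "end": 13, "value": "vincent",
     "extractor": "RegexEntityExtractor"},
]
print([e["extractor"] for e in filter_by_owner(entities, priority)])
# prints: ['RegexEntityExtractor']
```

Because ownership is decided per entity type, this option never needs to compare character spans; the trade-off is that a type owned by one extractor can never be recovered by another when the owner misses it.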

  Option 2: Override and Pick.

We could also just fold overlapping detected entities together into a single entity attributed to the EntityOrchestrator:

{   "text": "hello vincent",
    "entities": [
       {
            "entity": "name",
            "start": 6,
            "end": 13,
            "confidence_entity": 1, 
            "value": "vincent",
            "extractor": "EntityOrchestrator"
        }
    ]
}

To get this to work, you'd likely need to configure something like:

pipeline:
    - name: EntityOrchestrator
      priority:
          - RegexEntityExtractor
          - DucklingEntityExtractor
          - DIETEntityExtractor
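A minimal sketch of how the ordered list could drive the folding, again in plain Python over entity dicts shaped like the JSON examples rather than through Rasa's component API. `fold_entities` is a hypothetical helper; treating any character overlap as a conflict, and ranking extractors missing from the list last, are assumptions.

```python
def fold_entities(entities, priority):
    """Fold overlapping entities into one, preferring extractors listed earlier.

    `priority` is an ordered list of extractor names, mirroring the config
    above. Extractors missing from the list rank last (an assumption).
    """
    rank = {name: i for i, name in enumerate(priority)}

    def overlaps(a, b):
        # Character spans are half-open: [start, end).
        return a["start"] < b["end"] and b["start"] < a["end"]

    # Visit entities from most to least trusted extractor; sorted() is stable,
    # so ties keep their original order.
    ordered = sorted(entities, key=lambda e: rank.get(e["extractor"], len(priority)))
    folded = []
    for ent in ordered:
        if any(overlaps(ent, kept) for kept in folded):
            continue  # a higher-priority extractor already claimed this span
        # Copy the dict so the original prediction is left untouched.
        folded.append({**ent, "extractor": "EntityOrchestrator"})
    # Return entities in the order they appear in the text.
    return sorted(folded, key=lambda e: e["start"])


# Extractor names taken from the JSON example at the top of the issue.
priority = ["RegexEntityExtractor", "DucklingEntityExtractor", "DIETEntityExtractor"]
entities = [
    {"entity": "name", "start": 6, "end": 13, "value": "vincent",
     "extractor": "DIETEntityExtractor"},
    {"entity": "name", "start": 6, "end": 13, "value": "vincent",
     "extractor": "RegexEntityExtractor"},
]
print([(e["value"], e["extractor"]) for e in fold_entities(entities, priority)])
# prints: [('vincent', 'EntityOrchestrator')]
```

Sorting by priority first means a lower-priority extractor can never displace a span that a higher-priority one already claimed. Note that a partially overlapping lower-priority entity is dropped entirely rather than trimmed, which is one of the edge cases a unit test would need to pin down.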

We could also implement both options; either way, a component that helps in this space would be appreciated.

Hi Vincent,

I suppose we need to be careful here with synonyms too, right? I could use both lookups and synonyms: lookups use Regex, whilst synonyms use DIET. So using your example above, I think that would work, right? Regex would pick up the list and DIET would pick up the synonyms.

I know it's more work (probably a lot more), but it would be great if this could be an intent-level option where you define which EE to use per intent. I've not really thought this through much, though! However, for my use case, I think either of the approaches above would work.

Mark
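The per-intent idea could look roughly like this, as a plain-Python sketch over a parsed message dict. It assumes the component runs after the intent classifier so that `intent` is already set; `filter_by_intent`, the `greet` intent and the mapping are hypothetical, and keeping every entity for intents without an entry is an assumption.

```python
def filter_by_intent(message, allowed_by_intent):
    """Keep only entities from extractors allowed for the predicted intent.

    `allowed_by_intent` maps intent name -> set of extractor names. Intents
    without an entry keep every entity (an assumption).
    """
    allowed = allowed_by_intent.get(message["intent"]["name"])
    if allowed is None:
        return message["entities"]
    return [e for e in message["entities"] if e["extractor"] in allowed]


# Hypothetical intent name and mapping, for illustration only.
allowed_by_intent = {"greet": {"RegexEntityExtractor"}}
message = {
    "text": "hello vincent",
    "intent": {"name": "greet", "confidence": 0.9},
    "entities": [
        {"entity": "name", "start": 6, "end": 13, "value": "vincent",
         "extractor": "DIETEntityExtractor"},
        {"entity": "name", "start": 6, "end": 13, "value": "vincent",
         "extractor": "RegexEntityExtractor"},
    ],
}
print([e["extractor"] for e in filter_by_intent(message, allowed_by_intent)])
# prints: ['RegexEntityExtractor']
```

The catch is that entity filtering now depends on the intent prediction, so a misclassified intent would also silently drop the "wrong" extractor's entities.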

@pomegran yeah, that's definitely an edge case that we need to double-check with a unit test. I think we can still make it work, but we gotta be careful.

Just so I understand your 'intent' concern: are you mentioning this because you'd only like to use this feature inside a form? Why only check for multiple entities in specific intents?

Hey Vincent.

As I said, I've not really thought it through; I was just thinking out loud about other approaches. All the work I have done so far has been with forms, where I have written my own validation logic, i.e. I can pick the entity extractor I want in code. I was just thinking about stories and how I could pick up the right entity (as I fell at the first hurdle due to overlapping entities).

So please take my comment with a pinch of salt. Your approach would allow me to use stories and simplify my form slot validation code :-)