EntityOrchestrator
koaning opened this issue · 3 comments
Let's say that we have a RegexEE and DIET in our pipeline. For the RegexEE to work we need to have a few examples labelled with entities in our NLU data. DIET will also pick up those training examples, which might lead to conflicts later.
We could add a component to our pipeline, say an EntityOrchestrator, that can pick up entity predictions in the NLU message like:
{
  "text": "hello vincent",
  "entities": [
    {
      "entity": "name",
      "start": 6,
      "end": 13,
      "confidence_entity": 1,
      "value": "vincent",
      "extractor": "DIETEntityExtractor"
    },
    {
      "entity": "name",
      "start": 6,
      "end": 13,
      "confidence_entity": 1,
      "value": "vincent",
      "extractor": "RegexEntityExtractor"
    }
  ]
}
and can turn it into either of the following.
Option 1: The entityextractor of choice.
The idea would be that you can configure a hierarchy/priority within entity extractors.
{
  "text": "hello vincent",
  "entities": [
    {
      "entity": "name",
      "start": 6,
      "end": 13,
      "confidence_entity": 1,
      "value": "vincent",
      "extractor": "RegexEntityExtractor"
    }
  ]
}
To get this to work you'd likely need to configure something like:
pipeline:
- name: EntityOrchestrator
  priority:
    name: RegexEntityExtractor     # all names can only be detected by RegexEntityExtractor
    date: DucklingEntityExtractor  # all dates can only be detected by DucklingEntityExtractor
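To make Option 1 concrete, here is a minimal sketch of the filtering step in plain Python. It is not a real Rasa component: the function name `filter_by_entity_priority` and the `priority` dict are hypothetical, and it assumes entities are plain dicts shaped like the JSON above. Entity types that are not listed in `priority` are passed through untouched, which is one possible choice and not a settled design.

```python
from typing import Dict, List

Entity = Dict[str, object]


def filter_by_entity_priority(
    entities: List[Entity], priority: Dict[str, str]
) -> List[Entity]:
    """Keep only predictions made by the extractor configured for each entity type.

    `priority` maps an entity type (e.g. "name") to the one extractor that is
    allowed to produce it. Entity types without an entry are kept as-is.
    """
    kept = []
    for ent in entities:
        wanted_extractor = priority.get(ent["entity"])
        if wanted_extractor is None or ent["extractor"] == wanted_extractor:
            kept.append(ent)
    return kept


# Hypothetical usage with the message from the example above.
entities = [
    {"entity": "name", "start": 6, "end": 13, "confidence_entity": 1,
     "value": "vincent", "extractor": "DIETEntityExtractor"},
    {"entity": "name", "start": 6, "end": 13, "confidence_entity": 1,
     "value": "vincent", "extractor": "RegexEntityExtractor"},
]
priority = {"name": "RegexEntityExtractor", "date": "DucklingEntityExtractor"}
filtered = filter_by_entity_priority(entities, priority)
```

With the example message, only the `RegexEntityExtractor` prediction for `name` would survive.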
Option 2: Override and Pick.
We could also just fold detected entities together.
{
  "text": "hello vincent",
  "entities": [
    {
      "entity": "name",
      "start": 6,
      "end": 13,
      "confidence_entity": 1,
      "value": "vincent",
      "extractor": "EntityOrchestrator"
    }
  ]
}
To get this to work you'd likely need to configure something like:
pipeline:
- name: EntityOrchestrator
  priority:
  - RegexEntityExtractor
  - DucklingEntityExtractor
  - DietClassifierEntityExtractor
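A rough sketch of the folding step for Option 2, again in plain Python and not as an actual Rasa component. The function name `fold_entities` is hypothetical. It assumes that two predictions count as duplicates only when entity type, `start` and `end` all match exactly; partially overlapping spans are left alone here, and extractors missing from the priority list rank last. The extractor names are the ones from the JSON example above.

```python
from typing import Dict, List, Tuple

Entity = Dict[str, object]


def fold_entities(
    entities: List[Entity],
    priority: List[str],
    orchestrator_name: str = "EntityOrchestrator",
) -> List[Entity]:
    """Fold predictions with identical (entity, start, end) into one entity.

    When several extractors predict the same span, the prediction from the
    extractor that appears earliest in `priority` wins. Every returned entity
    is a copy stamped with `orchestrator_name` as its extractor.
    """
    rank = {extractor: i for i, extractor in enumerate(priority)}
    unranked = len(rank)  # extractors not in `priority` sort after all others
    best: Dict[Tuple[object, object, object], Entity] = {}
    for ent in entities:
        key = (ent["entity"], ent["start"], ent["end"])
        current = best.get(key)
        if current is None or (
            rank.get(ent["extractor"], unranked)
            < rank.get(current["extractor"], unranked)
        ):
            best[key] = ent
    return [{**ent, "extractor": orchestrator_name} for ent in best.values()]


# Hypothetical usage with the message from the first example.
entities = [
    {"entity": "name", "start": 6, "end": 13, "confidence_entity": 1,
     "value": "vincent", "extractor": "DIETEntityExtractor"},
    {"entity": "name", "start": 6, "end": 13, "confidence_entity": 1,
     "value": "vincent", "extractor": "RegexEntityExtractor"},
]
folded = fold_entities(
    entities,
    ["RegexEntityExtractor", "DucklingEntityExtractor", "DIETEntityExtractor"],
)
```

Stamping the result with the orchestrator's name hides which extractor originally won; keeping the original name in an extra field would be an alternative if that information matters downstream.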
We could also implement both options, but a component that helps in this space would be appreciated.
Hi Vincent,
I suppose we need to be careful here with synonyms too, right? I could use both lookups and synonyms: lookups use Regex whilst synonyms use DIET. So using your example above, I think that would work, right? Regex would pick up the list and DIET would pick up the synonyms.
I know it's more work (probably a lot more), but it would be great if this could be an "intent"-level option where you define which EE to use per intent. I've not really thought this out much though! However, for my use case, I think either of the approaches above would work.
Mark
@pomegran that's definitely an edge case that we need to double-check with a unit test, yeah. I think we can still make it work, but we've got to be careful.
Just so I understand your 'intent' concern: are you mentioning this because you'd only like to use this feature inside of a form? Why only check for multiple entities in specific intents?
Hey Vincent.
As I said, I've not really thought it out too much. I was just thinking out loud about other approaches. All the work I have done so far has been with "forms" where I have created my own validation logic, i.e. I can pick the entity extractor I want in code. I was just thinking about stories and how I could pick up the right entity (as I fell at the first hurdle due to overlapping entities).
So please take my comment with a pinch of salt. Your approach would allow me to use stories and simplify my form slot validation code :-)