Sefrwahed/Alfred

Entity Extraction

Closed this issue · 1 comments

Story

As a module developer, I can specify which entities to extract from a sentence entered by the user, so that I can use those entities in my callback function and to improve classification.

Description

During Module Development

  • Read the entities the module needs from a JSON file supplied by the module developer
  • Check that they are implemented; if not, ask for a training corpus and create a custom entity

Pre Classification

  • Get general entities using spaCy

Post Classification

  • Extract all spaCy and Duckling entities, and return only those required by the classified module
  • Resolve conflicts between entities, if any exist
  • Put them in a key-value dict in the base class for the module developer to use

Description Edit

During Module Development

  • Get the needed entities from the module developer as a list of Enums, which makes the supported types easy to discover and lets us skip the checks
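
A minimal sketch of what declaring entities as Enums could look like. The names EntityType, WeatherModule and required_entities are hypothetical and only illustrate the idea; they are not taken from the Alfred code base.

```python
from enum import Enum


class EntityType(Enum):
    # Hypothetical list of supported types; the real list may differ.
    PERSON = "person"
    LOCATION = "location"
    TIME = "time"
    NUMBER = "number"


class WeatherModule:
    # A module declares what it needs by picking members of the Enum,
    # so an unsupported type cannot be requested by accident.
    required_entities = [EntityType.LOCATION, EntityType.TIME]
```

Because only members of the Enum can be listed, a misspelled or unsupported type fails when the module is loaded (AttributeError) instead of needing a separate "is it implemented?" check.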

Post Classification

  • Get the list of entities needed by the module developer
  • Split them into entities supported by spaCy and entities supported by Duckling
  • Extract all spaCy entities, and return only those marked "supported by spaCy" in the previous step
  • Extract the specified Duckling entities directly using a dynamic function call, e.g. call parse_time if "time" is in the list of entities supported by Duckling
  • Return a dict of the extracted entities from both libraries

That way, there is no need to resolve conflicts.
Also note that if an entity is supported by both spaCy and Duckling, the latter is favored (this is hard-coded in the list of supported types).
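
The post-classification steps and the Duckling precedence rule can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: EntityType, the two type sets, the label mapping and extract_entities are hypothetical names, and the spaCy and Duckling calls are replaced by stand-ins so the sketch runs without either library installed.

```python
from enum import Enum


class EntityType(Enum):
    # Hypothetical type names; the value doubles as the parse_<value> suffix.
    PERSON = "person"
    LOCATION = "location"
    TIME = "time"
    NUMBER = "number"


# Hard-coded split of supported types. A type both libraries could handle
# is listed under Duckling only, which is how Duckling ends up favored.
DUCKLING_TYPES = {EntityType.TIME, EntityType.NUMBER}
SPACY_TYPES = {EntityType.PERSON, EntityType.LOCATION}

# Assumed mapping from spaCy NER labels to the types above.
SPACY_LABELS = {"PERSON": EntityType.PERSON, "GPE": EntityType.LOCATION}


def extract_entities(text, required, spacy_extract, duckling):
    """Return {EntityType: [values]} for the types the module asked for.

    spacy_extract: callable returning (label, text) pairs for a sentence.
    duckling: object exposing parse_<type>(text) methods.
    """
    wanted_spacy = [t for t in required if t in SPACY_TYPES]
    wanted_duckling = [t for t in required if t in DUCKLING_TYPES]

    result = {}
    if wanted_spacy:
        # One spaCy pass over the sentence, then keep only requested types.
        for label, value in spacy_extract(text):
            etype = SPACY_LABELS.get(label)
            if etype in wanted_spacy:
                result.setdefault(etype, []).append(value)

    for etype in wanted_duckling:
        # Dynamic call: EntityType.TIME -> duckling.parse_time(text)
        parser = getattr(duckling, "parse_" + etype.value)
        values = parser(text)
        if values:
            result[etype] = values
    return result


# Stand-ins for the real extractors (assumptions, for illustration only).
def fake_spacy_extract(text):
    return [("PERSON", "Sara"), ("GPE", "Cairo"), ("DATE", "tomorrow")]


class FakeDuckling:
    def parse_time(self, text):
        return ["tomorrow"]

    def parse_number(self, text):
        return []


entities = extract_entities(
    "Remind Sara about the Cairo trip tomorrow",
    [EntityType.PERSON, EntityType.TIME],
    fake_spacy_extract,
    FakeDuckling(),
)
```

Since each type lives in exactly one of the two sets, the two extractors never return the same type, so the resulting dict needs no conflict resolution step.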