IEEE-NITK/open-qas

Reader: Add text features to embedding vector in embedding.py

@anumehaagrawal has implemented extra word features in addition to the GloVe embeddings here. The problem is that the resulting embedding vectors have different sizes depending on the paragraph. To fix this, we need to one-hot encode each of these features and append the one-hot vectors to the embedding vector, so that every token vector has the same fixed length.
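As a rough illustration of why one-hot blocks give a fixed size, here is a minimal sketch in plain Python. The helper names (`one_hot`, `extend_embedding`) and the tiny dimensions are hypothetical and not taken from embedding.py; a real GloVe vector would be much longer.

```python
def one_hot(index, size):
    """Return a list of length `size` with a single 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("index %d out of range for size %d" % (index, size))
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def extend_embedding(embedding, feature_indices, feature_sizes):
    """Append one one-hot block per categorical feature to a copy of `embedding`."""
    out = list(embedding)
    for idx, size in zip(feature_indices, feature_sizes):
        out.extend(one_hot(idx, size))
    return out


# Assumed toy sizes: a 4-dim stand-in for a GloVe vector, 3 POS tags, 2 entity types.
glove_vec = [0.1, -0.2, 0.3, 0.0]
vec_a = extend_embedding(glove_vec, [0, 1], [3, 2])
vec_b = extend_embedding(glove_vec, [2, 0], [3, 2])
```

Both `vec_a` and `vec_b` have length 4 + 3 + 2, whichever tag each token carries, because the length of each block depends only on the size of the tag set.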

The following extra features need to be implemented:

  • POS Tagging
  • Named Entity Recognition
  • Word Lemmas
  • Exact Question Match
  • Term Frequency
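Exact Question Match and Term Frequency are already numeric per token, so they probably do not need one-hot encoding. A possible sketch follows, assuming tokens are plain strings compared in lowercase and that term frequency is normalised by paragraph length. The function names are hypothetical, and matching on lemmas instead of surface forms would be an equally reasonable choice.

```python
from collections import Counter


def exact_match_feature(paragraph_tokens, question_tokens):
    """1.0 for each paragraph token that also appears in the question, else 0.0."""
    question_set = {t.lower() for t in question_tokens}
    return [1.0 if t.lower() in question_set else 0.0 for t in paragraph_tokens]


def term_frequency_feature(paragraph_tokens):
    """Count of each token in the paragraph divided by the paragraph length."""
    counts = Counter(t.lower() for t in paragraph_tokens)
    total = len(paragraph_tokens)
    return [counts[t.lower()] / total for t in paragraph_tokens]


paragraph = ["The", "cat", "saw", "the", "dog"]
question = ["Which", "cat"]
match = exact_match_feature(paragraph, question)
tf = term_frequency_feature(paragraph)
```

Each function returns one value per paragraph token, so each feature adds exactly one dimension to the token vector.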

The features need to be implemented in embedding.py, inside get_embeddings().

I've already created a dict for all the POS tags in embedder.py. Similar dicts need to be created for the other features and then converted to one-hot encodings where necessary.
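A dict for another feature could look something like the sketch below. The label list is an illustrative subset, not the tag set of any particular NER tool, and `build_index` / `encode` are hypothetical helpers rather than existing functions in the repo. Reserving slot 0 for unknown labels is an assumption that keeps the vector length stable when an unseen tag shows up.

```python
def build_index(labels, unknown="<UNK>"):
    """Map each label to a stable integer index, reserving index 0 for unknowns."""
    index = {unknown: 0}
    for label in labels:
        if label not in index:
            index[label] = len(index)
    return index


def encode(label, index, unknown="<UNK>"):
    """One-hot encode `label`; labels missing from `index` fall back to the unknown slot."""
    vec = [0.0] * len(index)
    vec[index.get(label, index[unknown])] = 1.0
    return vec


# Illustrative subset of entity labels (assumption, not the project's actual list).
NER_LABELS = ["PERSON", "ORG", "GPE", "DATE"]
ner_index = build_index(NER_LABELS)
org_vec = encode("ORG", ner_index)
unseen_vec = encode("MONEY", ner_index)
```

The same pattern would work for POS tags. For lemmas, a one-hot vector over the whole vocabulary would be very large, so another encoding may be a better fit there.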

Refer to Paragraph Encoding.