novarac23/jarvis

Improve ES Querying


Currently, one of the biggest bottlenecks in the Jarvis pipeline is the Elasticsearch query used for initial document retrieval, which does not return very relevant results. For example, if one asks the question "How many people live in Columbus, OH?", many of the retrieved documents simply have "How many" in the title.

We should be able to come up with a better query that retrieves more relevant documents. One way to improve the query is with NER (named-entity recognition) tagging. I will open a PR for that, hopefully soon.

For reference, here's an example of the old query we used:

# Old query: plain full-text match of the whole question against the title.
query = {
    'query': {
        'match': {
            'document_title': question_text
        }
    }
}
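
As a rough sketch of what an NER-aware query might look like (not the final implementation), the function below builds an Elasticsearch `bool` query that requires at least one named entity to appear in the title and uses the full question text only to adjust ranking. The function name `build_query`, the `entities` argument, and the boost value are assumptions made for illustration; the entity list is expected to come from whichever NER tagger ends up being used.

```python
def build_query(question_text, entities, title_field="document_title"):
    """Build an Elasticsearch query body that favors titles containing entities.

    `entities` is a list of strings produced by some NER step (hypothetical
    here; no tagger is called in this sketch).
    """
    # Without entities, fall back to the old plain match query.
    if not entities:
        return {"query": {"match": {title_field: question_text}}}

    # One phrase clause per entity; the boost value is an arbitrary guess.
    entity_clauses = [
        {"match_phrase": {title_field: {"query": entity, "boost": 3.0}}}
        for entity in entities
    ]

    return {
        "query": {
            "bool": {
                # At least one entity must appear in the title.
                "must": [
                    {"bool": {"should": entity_clauses, "minimum_should_match": 1}}
                ],
                # The full question only influences the score.
                "should": [{"match": {title_field: question_text}}],
            }
        }
    }


# Entities written by hand for illustration; a real tagger may differ.
query = build_query("How many people live in Columbus, OH?", ["Columbus", "OH"])
```

With this shape, a document titled only "How many ..." would no longer match, because it contains none of the entity phrases. Whether requiring an entity match is too strict is something we would need to check against real questions.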