Invalid Answers
Opened this issue · 14 comments
When asking: "Who was the first president of the United States?"
The answer is:
Normally vice presidents hold some power and special responsibilities below that of the president. The amendment also specifies that if any eligible person serves as president or acting president for more than two years of a term for which some other eligible person was elected president, the former can only be elected president once. Mitt Romney for president. Perhaps the best known sub-national presidents are the borough presidents of the Five Boroughs of New York City. The president fulfills various ceremonial duties.
@infosisio I am currently unable to maintain this project. I have identified some issues and shortcomings with it; if you are interested in contributing, I will share them with you.
@5hirish Hey, I am interested in helping out, please let me know what needs to be done, and I'll try to do something about it :)
@idoroiengel That's great to hear. When I started the project, the basic outline I chalked out was a Question Answering system: you would ask a question, and it would perform basic NLP operations on it such as tokenisation, stemming, POS tagging, and dependency extraction. It would then try to extract all the relevant keywords from the question, which could be used to construct a query against any knowledge source. After searching the knowledge source, it would take the raw data, try to filter out irrelevant information or summarize it, and then generate candidate answers and rank them.
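As a rough illustration of the first stages described above (question, then keywords, then query), here is a minimal, standard-library-only Python sketch. It is not the project's code: the stopword list, the `naive_stem` suffix stripper (a stand-in for a real stemmer such as Porter's), and the function names are all hypothetical, and POS tagging and dependency extraction are left out because they need a parser such as spaCy.

```python
import re

# Hypothetical, deliberately tiny stopword list (a real system would use a fuller one).
STOPWORDS = {
    "who", "what", "when", "where", "which", "how", "was", "is", "are",
    "the", "a", "an", "of", "in", "on", "to", "did", "does",
}


def tokenize(question):
    """Split a question into alphanumeric tokens, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9]+", question)


def naive_stem(word):
    """Crude suffix stripping; a placeholder for a proper stemmer."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def question_to_query(question):
    """Tokenise, lowercase, drop stopwords, stem, and join into a search query."""
    keywords = [
        naive_stem(token.lower())
        for token in tokenize(question)
        if token.lower() not in STOPWORDS
    ]
    return " ".join(keywords)


print(question_to_query("Who was the first president of the United States?"))
# prints: first president united state
```

Note how purely token-level processing turns "States" into "state" and splits "United States" into two loose keywords; losing an entity like this is one plausible way noisy terms end up in the search.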
Since then, my understanding of this problem statement and of the different ways to solve it has changed a lot. There are a lot of constructs in the system currently that can work against it and give out irrelevant answers such as the one above. To understand the current state of the system, I would redirect you to the /docs folder of the repo, where there is an architecture diagram and a white paper of the system. I will also note down a couple of issues I am aware of here in this issue. Also, the build on Travis is failing; I will look into that and try to fix it. In the meantime, you can reach out to me at my email address if you need any help with the project, with understanding its codebase, or with setting it up.
I compiled this list a long time ago, so I have forgotten some of the specifics, but nonetheless it should be a good start.
- Issues with the keywords being searched on Wikipedia [Selective Search]: Irrelevant keywords are searched on the knowledge source, which adds noise to the extracted knowledge.
- Improve the keyword extraction: Work on a keyword extraction algorithm so that the current rule-based keyword extraction can be deprecated in favor of an unsupervised method. We can look into the dependency relations of each token and take its other grammatical features into account to identify the keywords.
- Search on the structured info: A lot of tabular and structured information is extracted from Wikipedia. Work on an algorithm that searches the nested JSON data to identify the relevant keys and get their values.
- Question classification: Revisit the question classification model (Support Vector Machine), tweak it if necessary, and try to include the classified label in the keyword extraction or query construction phase to improve it.
- Information retrieval: Revisit the information retrieval phase (Vector Space Model). Could we improve it with an LSTM, maybe?
- Can we leverage Elasticsearch more in the project?
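For the keyword-extraction item in the list above, a minimal sketch of driving keyword selection from dependency relations might look like the following. It is still a heuristic, not itself an unsupervised method. It assumes each token already carries a coarse POS tag, a dependency label, and the index of its head (spaCy exposes these as `token.pos_`, `token.dep_` and `token.head.i`); the parse below is hand-written for illustration rather than real parser output, and the choice of which labels count as "core" or "modifier" is an assumption to tune.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    pos: str   # coarse POS tag, e.g. "NOUN", "PROPN"
    dep: str   # dependency label, e.g. "nsubj", "pobj"
    head: int  # index of the syntactic head (the root points to itself)


# Assumed label sets: nouns in these roles become keywords...
CORE_DEPS = {"nsubj", "nsubjpass", "dobj", "pobj", "attr"}
# ...and their direct children with these labels are kept as part of the phrase.
MODIFIER_DEPS = {"compound", "amod"}


def extract_keywords(tokens: List[Token]) -> List[str]:
    keywords = []
    for i, tok in enumerate(tokens):
        if tok.pos in {"NOUN", "PROPN"} and tok.dep in CORE_DEPS:
            # Attach direct modifiers ("first", "United") and keep sentence order.
            span = sorted(
                [j for j, t in enumerate(tokens) if t.head == i and t.dep in MODIFIER_DEPS]
                + [i]
            )
            keywords.append(" ".join(tokens[j].text for j in span))
    return keywords


# Hand-annotated (hypothetical) parse of "Who was the first president of the United States?"
QUESTION = [
    Token("Who", "PRON", "attr", 1),
    Token("was", "AUX", "ROOT", 1),
    Token("the", "DET", "det", 4),
    Token("first", "ADJ", "amod", 4),
    Token("president", "NOUN", "nsubj", 1),
    Token("of", "ADP", "prep", 4),
    Token("the", "DET", "det", 8),
    Token("United", "PROPN", "compound", 8),
    Token("States", "PROPN", "pobj", 5),
    Token("?", "PUNCT", "punct", 1),
]

print(extract_keywords(QUESTION))  # ['first president', 'United States']
```

Keeping "first" attached to "president" matters here, because it is the word that distinguishes this question from one about presidents in general.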
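For the structured-info item, one simple approach is a recursive walk over the nested JSON that reports every key whose words overlap the question keywords, together with its path and value. This is a sketch under assumptions: keys are matched by lowercase word overlap only (no stemming or synonyms), the function names are hypothetical, and the `infobox` dictionary is made-up data, not real Wikipedia output.

```python
import re


def _words(text):
    """Lowercase alphanumeric words in a key or keyword ("first_president" -> first, president)."""
    return set(re.findall(r"[a-z0-9]+", str(text).lower()))


def search_keys(data, keywords):
    """Return (path, value) for every dict key sharing a word with any keyword."""
    wanted = set()
    for kw in keywords:
        wanted |= _words(kw)
    results = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                key_path = path + (key,)
                if _words(key) & wanted:
                    results.append((key_path, value))
                walk(value, key_path)  # keep descending into nested structures
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, path + (index,))

    walk(data, ())
    return results


# Hypothetical infobox-like structure.
infobox = {
    "name": "Example Republic",
    "government": {
        "type": "Presidential republic",
        "leaders": [
            {"title": "President", "holder": "Jane Doe"},
            {"title": "Vice President", "holder": "John Roe"},
        ],
    },
    "first_president": "Alex Founder",
}

print(search_keys(infobox, ["first president"]))
# [(('first_president',), 'Alex Founder')]
```

The `title` entries whose values say "President" are not matched, since only keys are compared; whether values should also be searched is a design choice.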
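For the information-retrieval item, a bare-bones Vector Space Model ranks candidate passages by the cosine similarity between TF-IDF vectors of the question and of each passage. This stdlib-only sketch uses a smoothed IDF, log((1 + N) / (1 + df)) + 1, and three made-up passages; it only illustrates the technique and makes no claim about how the project's own ranking is implemented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query, passages):
    """Return (score, passage) pairs sorted from most to least similar."""
    docs = [tokenize(p) for p in passages]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per passage
    idf = {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

    def vectorize(tokens):
        # Raw term frequency times IDF; terms unseen in the passages are dropped.
        return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

    query_vec = vectorize(tokenize(query))
    scored = [(cosine(query_vec, vectorize(doc)), p) for doc, p in zip(docs, passages)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


passages = [
    "The president fulfills various ceremonial duties.",
    "George Washington was the first president of the United States.",
    "Borough presidents lead the five boroughs of New York City.",
]
for score, passage in rank_passages("first president United States", passages):
    print(f"{score:.3f}  {passage}")
```

The passage sharing all four query terms ranks first, and the borough passage scores zero because "presidents" is not normalized to "president", which is where stemming or learned representations would come in.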
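On the Elasticsearch question, one possibility is to let Elasticsearch handle passage retrieval: index the extracted Wikipedia text and turn the extracted keywords into a `bool` query with one `match_phrase` clause per keyword, so that multi-word keywords such as "United States" are matched as phrases. The sketch below only builds the request body as a Python dict; the field name `text`, the default `size`, and the function name are assumptions, and the body would be sent to an index's `_search` endpoint (the exact client call depends on the Elasticsearch client version).

```python
import json


def build_es_query(keywords, field="text", size=5):
    """Build an Elasticsearch query body: a passage must match at least one keyword phrase."""
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [{"match_phrase": {field: kw}} for kw in keywords],
                "minimum_should_match": 1,
            }
        },
    }


print(json.dumps(build_es_query(["first president", "United States"]), indent=2))
```

Passages that match more of the phrases score higher, because Elasticsearch adds together the scores of the matching `should` clauses.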
@idoroiengel Maybe the easiest thing to start with would be upgrading dependencies such as spaCy. I would be glad if we could revive this project, and I will try to work on it more regularly!
Fixed build issues with Travis CI
What is know_corp in Corpus, and how does it affect the model?
@5hirish Sounds good. I have also already glanced at some of the docs, and I think I've got the basics. I work mostly on Android, but since I'm an MA Linguistics graduate, I want to do some NLP coding. I can take a look at the dependencies this week. I built the project successfully with the current dependencies on my local machine and ran it a few times with several queries to test the system.
@5hirish Do you have any specific notes about the project's branches that I should be aware of? Also, should we continue this discussion somewhere else?
@idoroiengel Currently all the branches are stale and no feature is under development, so master is the stable branch. Yes, let's continue this conversation over email (mail@5hirish.com), Gitter, or maybe Slack.
Also, in December I was thinking of trying to implement some of the SQuAD 2.0 approaches. I think this would be a good way to kickstart the project again: going through some of the approaches from this competition and implementing one that suits our project and the problem we are trying to solve.
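For reference, a common pattern in SQuAD 2.0 systems (described, for example, in the BERT paper) is to let the reader model abstain: the score of the best answer span, start[i] + end[j], is compared with a "no answer" score taken from the [CLS] position, and an answer is returned only if the span wins by some threshold. The sketch below shows just that decision step on made-up logits; producing the logits requires a trained reader model, which is out of scope here. The function name and the `max_len` and `null_threshold` parameters are hypothetical, and tokens are assumed to be whole words from the passage only. Abstaining like this is one way to avoid returning irrelevant text such as the answer at the top of this issue.

```python
from typing import List, Optional


def best_answer(tokens: List[str], start_logits: List[float], end_logits: List[float],
                null_threshold: float = 0.0, max_len: int = 15) -> Optional[str]:
    """Return the best answer span, or None if "no answer" scores at least as high."""
    # Position 0 holds the [CLS] token, whose score stands for "no answer".
    null_score = start_logits[0] + end_logits[0]
    best_score, best_span = float("-inf"), None
    for i in range(1, len(tokens)):
        # Only spans that end at or after their start and are at most max_len tokens long.
        for j in range(i, min(i + max_len, len(tokens))):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    if best_span is None or best_score <= null_score + null_threshold:
        return None
    i, j = best_span
    return " ".join(tokens[i : j + 1])


tokens = ["[CLS]", "George", "Washington", "was", "the", "first", "president"]
start = [1.0, 5.0, 0.5, 0.1, 0.1, 0.2, 0.3]
end = [1.0, 0.5, 4.0, 0.1, 0.1, 0.2, 1.0]
print(best_answer(tokens, start, end))  # George Washington

# Same span logits, but a confident "no answer" score at [CLS].
print(best_answer(tokens, [6.0] + start[1:], [6.0] + end[1:]))  # None
```

Raising `null_threshold` makes the system abstain more often, trading recall for fewer irrelevant answers.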
@TharunAts This is an intermediate storage file for the knowledge extracted from Wikipedia, which is later processed and ranked. I'm not proud of how I approached this problem at the time.
@infosisio @idoroiengel I have created a Gitter chat for the project, which will be much more convenient for project-related discussions, since broad conversations are quite inconvenient to carry out on a single issue. Feel free to join the Gitter chat.
Also, I had created a Kanban project board here on GitHub when I was thinking about the SQuAD competition, and I documented whatever initial findings I had on it: Kanban Board