machinalis/iepy

Builtin relations

Opened this issue · 17 comments

Are there built-in relations in iepy? If not, some should be added.
I recommend using the relations from ConceptNet.

It could be a good thing! Perhaps as an example/kickstart....
Unfortunately we don't have the manpower right now to do it, but if you are willing to push something in this direction we have annotated corpora that could be used and we could guide you through it...

@rafacarrascosa where are these corpora? Also, can iepy be used on any sentence, even on question sentences?

You've got to remember that NLP systems compete with the old IR systems that use primitive tricks like tf*idf. Those work well enough for a range of problems and are easy to apply so that it takes a good amount of focused effort to get NLP systems to do better than bag-of-words.
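To make the bag-of-words point concrete, here is a minimal tf*idf sketch in plain Python. It is not taken from iepy or any IR library; the function name `tf_idf` and the sample sentences are made up for illustration, and the weighting used (relative term frequency times `log(N / df)`) is just one common variant. Because word order is thrown away, two sentences stating opposite relations get identical weights, which is the gap a relation extractor has to close.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


docs = [
    "Turing was born in London",
    "London was born in Turing",  # same words, opposite relation
    "London is located in England",
]
weights = tf_idf(docs)

# Word order is ignored, so the first two sentences are indistinguishable.
print(weights[0] == weights[1])  # True
```

A term that occurs in every document, such as "in" here, gets weight zero, while rarer terms such as "born" get a positive weight.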

If you want to make a system which is useful and "knocks their socks off" I think the thing to do is pick a small number of relations in some domain and create a focused corpus for that. If you spread the effort too thin you will get something that sucks like the Alchemy API.

Question sentences are no problem.

It would be cool to see a fully stacked up open source NLP system complete with trained models. One of the very few ones out there is

http://ctakes.apache.org/

@paulhoule check out http://ProjetPP.github.io. They have developed an OpenNLP system that is probably better than WolframAlpha in some respects. They did this with relation extraction, and their corpora weren't very specific; it is open domain. I am looking into doing something like that.

@rafacarrascosa I haven't found any of the corpora that you mentioned. What corpora do you use for testing? And is there an automatic way of finding relations, or do I have to define all of them?

@rafacarrascosa have you seen my message? I certainly want to do this project when I have time.

@iScienceLuvr Yes! I saw them. Unfortunately I'm overloaded with work and I want to give you a proper answer :S
I'll get back to you on Monday when things are quiet-ish again.

@rafacarrascosa sure, can you send me a short answer now, and a longer one later?

May I also ask how IEPY deals with pronouns?

@iScienceLuvr
WRT Corpora: The corpora are not public, but last time I checked there was interest in doing something useful with them. We have tagged corpora for:

  • was-born-in
  • is-located-at

I don't know if this is the kind of relation you had in mind... What would you like to offer out of the box? What did you have in mind?

WRT Question sentences: Afaik there should be no problem, perhaps only a slight reduction in the NLP preprocessing quality.

WRT Automatic relations: There is a lot on that subject on the web, but nothing is implemented in iepy. With automatic relations you end up having to disambiguate which relations mean the same thing.

WRT Pronouns: IEPY uses the pronoun resolution that comes bundled into Stanford's CoreNLP. So 'he', 'she' and so on are usually correctly identified as the referred entity.
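As a rough illustration of why that matters for relation extraction, the sketch below substitutes pronouns with the entity they refer to before any relation is looked for. This is not iepy's or CoreNLP's actual API: the function `resolve_pronouns` and the `(representative, token_indexes)` chain format are assumptions made up for this example, standing in for whatever a coreference resolver outputs.

```python
def resolve_pronouns(tokens, chains):
    """Replace each listed mention with its chain's representative mention.

    tokens: list of words.
    chains: list of (representative, [token_index, ...]) pairs.
            This format is hypothetical, not CoreNLP's output format.
    """
    resolved = list(tokens)  # copy, so the input is left untouched
    for representative, mention_indexes in chains:
        for i in mention_indexes:
            resolved[i] = representative
    return resolved


tokens = "Turing studied maths . He was born in London .".split()
# Assume a resolver decided that token 4 ("He") refers to "Turing".
chains = [("Turing", [4])]

resolved = resolve_pronouns(tokens, chains)
print(" ".join(resolved))
# Turing studied maths . Turing was born in London .
```

After the substitution, the second sentence contains both entities of a was-born-in candidate, which it did not before.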

@rafacarrascosa so I can't get the corpora? I wanted some more features...

@iScienceLuvr no, no, you got me wrong. The corpus is not public now, but perhaps it could be public if something good can come out of it (it depends on my bosses)... so, if you expand on what you would do, I could make an argument for the bosses

@rafacarrascosa this is just a hobby, that is all

@rafacarrascosa Hello again, it has been a long time. I just wanted to ask whether there are plans to make the corpora public. If help is needed with developing built-in relations, I could help, but I am pretty busy. I noticed I misunderstood your previous questions. I plan to add some basic relations from ConceptNet (if you have some specific ones in mind, please tell me), and I plan to train on the corpora, of course.

@rafacarrascosa have you been able to see this message?

Hi @iScienceLuvr , yes, I saw it but I was too busy until now.
I've just taken your inquiry to my bosses to try and make a public release of the corpora we have... I'll let you know when I have an answer.

@iScienceLuvr I have a partial answer:
We at Machinalis want to make them public, but since they are derived works from other corpora we have to check the licensing issues in more detail.
I.e., we are willing to make them fully public, but we have to check the original licenses to avoid copyright issues.

In the meantime, I have permission to share with you some corpora we have, as long as you make fair use of them until we check the licenses.

If you send me an email at rcarrascosa@machinalis.com I'll give you the links privately.

Cheers!