nusnlp/esr

How to evaluate custom sentences?

Opened this issue · 4 comments

Hi @nusnlp, I would love to use your program to disambiguate sentences.
I don't want to evaluate your software on benchmark datasets; I want to run the pretrained model on my own sentences.

It is unclear whether your pretrained model requires the same format as the datasets or whether it can take plain sentences as input.
If the special format is required, please help me understand how I can programmatically transform any sentence into the expected format.
Are there existing tools for this conversion?
I downloaded the dataset from http://lcl.uniroma1.it/wsdeval/ and looked at this file: /WSD_Evaluation_Framework/Evaluation_Datasets/semeval2015/semeval2015.data.xml
Is this the right format?

<corpus lang="en" source="semeval2015">
<text id="d000">
<sentence id="d000.s000">
<wf lemma="this" pos="DET">This</wf>
<instance id="d000.s000.t000" lemma="document" pos="NOUN">document</instance>
<wf lemma="be" pos="VERB">is</wf>
<wf lemma="a" pos="DET">a</wf>
<instance id="d000.s000.t001" lemma="summary" pos="NOUN">summary</instance>
<wf lemma="of" pos="ADP">of</wf>
<wf lemma="the" pos="DET">the</wf>
<instance id="d000.s000.t002" lemma="european" pos="ADJ">European</instance>
<instance id="d000.s000.t003" lemma="public" pos="ADJ">Public</instance>
<instance id="d000.s000.t004" lemma="assessment" pos="NOUN">Assessment</instance>
<instance id="d000.s000.t005" lemma="report" pos="NOUN">Report</instance>
<wf lemma="(" pos=".">(</wf>
<wf lemma="epar" pos="NOUN">EPAR</wf>
<wf lemma=")" pos=".">)</wf>
<wf lemma="." pos=".">.</wf>
</sentence>
.....

I have no idea how to generate this format. There is no complete, clear specification (only examples) and no stated rule for choosing the XML tag: what is wf versus instance, and what is the rule for generating IDs such as id="d000.s000.t001"?
(I can generate the lemma and the POS myself; that is not the hard part.)
Please help: as it stands, this software is unusable.

Yup. Would definitely help if there was a converter.

Same here. Could you please add some simple demo code that, given a sentence and a specific word in that sentence, returns the correct sense?
The LMMS project has a nice demo built with spaCy as an example: https://github.com/danlou/LMMS
Thank you so much.

If I remember correctly (not sure), the difference between wf and instance is the following:
wf elements are context words that are not disambiguated, and instance elements are the words to be disambiguated.
So you basically choose which words you want to disambiguate. To keep things simple, you can mark every word as an instance.
I'm going from a distant memory of what someone told me on another WSD repository, so I might be wrong.
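
To make the wf/instance distinction concrete, here is a minimal Python sketch that reads this format and lists only the instance elements, i.e. the words marked for disambiguation. The SAMPLE string is a shortened copy of the semeval2015 excerpt above with closing tags added so that it parses, and list_targets is a hypothetical helper name, not something from the esr code base.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the semeval2015 excerpt quoted in this issue.
SAMPLE = """<corpus lang="en" source="semeval2015">
<text id="d000">
<sentence id="d000.s000">
<wf lemma="this" pos="DET">This</wf>
<instance id="d000.s000.t000" lemma="document" pos="NOUN">document</instance>
<wf lemma="be" pos="VERB">is</wf>
<wf lemma="a" pos="DET">a</wf>
<instance id="d000.s000.t001" lemma="summary" pos="NOUN">summary</instance>
</sentence>
</text>
</corpus>"""


def list_targets(xml_text):
    """Return (id, lemma, pos) for every <instance> element, in document order."""
    root = ET.fromstring(xml_text)
    # <wf> elements are skipped: they carry no id and are only context.
    return [(el.get("id"), el.get("lemma"), el.get("pos"))
            for el in root.iter("instance")]


targets = list_targets(SAMPLE)
for target in targets:
    print(target)
```

Only the two instance elements come back; the wf tokens are ignored, which matches the reading that wf marks context and instance marks targets.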

If true, then the main remaining difficulty is how to reproduce the ID scheme.
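
Judging only from the excerpt above, the IDs look like zero-padded counters: dNNN for the document, sNNN for the sentence within it, and tNNN for the instance within the sentence (counting instance elements only, not wf). That is an inference from one example, not a documented rule. Under that assumption, a rough Python sketch of a converter could look like this. sentence_to_xml and its parameters are hypothetical names, the lemma and POS values are supplied by hand here (in practice they would come from a tagger), and I have not checked whether esr also expects a matching gold key file.

```python
import xml.etree.ElementTree as ET


def sentence_to_xml(tokens, doc_idx=0, sent_idx=0, source="custom"):
    """Build a <corpus> element for one sentence.

    tokens: list of (surface, lemma, pos, is_target) tuples.
    The ID pattern (dNNN.sNNN.tNNN) is assumed from the semeval2015 sample.
    """
    corpus = ET.Element("corpus", {"lang": "en", "source": source})
    doc_id = f"d{doc_idx:03d}"
    text = ET.SubElement(corpus, "text", {"id": doc_id})
    sent_id = f"{doc_id}.s{sent_idx:03d}"
    sentence = ET.SubElement(text, "sentence", {"id": sent_id})

    target_idx = 0  # counts only the words marked for disambiguation
    for surface, lemma, pos, is_target in tokens:
        if is_target:
            attrs = {"id": f"{sent_id}.t{target_idx:03d}",
                     "lemma": lemma, "pos": pos}
            el = ET.SubElement(sentence, "instance", attrs)
            target_idx += 1
        else:
            el = ET.SubElement(sentence, "wf", {"lemma": lemma, "pos": pos})
        el.text = surface
    return corpus


tokens = [
    ("This", "this", "DET", False),
    ("document", "document", "NOUN", True),
    ("is", "be", "VERB", False),
    ("a", "a", "DET", False),
    ("summary", "summary", "NOUN", True),
]
corpus = sentence_to_xml(tokens)
xml_string = ET.tostring(corpus, encoding="unicode")
print(xml_string)
```

ElementTree escapes special characters in the token text and attributes, so punctuation tokens such as & or < stay well-formed. The output is written on a single line; if whitespace matters to the loader, that would need checking against the real data files.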