allenai/allennlp

Demo results do not match when the model is loaded locally

snijesh opened this issue · 8 comments

I have used the SRL model (and other models as well). The output generated on the demo screen seems more accurate than the results obtained locally. What is the difference between the model loaded for the demo and the one loaded locally? Or are there any extra files to be added to get better predictions?

The demo models are hosted on Google Cloud for performance reasons (it's faster and cheaper to download from GCS for the running demo), but they are identical to the publicly released ones:

~ curl https://s3-us-west-2.amazonaws.com/allennlp/models/bert-base-srl-2019.06.17.tar.gz > a
~ curl https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2019.06.17.tar.gz > b
~ diff a b

I don't know why you are seeing different performance. Can you give specific examples of the differences you're seeing?

See here: https://github.com/allenai/allennlp/blob/9a6962f00d2b0d30b81900b4e9764ddc3433f400/tutorials/how_to/elmo.md#notes-on-statefulness-and-non-determinism. There are several other issues in the repo with more discussion on this; you can probably find them by searching for links to that note.

@matt-gardner does this model use ELMo? I gathered from the name that it didn't.

Ah, sorry, you're right. Though it's not clear which models @snijesh was using in each case. @snijesh, if you still have questions, feel free to post again. I'll leave this closed until we hear from you, though.

Hello, I have just encountered this problem. Given the sentence:

In 2011 the circulation of the magazine was 1,310,696 copies.

While the demo returns this beautiful result:
was: [ARGM-TMP: In 2011] [ARG1: the circulation of the magazine] [V: was] [ARG2: 1,310,696 copies] .

The python-api returns:
{'verbs': [], 'words': ['In', '2011', 'the', 'circulation', 'of', 'the', 'magazine', 'was', '1,310,696', 'copies', '.']}

I get the same mismatch with other sentences as well. I am using this model: Predictor.from_path("https://s3-us-west-2.amazonaws.com/allennlp/models/bert-base-srl-2019.06.17.tar.gz"). Since this does not look like the ELMo model, I do not know what is causing the mismatch in performance.

Thank you for your work! allennlp is really useful :) @matt-gardner

This is almost certainly due to a mismatch in spacy models. We use spacy to detect verbs, and different versions of spacy models detect verbs differently, especially with words like "was". In the demo, which uses an older version of spacy, "was" gets tagged as a verb, so a prediction is made. In newer versions of spacy, I believe "was" in this sentence gets tagged as AUX, so no prediction is made.
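To illustrate the point (this is a minimal sketch, not the actual allennlp source), the effect of the spacy tagger on SRL output can be modeled as a POS filter over tokens: only tokens tagged VERB get a frame, so a tagger that labels "was" as AUX produces an empty 'verbs' list.

```python
# Hypothetical sketch of POS-based predicate selection. The function
# name and the fake tags below are illustrative assumptions, not the
# real allennlp/spacy API.

def select_predicates(tagged_tokens):
    """Return the tokens that would receive an SRL frame."""
    return [tok for tok, pos in tagged_tokens if pos == "VERB"]

sentence = ["In", "2011", "the", "circulation", "of", "the",
            "magazine", "was", "1,310,696", "copies", "."]

# Older spacy models (like the one behind the demo) tag "was" as VERB:
old_tags = [(tok, "VERB" if tok == "was" else "X") for tok in sentence]
# Newer spacy models tag "was" as AUX:
new_tags = [(tok, "AUX" if tok == "was" else "X") for tok in sentence]

print(select_predicates(old_tags))  # ['was'] -> a frame is predicted
print(select_predicates(new_tags))  # []      -> {'verbs': [], ...}
```

This is why the local run returns an empty 'verbs' list while the demo labels the copular "was".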

You are right. I have just downgraded spacy to 2.1.4 and now the behaviour is the same as in the demo. Thank you
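For anyone else hitting this, one way to keep the behaviour stable is to pin the spacy version in a requirements file (2.1.4 is the version that reproduced the demo above; whether that exact pin suits your environment is an assumption):

```
spacy==2.1.4
```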

Hello,
I am facing a similar issue with the semantic role label predictor.
For a sentence like 'Please take a few minutes to review our 2001 goals on Enrons intranet' part of my output is this:
{'verbs': [{'verb': 'take', 'description': '[ARGM-DIS: Please] [V: take] [ARG1: a few minutes] [ARGM-PRP: to review our 2001 goals on Enrons intranet]'

As you can see, it classifies 'to review' as Purpose. But on the demo, it correctly says that this is not a Purpose.
This is the output of demo:
take: [ARGM-DIS: Please] [V: take] [ARG1: a few minutes] [ARG0: to review our 2001 goals on Enrons intranet]

I have tried with newer as well as older spacy versions, specifically 2.1.4, 2.1.9, and 2.2.4.
Please @matt-gardner get back to me as soon as possible, I really need this to work.