Not loading the right pre-trained model
maheshmad opened this issue · 5 comments
I followed the steps for setting up the project, but
I get the error below while trying to run a prediction.
Any hints?
2020-02-05 18:42:39,044 - INFO - allennlp.nn.initializers - qp_matrix_attention._bias
2020-02-05 18:42:39,044 - INFO - allennlp.nn.initializers - qp_matrix_attention._weight_vector
2020-02-05 18:42:41,383 - INFO - allennlp.common.from_params - instantiating class <class 'allennlp.data.dataset_readers.dataset_reader.DatasetReader'> from params {'lazy': False, 'pretrained_model': 'bert-base-uncased', 'question_length_limit': 50, 'skip_due_to_gold_programs': False, 'skip_instances': False, 'token_indexers': {'tokens': {'pretrained_model': 'bert-base-uncased', 'type': 'bert-drop'}}, 'type': 'drop_reader_bert'} and extras set()
2020-02-05 18:42:41,384 - INFO - allennlp.common.params - validation_dataset_reader.type = drop_reader_bert
2020-02-05 18:42:41,384 - INFO - allennlp.common.from_params - instantiating class <class 'semqa.data.dataset_readers.drop_reader_bert.DROPReaderNew'> from params {'lazy': False, 'pretrained_model': 'bert-base-uncased', 'question_length_limit': 50, 'skip_due_to_gold_programs': False, 'skip_instances': False, 'token_indexers': {'tokens': {'pretrained_model': 'bert-base-uncased', 'type': 'bert-drop'}}} and extras set()
2020-02-05 18:42:41,385 - INFO - allennlp.common.params - validation_dataset_reader.lazy = False
2020-02-05 18:42:41,385 - INFO - allennlp.common.params - validation_dataset_reader.pretrained_model = bert-base-uncased
2020-02-05 18:42:41,386 - INFO - allennlp.common.from_params - instantiating class allennlp.data.token_indexers.token_indexer.TokenIndexer from params {'pretrained_model': 'bert-base-uncased', 'type': 'bert-drop'} and extras set()
2020-02-05 18:42:41,386 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.type = bert-drop
2020-02-05 18:42:41,386 - INFO - allennlp.common.from_params - instantiating class semqa.data.dataset_readers.drop_reader_bert.BertDropTokenIndexer from params {'pretrained_model': 'bert-base-uncased'} and extras set()
2020-02-05 18:42:41,387 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.pretrained_model = bert-base-uncased
2020-02-05 18:42:41,387 - INFO - allennlp.common.params - validation_dataset_reader.token_indexers.tokens.max_pieces = 512
2020-02-05 18:42:41,614 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /root/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
2020-02-05 18:42:41,718 - INFO - allennlp.common.params - validation_dataset_reader.relaxed_span_match = True
2020-02-05 18:42:41,719 - INFO - allennlp.common.params - validation_dataset_reader.do_augmentation = True
2020-02-05 18:42:41,719 - INFO - allennlp.common.params - validation_dataset_reader.question_length_limit = 50
2020-02-05 18:42:41,719 - INFO - allennlp.common.params - validation_dataset_reader.only_strongly_supervised = False
2020-02-05 18:42:41,719 - INFO - allennlp.common.params - validation_dataset_reader.skip_instances = False
2020-02-05 18:42:41,719 - INFO - allennlp.common.params - validation_dataset_reader.skip_due_to_gold_programs = False
2020-02-05 18:42:41,720 - INFO - allennlp.common.params - validation_dataset_reader.convert_spananswer_to_num = False
2020-02-05 18:42:42,003 - INFO - pytorch_pretrained_bert.tokenization - loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /root/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
2020-02-05 18:42:42,094 - INFO - allennlp.common.registrable - instantiating registered subclass drop_demo_predictor of <class 'allennlp.predictors.predictor.Predictor'>
Traceback (most recent call last):
File "/root/anaconda3/envs/py3/bin/allennlp", line 10, in <module>
sys.exit(run())
File "/root/anaconda3/envs/py3/lib/python3.6/site-packages/allennlp/run.py", line 18, in run
main(prog="allennlp")
File "/root/anaconda3/envs/py3/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 102, in main
args.func(args)
File "/root/anaconda3/envs/py3/lib/python3.6/site-packages/allennlp/commands/predict.py", line 227, in _predict
manager.run()
File "/root/anaconda3/envs/py3/lib/python3.6/site-packages/allennlp/commands/predict.py", line 206, in run
for model_input_json, result in zip(batch_json, self._predict_json(batch_json)):
File "/root/anaconda3/envs/py3/lib/python3.6/site-packages/allennlp/commands/predict.py", line 151, in _predict_json
results = [self._predictor.predict_json(batch_data[0])]
File "./semqa/predictors/demo_predictor.py", line 180, in predict_json
instance = self._json_to_instance(inputs)
File "./semqa/predictors/demo_predictor.py", line 100, in _json_to_instance
passage_spacydoc = spacyutils.getSpacyDoc(cleaned_passage_text, spacy_nlp)
File "./utils/spacyutils.py", line 38, in getSpacyDoc
return nlp(sent)
File "/root/anaconda3/envs/py3/lib/python3.6/site-packages/spacy/language.py", line 435, in __call__
doc = proc(doc, **component_cfg.get(name, {}))
File "pipes.pyx", line 397, in spacy.pipeline.pipes.Tagger.__call__
File "pipes.pyx", line 442, in spacy.pipeline.pipes.Tagger.set_annotations
File "morphology.pyx", line 312, in spacy.morphology.Morphology.assign_tag_id
File "morphology.pyx", line 200, in spacy.morphology.Morphology.add
ValueError: [E167] Unknown morphological feature: 'ConjType' (9141427322507498425). This can happen if the tagger was trained with a different set of morphological features. If you're using a pretrained model, make sure that your models are up to date:
python -m spacy validate
2020-02-05 18:43:25,401 - INFO - allennlp.models.archival - removing temporary unarchived model dir at /tmp/tmp1kf0l594
The input file looks like this:
{"passage":" Hoping to snap a two-game losing streak, the Falcons went home for a Week 9 duel with the Washington Redskins. Atlanta would take flight in the first quarter as quarterback Matt Ryan completed a 2-yard touchdown pass to tight end Tony Gonzalez, followed by cornerback Tye Hill returning an interception 62 yards for a touchdown. The Redskins would answer in the second quarter as kicker Shaun Suisham nailed a 48-yard field goal, yet the Falcons kept their attack on as running back Michael Turner got a 30-yard touchdown run, followed by kicker Jason Elam booting a 33-yard field goal. Washington began to rally in the third quarter with a 1-yard touchdown run from running back Ladell Betts. The Redskins would come closer in the fourth quarter as quarterback Jason Campbell hooked up with tight end Todd Yoder on a 3-yard touchdown pass, yet Atlanta closed out the game with Turner's 58-yard touchdown run.","question":"How many yards was the shortest touchdown pass?"}
Seems like an error caused by spaCy trying to process the passage text. Make sure the file you're trying to run prediction on is a JSON Lines-formatted file, where each line is a JSON object with the keys "question" and "passage". Could you please post the file or a snippet of it here?
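As an aside, a quick way to sanity-check such an input file before handing it to the predictor is to parse each line and verify the expected keys. A minimal sketch (the check_jsonl helper is hypothetical, not part of the project; the key set comes from this thread):

```python
import json

REQUIRED_KEYS = {"question", "passage"}

def check_jsonl(path):
    """Verify every line of a JSON Lines file is a JSON object
    containing the keys the DROP demo predictor expects."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate trailing blank lines
            obj = json.loads(line)  # raises ValueError on malformed JSON
            missing = REQUIRED_KEYS - obj.keys()
            if missing:
                raise ValueError(f"line {lineno}: missing keys {sorted(missing)}")
    return True
```

Running this before allennlp predict separates "my input file is malformed" from genuine model or spaCy errors like the one above.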
updated the original comment with the jsonl data.
Runs fine on my machine. Could you verify that spaCy is installed properly? I am using version 2.1.8, though it shouldn't really matter. Did you try running the command python -m spacy validate
as suggested in the error log?
@nitishgupta thanks for looking into this. I had to update en-core-web-lg to the latest version, and that fixed it.
====================== Installed models (spaCy v2.2.3) ======================
ℹ spaCy installation:
/root/anaconda3/envs/py3/lib/python3.6/site-packages/spacy
TYPE NAME MODEL VERSION
package en-core-web-lg en_core_web_lg 2.1.0 --> 2.2.5
============================== Install updates ==============================
python -m spacy download en_core_web_lg
Great. Usually reading through the error log helps!