OnlpLab/NEMO

KeyError: 'md_lattice' in morph_hybrid_align_tokens

Closed this issue · 9 comments

Hi,
I'm hitting an error that occurs repeatedly for certain textual inputs, and I can't tell what is wrong with them.

The error (as printed in NEMO's server):

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 373, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/opt/anaconda3/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 75, in __call__
    return await self.app(scope, receive, send)
  File "/opt/anaconda3/lib/python3.8/site-packages/fastapi/applications.py", line 208, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/anaconda3/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/anaconda3/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/opt/anaconda3/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/opt/anaconda3/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/opt/anaconda3/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/opt/anaconda3/lib/python3.8/site-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/opt/anaconda3/lib/python3.8/site-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/opt/anaconda3/lib/python3.8/site-packages/starlette/routing.py", line 52, in app
    response = await func(request)
  File "/opt/anaconda3/lib/python3.8/site-packages/fastapi/routing.py", line 219, in app
    raw_response = await run_endpoint_function(
  File "/opt/anaconda3/lib/python3.8/site-packages/fastapi/routing.py", line 154, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/opt/anaconda3/lib/python3.8/site-packages/starlette/concurrency.py", line 40, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/opt/anaconda3/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "./api_main.py", line 481, in morph_hybrid_align_tokens
    return morph_hybrid(q, multi_model_name, morph_model_name, align_tokens=True)
  File "./api_main.py", line 424, in morph_hybrid
    md_lattice = run_yap_md(pruned_lattice) #TODO: this should be joint, but there is currently no joint on MA in yap api
  File "./api_main.py", line 91, in run_yap_md
    return resp['md_lattice']
KeyError: 'md_lattice'
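For reference, the failing line (api_main.py:91) returns resp['md_lattice'] directly, so when YAP replies with an error payload instead of a lattice, it surfaces as a bare KeyError. A sketch of a more informative check (extract_md_lattice is a hypothetical helper, not NEMO's actual code):

```python
def extract_md_lattice(resp: dict) -> str:
    """Pull the MD lattice out of a YAP API response, failing loudly.

    Hypothetical helper: NEMO's actual run_yap_md just returns
    resp['md_lattice'], which raises a bare KeyError when YAP
    responds with something else (often an error message).
    """
    if "md_lattice" not in resp:
        # Report which keys YAP actually returned, so the server log
        # shows what went wrong instead of a bare KeyError.
        raise RuntimeError(
            f"YAP MD response has no 'md_lattice'; got keys: {sorted(resp)}"
        )
    return resp["md_lattice"]
```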

Examples of inputs for which this happened to me:

ראש הממשלה, בנימין נתניהו, שוחח הערב בטלפון עם קנצלרית גרמניה אנגלה מרקל, בעקבות הפיגוע אתמול בבית הכנסת בעיר האלה בזמן תפילות יום הכיפורים. בשיחה ציינה הקנצלרית מרקל  כי יש בכוונתה להגביר את המאמצים לאבטחת הקהילה היהודית בארצה. פרשניתנו המדינית, אילאיל שחר, מזכירה כי מוקדם יותר שוחח נשיא המדינה, ראובן ריבלין, עם נשיא גרמניה, פרנק וולטר סטיינמאייר והדגיש כי יש להמשיך ו"לעשות ללא פשרות כדי להילחם באנטישמיות". נשיא גרמניה ביקר לפני מספר שעות בזירת הפיגוע בהאלה ואמר: "בהתחשב בהיסטוריה של גרמניה - אנחנו אחוזי אימה כשאנו מבינים שאירוע כזה התרחש כאן אצלנו".
בית המשפט העליון הכריע לפני זמן קצר כי הנאשמת בפדופיליה, מלכה לייפר, תישאר במעצר. לייפר מואשמת בשורת עבירות מין באוסטרליה וטרם הוחלט האם היא כשירה לעמוד שם לדין. כתבנו לענייני משפט, יובל אראל, מוסיף כי בהחלטה נכתב שהספק במצבה הנפשי של לייפר מעורר חשש כי מדובר בנסיון לברוח מאימת הדין. שרת המשפטים לשעבר וחברת הכנסת, איילת שקד, בירכה על ההחלטה בחשבון הטוויטר שלה והוסיפה כי לאחר חמש שנים - יש לסיים את הליך ההסגרה. 

Thanks.
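For context, a minimal client sketch for hitting a locally running NEMO server with one of these inputs; the port, endpoint path, and the "sentences" field name are assumptions loosely inferred from the traceback, so check api_main.py for the real schema before relying on them:

```python
import json
import urllib.request

# Assumed address and endpoint; adjust to your NEMO server setup.
API_URL = "http://localhost:8090/morph_hybrid_align_tokens"

def build_payload(text: str) -> bytes:
    # The "sentences" field name is an assumption, not a documented schema.
    return json.dumps({"sentences": text}, ensure_ascii=False).encode("utf-8")

def query_nemo(text: str) -> dict:
    """POST one text to the (assumed) NEMO endpoint and return the JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=build_payload(text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.loads(r.read().decode("utf-8"))
```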

cjer commented

Both examples seem to work fine for me. Can you share a code snippet that reproduces this?

cjer commented

@SapirWeissbuch Is this still relevant?

yuvaly95 commented

Hi, I'm having a similar problem that I think might be related.
When I use the command "python nemo.py morph_hybrid_align_tokens token-single NER_TEST.txt NER_RESULTS.txt", the NER_RESULTS file contains NER results for only part of the text; for some reason it stops somewhere in the middle. When I tried the sentences that @SapirWeissbuch used, the same thing happened (it does not happen with the regular models, i.e. "run_ner_model token-single").
The files are attached, and here is the last part of what is printed to the screen:
Hyperparameters:
 Hyper lr: 0.01
 Hyper lr_decay: 0.05
 Hyper HP_clip: None
 Hyper momentum: 0.0
 Hyper l2: 1e-08
 Hyper hidden_dim: 200
 Hyper dropout: 0.5
 Hyper lstm_layer: 2
 Hyper bilstm: True
 Hyper GPU: True
DATA SUMMARY END.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
nbest: 1
Load Model from file: final_setup/models/token.char_cnn.ft_tok.46_seed
build sequence labeling network...
use_char: True
char feature extractor: CNN
word feature extractor: LSTM
use crf: True
build word sequence feature extractor: LSTM...
build word representation...
build char sequence feature extractor: CNN ...
build CRF...
Decode raw data, nbest: 1 ...
C:\Users\Tmuna\anaconda3\envs\nemoenv\lib\site-packages\torch\nn\functional.py:652: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at ..\c10/core/TensorImpl.h:1156.)
  return torch.max_pool1d(input, kernel_size, stride, padding, dilation, ceil_mode)
C:\Users\Tmuna\Repos\NEMO\model\crf.py:344: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated, please use a mask with dtype torch.bool instead. (Triggered internally at ..\aten\src\ATen\native\TensorAdvancedIndexing.cpp:1104.)
  cur_bp.masked_fill_(mask[idx].view(batch_size, 1, 1).expand(batch_size, tag_size, nbest), 0)
Right token = 265 All token = 284 acc = 0.9330985915492958
raw: time:2.57s, speed:1.57st/s; acc: 0.9331, p: 0.0000, r: -1.0000, f: -1.0000
Predict raw 1-best result has been written into file. temp\20210827115722354779_morph_hybrid_align_tokens_token-single_morph_ner.txt

NER_TEST_2.txt
NER_RESULTS.txt
NER_RESULTS_2.txt
NER_TEST.txt

cjer commented

Strange. Again, I'm not sure what's happening at your end since it does not replicate on my end.
Here are the results I'm getting when using the same command and your input files:
NER_RESULTS_SINGLE.txt
NER_RESULTS_2_SINGLE.txt

I also used those inputs with the API version and it worked well. I'm really not sure what's happening here.

btw - for better results you should use the morph model with the morph_hybrid_align_tokens command instead of token-single. This is not documented well enough...
also - I just pushed some fixes for issues that caused nemo.py to crash with the latest version of bclm. You'll probably want to pull those.

cjer commented
  1. What system are you running on?
  2. Do you get the same results when you repeat the same run (same command, same input)? Your results look not only cut short but really jumbled; I have a far-fetched idea that maybe it has something to do with the intermediate temp files getting mixed up.
  3. Basically, what I would do next is set DELETE_TEMP_FILES = False in config.py and see at what point things are getting messed up. You can also upload the temp files here.

yuvaly95 commented

  1. Microsoft Windows Server 2019 Datacenter.
  2. Yes, I deleted the temp files and was still getting the same result, but I'll check it again.
  3. Thanks! I'll try it and let you know.
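One way to act on the DELETE_TEMP_FILES suggestion: run the same command twice and compare the intermediate files from the two runs to see where they first diverge. A small sketch under those assumptions (first_divergence and compare_temp_files are hypothetical helpers, not part of NEMO):

```python
from pathlib import Path

def first_divergence(a, b):
    """Index of the first differing line between two runs, or None if identical."""
    for i, (line_a, line_b) in enumerate(zip(a, b)):
        if line_a != line_b:
            return i
    # One run produced extra lines (e.g. output cut short in the other run)
    return None if len(a) == len(b) else min(len(a), len(b))

def compare_temp_files(path_a: str, path_b: str):
    """Compare two intermediate files left behind by two identical runs."""
    lines_a = Path(path_a).read_text(encoding="utf-8").splitlines()
    lines_b = Path(path_b).read_text(encoding="utf-8").splitlines()
    return first_divergence(lines_a, lines_b)
```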

SapirWeissbuch commented

> @SapirWeissbuch Is this still relevant?

Hi, I figured out that this error was due to me accidentally adding a "<" character at the beginning of the text.

cjer commented

@SapirWeissbuch good to hear. Where exactly in the text? Maybe this is something that should be handled better by the server, or at least produce a more indicative error.
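A sketch of the kind of pre-flight check suggested here: reject input whose first non-whitespace character is "<" (the one character this thread confirmed to break the pipeline) with an indicative message instead of letting it surface downstream as a KeyError. check_input is a hypothetical name, and limiting the check to a leading "<" is an assumption based only on this report:

```python
def check_input(text: str) -> None:
    """Reject input starting with '<' before it reaches YAP.

    Hypothetical check: this issue found that a stray '<' at the
    beginning of the text made YAP's MD step return a response
    without 'md_lattice'. Only a leading '<' is checked here;
    other problem characters may exist.
    """
    stripped = text.lstrip()
    if stripped.startswith("<"):
        raise ValueError(
            "Input begins with '<', which is known to break the YAP MD "
            "step; remove it before querying the server."
        )
```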

cjer commented

@yuvaly95 since we found this is a different problem, let's continue yours in a separate issue. You can open one with the new checks you're running and refer to this one for more info. Thanks.

@SapirWeissbuch I'm closing this issue, if you'd like you can answer my previous question here or open a new issue if you see fit. Thanks.