microsoft/spacy-ann-linker

[BUG] spacy_ann create_index failed

tungwini opened this issue · 1 comments

Describe the bug

I tried to run the tutorial from here, but
spacy_ann create_index en_core_web_md kb_dir/ models_dir/
failed with the following error:

  File "/opt/conda/lib/python3.7/site-packages/spacy_ann/candidate_generator.py", line 336, in <lambda>
    p.with_suffix(".json"), self.short_aliases
  File "/opt/conda/lib/python3.7/site-packages/srsly/_json_api.py", line 74, in write_json
    json_data = json_dumps(data, indent=indent)
  File "/opt/conda/lib/python3.7/site-packages/srsly/_json_api.py", line 26, in json_dumps
    result = ujson.dumps(data, indent=indent, escape_forward_slashes=False)
TypeError: {'NLP', 'OS', 'ML'} is not JSON serializable

Tracing the code, it passes a set (self.short_aliases) to json_dumps(), and a set is not a JSON-serializable type.
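For anyone trying to reproduce the failure in isolation, here is a minimal sketch of the same class of error. It uses the standard-library json module as a stand-in for srsly/ujson, and the short_aliases variable and the dumps_with_sets helper are hypothetical names for illustration, not part of spacy-ann-linker. It only shows that a set like the one in the error message cannot be dumped directly, and that converting it to a list first avoids the problem.

```python
import json

# Hypothetical stand-in for the value shown in the error message.
short_aliases = {"NLP", "OS", "ML"}


def dumps_with_sets(data):
    """Serialize data to JSON, converting any set to a sorted list first."""

    def default(obj):
        # json.dumps calls this hook for objects it cannot serialize itself.
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)
        raise TypeError(f"{type(obj).__name__} is not JSON serializable")

    return json.dumps(data, default=default)


# Dumping the set directly raises TypeError, like the call in the traceback.
try:
    json.dumps(short_aliases)
    set_is_serializable = True
except TypeError:
    set_is_serializable = False

# Converting the set to a list first makes the round trip work.
serialized = dumps_with_sets(short_aliases)
restored = set(json.loads(serialized))

print(set_is_serializable)
print(serialized)
```

Whether the installed srsly version handles sets on its own is a separate question; this sketch only illustrates why the plain dump fails.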

To Reproduce

Follow the steps in the tutorial from here:

  1. pip install spacy-ann-linker
  2. spacy_ann example_data ./kb_dir
  3. spacy download en_core_web_md
  4. spacy_ann create_index en_core_web_md kb_dir/ models_dir/ (this step fails)

It outputs:
================================= Load Model =================================
⠙ Loading model en_core_web_md Done.

============================ Apply EntityEncoder ============================
⠙ Applying EntityEncoder to descriptions Finished, embeddings created
Done adding entities and aliases to kb

============================== Create ANN Index ==============================

0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|


Traceback (most recent call last):
  File "/opt/conda/bin/spacy_ann", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/spacy_ann/cli/__init__.py", line 24, in main
    typer.run(commands[command])
  File "/opt/conda/lib/python3.7/site-packages/typer/main.py", line 855, in run
    app()
  File "/opt/conda/lib/python3.7/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.7/site-packages/spacy_ann/cli/create_index.py", line 113, in create_index
    nlp.to_disk(output_dir)
  File "/opt/conda/lib/python3.7/site-packages/spacy/language.py", line 927, in to_disk
    util.to_disk(path, serializers, exclude)
  File "/opt/conda/lib/python3.7/site-packages/spacy/util.py", line 681, in to_disk
    writer(path / key)
  File "/opt/conda/lib/python3.7/site-packages/spacy/language.py", line 925, in <lambda>
    serializers[name] = lambda p, proc=proc: proc.to_disk(p, exclude=["vocab"])
  File "/opt/conda/lib/python3.7/site-packages/spacy_ann/ann_linker.py", line 199, in to_disk
    self.cg.to_disk(path)
  File "/opt/conda/lib/python3.7/site-packages/spacy_ann/candidate_generator.py", line 347, in to_disk
    to_disk(path, serializers, {})
  File "/opt/conda/lib/python3.7/site-packages/spacy/util.py", line 681, in to_disk
    writer(path / key)
  File "/opt/conda/lib/python3.7/site-packages/spacy_ann/candidate_generator.py", line 336, in <lambda>
    p.with_suffix(".json"), self.short_aliases
  File "/opt/conda/lib/python3.7/site-packages/srsly/_json_api.py", line 74, in write_json
    json_data = json_dumps(data, indent=indent)
  File "/opt/conda/lib/python3.7/site-packages/srsly/_json_api.py", line 26, in json_dumps
    result = ujson.dumps(data, indent=indent, escape_forward_slashes=False)
TypeError: {'NLP', 'OS', 'ML'} is not JSON serializable

  • But I expected it to output:
    ...
    Fitting ann index took xxx seconds

Environment

  • OS: Amazon Linux AMI 2018.03

  • Python version: 3.7.10

Additional context

spacy-ann-linker==0.3.3
spacy==2.3.7
nmslib==2.0.5
scikit-learn==0.21.3
srsly==1.0.5

This seems to be resolvable by running pip install srsly==2.0.0 as per this comment in Issue #6
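To check whether an environment is affected before running create_index, a small version check along these lines may help. This is only a sketch: the 2.0.0 threshold is taken from the workaround above rather than from any documented compatibility statement, and version_tuple, installed_version, and needs_upgrade are hypothetical helper names. On Python 3.7 (as in this report) importlib.metadata is not in the standard library, so the sketch falls back to the importlib_metadata backport, which would need to be installed separately.

```python
try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '1.0.5' into (1, 0, 5); handles plain dotted numeric versions only."""
    return tuple(int(part) for part in text.split("."))


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def needs_upgrade(installed, minimum="2.0.0"):
    """Return True if the package is missing or older than the assumed minimum."""
    return installed is None or version_tuple(installed) < version_tuple(minimum)


# Report what is installed in the current environment (None if srsly is absent).
print("srsly:", installed_version("srsly"))
```

With the versions listed above, needs_upgrade("1.0.5") returns True, which matches the environment in which the error occurred.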