explosion/sense2vec

Can't save the s2v model to disk


Hi there,
I'm trying to save a sense2vec model to disk using the to_disk method but it fails with an error.

s2v.to_disk(output_path)

I tried to track down the issue by looking into the ujson folder of the srsly package, but I couldn't follow it: I believe it's written in C, which I don't know :)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-59-ec8fe7745533> in <module>()
----> 1 srsly.write_json(output_path / "cfg", s2v.cfg)

1 frames
/usr/local/lib/python3.6/dist-packages/srsly/_json_api.py in json_dumps(data, indent, sort_keys)
     24         )
     25     else:
---> 26         result = ujson.dumps(data, indent=indent, escape_forward_slashes=False)
     27     if sys.version_info[0] == 2:  # Python 2
     28         return result.decode("utf8")

TypeError: {'PART', 'QUANTITY', 'DATE', 'PROPN', 'ORDINAL', 'PERCENT', 'CARDINAL', 'LOC', 'ADJ', 'VERB', 'ADV', 'NORP', 'DET', 'FAC', 'PRON', 'ORG', 'MONEY', 'NUM', 'LANGUAGE', 'X', 'PRODUCT', 'INTJ', 'ADP', 'SCONJ', 'TIME', 'EVENT', 'AUX', 'GPE', 'SYM', 'NOUN', 'PERSON', 'CCONJ', 'WORK OF ART', 'PUNCT'} is not JSON serializable

While experimenting, I noticed that s2v.cfg prints as the dictionary below. The value for the key 'senses' is a set, and I think it should be a list instead, since a set can't be written as JSON.

{'make_key': 'default',
 'senses': {'ADJ',
  'ADP',
  'ADV',
  'AUX',
  'CARDINAL',
  'CCONJ',
  'DATE',
  'DET',
  'EVENT',
  'FAC',
  'GPE',
  'INTJ',
  'LANGUAGE',
  'LOC',
  'MONEY',
  'NORP',
  'NOUN',
  'NUM',
  'ORDINAL',
  'ORG',
  'PART',
  'PERCENT',
  'PERSON',
  'PRODUCT',
  'PRON',
  'PROPN',
  'PUNCT',
  'QUANTITY',
  'SCONJ',
  'SYM',
  'TIME',
  'VERB',
  'WORK OF ART',
  'X'},
 'split_key': 'default'}
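
To check that the set is really what trips up the serializer, here is a minimal sketch that uses the standard library json module as a stand-in for srsly (an assumption on my part: srsly goes through ujson, but both reject sets). The shortened cfg and the is_json_serializable helper are made up for illustration and are not part of sense2vec.

```python
import json

# Shortened stand-in for s2v.cfg; the real 'senses' set is much larger.
cfg = {
    "make_key": "default",
    "senses": {"NOUN", "VERB", "ADJ"},
    "split_key": "default",
}


def is_json_serializable(value):
    """Hypothetical helper: True if json.dumps accepts the value."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


# The curly braces without key/value pairs mean this is a set, not a dict.
senses_type = type(cfg["senses"]).__name__

set_ok = is_json_serializable(cfg)
list_ok = is_json_serializable({**cfg, "senses": list(cfg["senses"])})

print(senses_type)  # set
print(set_ok)       # False
print(list_ok)      # True
```

So the same cfg goes through once 'senses' is a list.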

Python version

Python 3.6.9

Package versions

srsly 1.0.4
spacy 2.2.4
catalogue 1.0.0
numpy 1.18.5

After some more checking, the problem is not in the package itself but in the scripts provided for training a custom model, specifically 05_export.py.

I changed line 146 to wrap all_senses in list(), like this:

s2v = Sense2Vec(shape=(n_vectors, vector_size), senses=list(all_senses))
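
For anyone who wants to see the effect of that change in isolation, here is a rough sketch that does not need sense2vec installed. It again assumes the standard library json module behaves like srsly for this case, and with_json_safe_senses is a hypothetical helper name, not something from the package or the script. I sort the senses so the saved file comes out the same on every run; a plain list() works too.

```python
import json


def with_json_safe_senses(cfg):
    """Hypothetical helper: copy cfg and turn 'senses' into a sorted list."""
    safe = dict(cfg)
    safe["senses"] = sorted(safe.get("senses", []))
    return safe


# Stand-in for the set that the export script collects (shortened).
all_senses = {"NOUN", "VERB", "ADJ"}
cfg = {"make_key": "default", "senses": all_senses, "split_key": "default"}

# Serializing the converted copy works, and it round-trips through JSON.
payload = json.dumps(with_json_safe_senses(cfg))
restored = json.loads(payload)

print(restored["senses"])  # ['ADJ', 'NOUN', 'VERB']
```

The original cfg is left untouched, so the set is still available if other code relies on it.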

I hope it helps.