wikipedia2vec/wikipedia2vec

BufferError after training embeddings

ScholliYT opened this issue · 2 comments

After training of new embeddings finishes and the final cleanup runs, a BufferError is raised and the whole program crashes.

tstein@kratos:/data/tstein/git/REL/wikidata_dump/w2v_workdir$ wikipedia2vec train-embedding dump_file dump_dict wikipedia2vec_trained --link-graph dump_graph --mention-db dump_mention  --dim-size 1 --iteration 1
[2022-12-12 22:24:08,242] [INFO] Total number of word occurrences: 1164014674 (train_embedding@cli.py:257)
[2022-12-12 22:24:08,242] [INFO] Building a sampling table for frequent words... (train_embedding@cli.py:257)
[2022-12-12 22:24:11,528] [INFO] Building tables for negative sampling... (train_embedding@cli.py:257)
[2022-12-12 22:24:48,124] [INFO] Building tables for link indices... (train_embedding@cli.py:257)
[2022-12-12 22:24:55,960] [INFO] Starting to train embeddings... (train_embedding@cli.py:257)
[2022-12-12 22:24:56,408] [INFO] Initializing weights... (train_embedding@cli.py:257)
Iteration 1/1: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3448979/3448979 [11:52<00:00, 4840.04it/s]
[2022-12-12 22:36:52,358] [INFO] Terminating pool workers... (train_embedding@cli.py:257)
Traceback (most recent call last):
  File "/data/tstein/git/REL/wikidata_dump/venvW2V/bin/wikipedia2vec", line 11, in <module>
    load_entry_point('wikipedia2vec==1.0.5', 'console_scripts', 'wikipedia2vec')()
  File "/data/tstein/git/REL/wikidata_dump/venvW2V/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/data/tstein/git/REL/wikidata_dump/venvW2V/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/data/tstein/git/REL/wikidata_dump/venvW2V/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/tstein/git/REL/wikidata_dump/venvW2V/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/tstein/git/REL/wikidata_dump/venvW2V/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/data/tstein/git/REL/wikidata_dump/venvW2V/lib/python3.9/site-packages/wikipedia2vec/cli.py", line 97, in wrapper
    return func(*args, **kwargs)
  File "/data/tstein/git/REL/wikidata_dump/venvW2V/lib/python3.9/site-packages/wikipedia2vec/cli.py", line 34, in wrapper
    return func(*args, **kwargs)
  File "/data/tstein/git/REL/wikidata_dump/venvW2V/lib/python3.9/site-packages/wikipedia2vec/cli.py", line 257, in train_embedding
    wiki2vec.train(dump_db, link_graph, mention_db, tokenizer, sent_detect, **kwargs)
  File "wikipedia2vec/wikipedia2vec.pyx", line 365, in wikipedia2vec.wikipedia2vec.Wikipedia2Vec.train
BufferError: cannot close exported pointers exist

The error is raised at line 365 of wikipedia2vec/wikipedia2vec.pyx, in Wikipedia2Vec.train, per the traceback above.

I am using Python 3.9 and wikipedia2vec 1.0.5.

As a workaround, I removed the two lines that close syn0_mmap and syn1_mmap. With that change, training seems to finish without the error.
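
For reference, the same BufferError can be reproduced outside wikipedia2vec with only the standard library. The sketch below is not the library's code: it uses a small anonymous mmap and a memoryview as stand-ins, on the assumption that in wikipedia2vec some object (probably an array built on top of syn0_mmap / syn1_mmap) still holds a buffer into the mapping when close() is called. The function names are made up for illustration.

```python
import mmap


def close_with_live_export():
    """Try to close an mmap while a memoryview still points into it.

    Returns the BufferError message, or None if no error was raised.
    """
    buf = mmap.mmap(-1, 16)    # anonymous 16-byte mapping (stand-in for syn0_mmap)
    view = memoryview(buf)     # exports a pointer into the mapping
    try:
        buf.close()            # refused while an export is alive
    except BufferError as exc:
        message = str(exc)
    else:
        message = None
    finally:
        view.release()         # drop the export so the mapping can be closed
        buf.close()
    return message


def close_after_release():
    """Release the export first, then close the mapping."""
    buf = mmap.mmap(-1, 16)
    view = memoryview(buf)
    view[0] = 1                # use the buffer while it is still valid
    view.release()             # no exported pointers remain after this
    buf.close()                # succeeds
    return buf.closed


if __name__ == "__main__":
    print(close_with_live_export())
    print(close_after_release())
```

If this is indeed what happens in the training cleanup, an alternative to deleting the close() calls would be to drop every reference to the objects backed by the mappings before closing them. I have not verified that against the wikipedia2vec source.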

Thank you for reporting the issue! I believe the issue is resolved in the latest release.