kentonl/e2e-coref

char_vocab.english.txt not accessible

Closed this issue · 2 comments

Hello,

I'm trying to go through the README steps to run `python demo.py final`.

It seems like the latest commit removed the lines that curl char_vocab.english.txt.
Running `curl -o char_vocab.english.txt https://www.googleapis.com/storage/v1/b/e2e-coref/o/char_vocab.english.txt?alt=media` directly leaves me with an empty file, though.

After that, demo.py crashes at line 47 with a tensor-shape mismatch raised from coref_model.py at line 90.
I suspect it might be related to my char_vocab.english.txt being empty?

Could you tell me how I could get this file?

EDIT: Found what I needed here: https://raw.githubusercontent.com/luheng/lsgn/master/embeddings/char_vocab.english.txt
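For anyone else hitting this, here is a short sketch that downloads the vocab from that URL and fails loudly if the result is empty or malformed, instead of letting demo.py crash later. `fetch_char_vocab` and `looks_like_char_vocab` are hypothetical helpers (not part of the repo), and the one-character-per-line format check is an assumption about the file's layout:

```python
import os
import urllib.request

# URL found above; the googleapis one returned an empty file.
VOCAB_URL = ("https://raw.githubusercontent.com/luheng/lsgn/"
             "master/embeddings/char_vocab.english.txt")

def looks_like_char_vocab(path):
    """Heuristic check: file exists, is non-empty, and every non-blank
    line holds a single character (assumed format of the vocab file)."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return False
    with open(path, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    return bool(lines) and all(len(line) == 1 for line in lines)

def fetch_char_vocab(path="char_vocab.english.txt", url=VOCAB_URL):
    """Download the vocab and fail immediately if it looks wrong."""
    urllib.request.urlretrieve(url, path)
    if not looks_like_char_vocab(path):
        raise RuntimeError("%s downloaded but is empty or malformed" % path)
    return path
```

The check catches the empty-download case described above before the model ever tries to build its character embeddings.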

I recently moved the resources yet again, for various uninteresting reasons. Please see the latest https://github.com/kentonl/e2e-coref/blob/master/README.md for how to get all the files.

I suspect that I am observing something similar to what @Natithan has observed.

(env3.6) huntsman-ve506-0062:e2e-coref daniel$ python demo.py final

WARNING: Logging before flag parsing goes to stderr.
W0429 19:46:45.946565 140735615419264 __init__.py:56] Some hub symbols are not available because TensorFlow version is less than 1.14

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

[nltk_data] Error loading punkt: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed
[nltk_data]     (_ssl.c:852)>
Setting CUDA_VISIBLE_DEVICES to: 
Running experiment: final
max_top_antecedents = 50
max_training_sentences = 50
top_span_ratio = 0.4
.
.
.
Done loading word embeddings.
Loading word embeddings from glove_50_300_2.txt...
Done loading word embeddings.
Traceback (most recent call last):
  File "demo.py", line 45, in <module>
    model = cm.CorefModel(config)
  File "/Users/daniel/ideaProjects/e2e-coref/coref_model.py", line 27, in __init__
    self.char_dict = util.load_char_dict(config["char_vocab_path"])
  File "/Users/daniel/ideaProjects/e2e-coref/util.py", line 58, in load_char_dict
    with codecs.open(char_vocab_path, encoding="utf-8") as f:
  File "/Users/daniel/ideaProjects/e2e-coref/env3.6/bin/../lib/python3.6/codecs.py", line 897, in open
    file = builtins.open(filename, mode, buffering)
FileNotFoundError: [Errno 2] No such file or directory: 'char_vocab.english.txt'
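A more defensive variant of the loader would turn this into an actionable error. This is an illustrative sketch only, not the repo's actual `util.load_char_dict`; the `<unk>`-at-index-0 convention and the function name `load_char_dict_checked` are assumptions:

```python
import codecs
import os

def load_char_dict_checked(char_vocab_path):
    """Sketch of a defensive vocab loader: a missing or empty file
    raises an actionable error here, rather than a bare
    FileNotFoundError or a downstream tensor-shape mismatch."""
    if not os.path.exists(char_vocab_path):
        raise FileNotFoundError(
            "%s not found; see the README for the download URL"
            % char_vocab_path)
    vocab = ["<unk>"]  # index 0 reserved for unknown characters
    with codecs.open(char_vocab_path, encoding="utf-8") as f:
        vocab.extend(line.strip() for line in f if line.strip())
    if len(vocab) <= 1:
        raise ValueError(
            "%s is empty; re-download it" % char_vocab_path)
    # Map each character to its index in the vocab.
    return {char: i for i, char in enumerate(vocab)}
```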

I suspect that the download step was removed, probably in commit 5dc53a5dc2c852f9b8886a01ca57c8b893acca09.
Had to curl it manually: `curl -O https://lil.cs.washington.edu/coref/char_vocab.english.txt`