anhaidgroup/deepmatcher

dm.data.preprocess no vectors found at /root/.vector_cache/wiki.en.bin


NPap0 commented

In Colab:

After installing the legacy torchtext 0.11 to work around #96, running dm.data.preprocess gives:

Reading and processing data from
0% [############################# ] 100% | ETA: 00:00:01
Reading and processing data from
0% [############################# ] 100% | ETA: 00:00:00
INFO:deepmatcher.data.field:Downloading vectors from https://drive.google.com/uc?export=download&id=1Vih8gAmgBnuYDxfblbT94P6WjB7s1ZSh to /root/.vector_cache/wiki.en.bin
/usr/local/lib/python3.7/dist-packages/deepmatcher/data/field.py:79: ResourceWarning: unclosed <ssl.SSLSocket fd=61, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('172.28.0.2', 47830), raddr=('172.253.123.113', 443)>
self.destination = self.backup_destination
ResourceWarning: Enable tracemalloc to get the object allocation traceback
INFO:deepmatcher.data.field:Extracting vectors into /root/.vector_cache


RuntimeError Traceback (most recent call last)

in
2 path='',
3 train='train.csv',
----> 4 validation='validation.csv')


/usr/local/lib/python3.7/dist-packages/deepmatcher/data/field.py in cache(self, name, cache, url, backup_url)
94 shutil.copyfileobj(infile, outfile)
95 if not os.path.isfile(path):
---> 96 raise RuntimeError('no vectors found at {}'.format(path))
97
98 self.model = fasttext.load_model(path)

RuntimeError: no vectors found at /root/.vector_cache/wiki.en.bin
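To narrow down why the extraction step left no wiki.en.bin behind, a minimal diagnostic sketch can list what is actually in the cache directory. The helper inspect_vector_cache is hypothetical (not part of deepmatcher): it records file sizes, flags valid zip archives, and flags files whose first bytes look like HTML. The HTML check rests on an unconfirmed assumption that Google Drive may have served a web page (for example a confirmation page for a large file) instead of the archive.

```python
import os
import zipfile


# Hypothetical diagnostic helper (not a deepmatcher API). It only inspects
# files; it does not download or extract anything.
def inspect_vector_cache(cache_dir, bin_name="wiki.en.bin"):
    """Describe what is actually present in cache_dir."""
    report = {
        "exists": os.path.isdir(cache_dir),
        "files": {},          # file name -> size in bytes
        "zip_files": [],      # files that are valid zip archives
        "suspect_html": [],   # files whose first bytes look like an HTML page
        "bin_found": False,   # same condition as the os.path.isfile check in the traceback
    }
    if not report["exists"]:
        return report
    for name in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, name)
        if not os.path.isfile(path):
            continue
        report["files"][name] = os.path.getsize(path)
        if zipfile.is_zipfile(path):
            report["zip_files"].append(name)
            continue
        # Assumption: a download that returned a web page instead of the
        # archive would begin with HTML markup rather than binary data.
        with open(path, "rb") as f:
            head = f.read(512).lstrip().lower()
        if head.startswith((b"<!doctype html", b"<html")):
            report["suspect_html"].append(name)
    report["bin_found"] = os.path.isfile(os.path.join(cache_dir, bin_name))
    return report
```

In Colab, calling inspect_vector_cache("/root/.vector_cache") right after the failure would show whether anything was downloaded at all and what it looks like.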

I bypassed this with the solution proposed in #57, and now:

  • It takes more disk space (crucial given Colab's limited resources)
  • The download alone takes about 12 minutes
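Since the workaround costs extra disk space on Colab, a quick pre-flight check before downloading can help. This is a minimal sketch using only the standard library (shutil.disk_usage); the helper names are hypothetical, and required_gb is a number you would supply yourself, because the exact size of the downloaded vectors is not stated here.

```python
import os
import shutil


def _existing_ancestor(path):
    """Walk up from path until reaching a directory that exists."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def free_space_gb(path):
    """Free space, in GB (10**9 bytes), on the filesystem that holds path.

    The path may not exist yet (e.g. before the first download), so the
    closest existing ancestor directory is measured instead.
    """
    return shutil.disk_usage(_existing_ancestor(path)).free / 1e9


def has_room(path, required_gb):
    """True if at least required_gb GB are free; required_gb is user-supplied."""
    return free_space_gb(path) >= required_gb
```

For example, has_room("/root/.vector_cache", required_gb) could be checked before calling the preprocessing step.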

Since this is a different error from the one in #57, I wanted to make sure it is known.