explosion/sense2vec

Reddit Large Model


Hey Guys,

Thank you so much for this amazing library. It's been really helpful in my work.

The issue I am facing is with the Reddit (2019) vector model, which is split into multi-part download files. I downloaded the three required files and merged them using the command below:

cat s2v_reddit_2019_lg.tar.gz.* > s2v_reddit_2019_lg.tar.gz

I am currently on Windows 10 and used Windows PowerShell to merge these files. The problem is that the final merged tar.gz file does not open, and I keep getting this error message:

[image: screenshot of the error message]

I looked this up and found that this error arises when a header at the start or end of the archive is not readable, but I have no idea why that is happening. I have tried this multiple times, with the same result every time. The final merged tar.gz file is 7.07 GB, and I am using 7zip to extract it.

Can you please help me resolve this error? Or are these vectors available as a single-file download somewhere?

No worries guys, I was able to create the required file by changing the command from

cat s2v_reddit_2019_lg.tar.gz.* > s2v_reddit_2019_lg.tar.gz

to

type s2v_reddit_2019_lg.tar.gz.* > s2v_reddit_2019_lg.tar.gz
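
For anyone hitting the same error: a likely cause is that in Windows PowerShell `cat` is an alias for `Get-Content`, which reads files as text, and `>` re-encodes the output, so the merged archive no longer matches the original bytes. A shell-independent alternative is to join the parts in binary mode. The following is only a rough sketch in Python, not something from the sense2vec docs. The helper names `merge_parts` and `looks_like_gzip` are made up for illustration, and it assumes the part suffixes sort correctly in plain lexicographic order (for example zero-padded numbers).

```python
import glob
import shutil

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def merge_parts(pattern, output_path):
    """Concatenate all files matching `pattern` into `output_path`, byte for byte.

    Hypothetical helper: parts are joined in sorted (lexicographic) order,
    which is an assumption about how the download parts are named.
    """
    # Skip the output file itself in case the pattern happens to match it.
    parts = sorted(p for p in glob.glob(pattern) if p != output_path)
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(output_path, "wb") as out:
        for part in parts:
            # Binary mode on both sides, so no newline or encoding conversion.
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC
```

Calling `merge_parts("s2v_reddit_2019_lg.tar.gz.*", "s2v_reddit_2019_lg.tar.gz")` from the download folder should produce the merged archive, and `looks_like_gzip("s2v_reddit_2019_lg.tar.gz")` gives a quick sanity check on the header before trying to extract it. The check only inspects the first two bytes, so it will not catch damage further into the file.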