Word Statistics File not Found. | Receiving 404 error while downloading the file.
imVParashar opened this issue · 17 comments
While using the library, the word statistics file is again missing from its original source:
Please fix this as soon as possible, and please find a more robust way to host this file. It looks like people have faced this problem in the past as well.
Because of this issue, our production service is down. Please fix this ASAP!
Thanks in advance!
Same issue here!
We are deploying our project on Saturday, and the tokenizer has stopped working! Please fix it quickly!
Same error. Could you please help fix it?
Same error, waiting for a solution. Thanks in advance.
Uncompress it and put that folder into your home directory.
So it should be: ~/.ekphrasis/stats/...
Hi. Thank you, with your advice I managed to fix the problem mentioned above, but now there is a new one:
I am using the tokenizer for Twitter with the following flags:
text_processor = TextPreProcessor(
    normalize=['url', 'email', 'percent', 'money', 'phone', 'user',
               'time', 'url', 'date', 'number'],
    annotate={"hashtag",  # "allcaps",
              "elongated", "repeated",
              'emphasis', 'censored'},
    fix_html=True,  # fix HTML tokens
    segmenter="twitter",
    corrector="twitter",
    # unpack_hashtags=True,  # perform word segmentation on hashtags
    unpack_contractions=True,  # Unpack contractions (can't -> can not)
    spell_correct_elong=False,  # spell correction for elongated words
    tokenizer=SocialTokenizer(lowercase=True).tokenize,
    dicts=[emoticons]
)
And now it says:
---TOKENIZING TWEETS NOW---
Reading twitter - 1grams ...
stats file not available!
An exception has occurred, use %tb to see the full traceback.
SystemExit: 1
/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py:2890: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
Maybe the ZIP you provided doesn't include the files needed for tokenizing Twitter text?
By the way, I am trying to make it work in Google Colab, in case that matters.
Hi @yistarostin,
a few observations from your message:
- That zip contains the following folders (see the screenshot), and twitter is included.
- You have to unzip it. It should be a folder, not a zip file.
- In Google Colab the home directory is /root. So please check carefully that those files are available there. It should look like /root/.ekphrasis/stats/{and here the folders from the screenshot below}
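A quick way to confirm the layout described above is to list what sits under the stats directory. The sketch below is only a check written for this thread, not part of ekphrasis: the helper name `list_stats_folders` is made up, and it assumes the files belong under `~/.ekphrasis/stats` as stated in the comments above.

```python
from pathlib import Path


def list_stats_folders(home=None):
    """Return the sorted sub-folder names found under <home>/.ekphrasis/stats.

    `home` defaults to the current user's home directory (reported as /root
    for Colab in this thread). Returns an empty list if the directory is missing.
    """
    base = Path(home) if home is not None else Path.home()
    stats_dir = base / ".ekphrasis" / "stats"  # assumed location, taken from this thread
    if not stats_dir.is_dir():
        return []
    return sorted(p.name for p in stats_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    print("Home directory:", Path.home())
    print("Stats folders:", list_stats_folders())
```

If the printed list is empty, the files were probably unpacked somewhere other than the home directory.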
Well, I repeated your steps and it worked! I guess I accidentally unzipped to /content instead of /root. Thank you and Spasibo!
Hi @yistarostin, I am new to GitHub. Could you explain how you got it working?
I tried running !git clone https://github.com/cbaziotis/ekphrasis.git in the /root/ folder in Colab (see the screenshot). How can I use the library?
@fucaja Hi.
To use this or any other module, you first need to install it. For instance, to install this module, ekphrasis, simply run pip install ekphrasis from a terminal, or !pip install ekphrasis (the same command with an exclamation mark) from a notebook cell. Technically, you can clone the repository, %cd into the repository folder, and then run !pip install -e . there, but this is an unusual way to install, since you need to know the full URL of the repository to clone it. For instance, if the repository were moved to another Git hosting platform, your code would just stop working.
So, to install any package, just run !pip install [module name]
To use the library, put import [module name] in your Python code.
For instance, this module includes several classes; to use them, do:
from ekphrasis.classes.preprocessor import TextPreProcessor
from ekphrasis.classes.tokenizer import SocialTokenizer
from ekphrasis.dicts.emoticons import emoticons
A full example is given in the repo's README.md (on the front page).
Hi @yistarostin.
After using !pip install, I don't know where I should put the stats files in Colab. Could you explain? Thanks in advance.
@fucaja As advised before, you need to put the ekphrasis dictionary files in /root/.ekphrasis. Normally this is done automatically, but it is currently broken, which is why we are all here in this issue. So you need to manually download the .zip archive from the link mentioned in the previous comments, then upload the file to the /root folder in Colab, then change directory to /root, and then unzip the archive.
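The manual steps above can be sketched in Python. This assumes, as an earlier comment suggests, that the archive already contains the `.ekphrasis` folder at its top level, so that extracting it into the home directory yields `~/.ekphrasis/stats/...`. The helper name `extract_to_home` and the archive file name used below are placeholders, not anything provided by ekphrasis.

```python
import zipfile
from pathlib import Path


def extract_to_home(archive_path, home=None):
    """Extract a zip archive into the home directory and return that directory.

    `home` defaults to the current user's home directory; pass a path
    explicitly to extract somewhere else.
    """
    target = Path(home) if home is not None else Path.home()
    with zipfile.ZipFile(archive_path) as archive:
        # Assumption: the archive's top-level entry is the .ekphrasis folder.
        archive.extractall(target)
    return target
```

In Colab, something like `extract_to_home("/root/ekphrasis.zip")` (with whatever name the uploaded file actually has) should then unpack into /root, provided the home directory is /root as noted earlier in the thread.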
I solved the problem by changing the URL in helpers.py, adding a new link to a repository that hosts the stats files:
!pip install git+https://github.com/fucaja/ekphrasis.git
I still get the same error. Has this been fixed already?
Word statistics files not found!
Downloading...
Here's a copy of my ~/.ekphrasis from an old installation:
https://utoronto-my.sharepoint.com/:u:/g/personal/frank_niu_mail_utoronto_ca/Ed0k1JhgN8JJjmVxaBR_OzsBpMGlhhslAE9h3apvY9I_lA?e=tyZ7Nz
Unzipping it and moving home/frank/.ekphrasis to ~/.ekphrasis should solve the problem.
Note that my link is also not permanent (it is limited by my university's OneDrive/SharePoint policy). Hopefully this issue can be properly patched before the link expires.
The original URL has expired. Could you provide a new URL for downloading the dataset? Thanks.
Initially, I used my personal Dropbox account to host the file, as only some friends and I were using the library. It turns out that Dropbox has suspended my public links for generating excessive traffic...
I moved the data to another server and updated the public link for the stats.zip file. Please update the package and try again.
Build from source:
pip install git+https://github.com/cbaziotis/ekphrasis.git
or install from PyPI:
pip install ekphrasis -U
FYI the link is https://data.statmt.org/cbaziotis/projects/ekphrasis/stats.zip
Let me know if it works now.
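If the automatic download still fails after updating, the archive can also be fetched by hand. The following is a rough sketch using only the standard library, not the code ekphrasis itself runs. `fetch_and_extract` is a hypothetical helper, and the right destination directory depends on how the archive is laid out internally, which this thread does not state, so inspect the extracted result before relying on it.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# Link quoted in the comment above.
STATS_URL = "https://data.statmt.org/cbaziotis/projects/ekphrasis/stats.zip"


def fetch_and_extract(url, dest_dir):
    """Download a zip archive from `url` and extract it into `dest_dir`."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = Path(tmp) / "download.zip"
        # Stream the response to disk instead of holding it all in memory.
        with urllib.request.urlopen(url) as response, open(archive_path, "wb") as out:
            shutil.copyfileobj(response, out)
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(dest)
    return dest
```

One possible call is `fetch_and_extract(STATS_URL, Path.home() / ".ekphrasis" / "stats")`, assuming the archive holds the corpus folders directly. If it turns out to contain a top-level stats folder instead, the destination would need to be `Path.home() / ".ekphrasis"`.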