ptwobrussell/Recipes-for-Mining-Twitter

Weird bug with Summarizing Link Target recipe

Closed issue · 1 comment

Hi there!

I was trying out the Summarizing Link Target recipe and I received the following error:

(I named the file twitter_solution.py.)

Traceback (most recent call last):
  File "twitter_solution.py", line 127, in <module>
    summary = summarize(clean_page)
  File "twitter_solution.py", line 76, in summarize
    sentences = [s for s in nltk.tokenize.sent_tokenize(txt)]
  File "/usr/lib/pymodules/python2.6/nltk/tokenize/__init__.py", line 43, in sent_tokenize
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
  File "/usr/lib/pymodules/python2.6/nltk/data.py", line 590, in load
    resource_val = pickle.load(_open(resource_url))
  File "/usr/lib/pymodules/python2.6/nltk/data.py", line 669, in _open
    return find(path).open()
  File "/usr/lib/pymodules/python2.6/nltk/data.py", line 439, in find
    try: return find(modified_name)
  File "/usr/lib/pymodules/python2.6/nltk/data.py", line 429, in find
    try: return ZipFilePathPointer(p, zipentry)
  File "/usr/lib/pymodules/python2.6/nltk/data.py", line 306, in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
  File "/usr/lib/pymodules/python2.6/nltk/data.py", line 721, in __init__
    zipfile.ZipFile.__init__(self, filename)
  File "/usr/lib/python2.6/zipfile.py", line 696, in __init__
    self._GetContents()
  File "/usr/lib/python2.6/zipfile.py", line 716, in _GetContents
    self._RealGetContents()
  File "/usr/lib/python2.6/zipfile.py", line 728, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file

Is there anything wrong with my setup?

I am using NLTK 2.0b8 on my Ubuntu 10.04 computer.

How do I fix this issue?

Thanks!

Sorry for the delay in responding. Take a look at this page about NLTK and let me know if it helps:

http://nltk.googlecode.com/svn/trunk/doc/howto/data.html

In short, there may just be some ancillary data files that you need to download, depending on how your install of NLTK worked (although I don't believe that this was the case with my installation).
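The `BadZipfile` at the bottom of the traceback suggests that NLTK found a file where it expected the punkt archive but could not open it as a zip, which would be consistent with an incomplete or corrupted data download. That is a guess, not a confirmed diagnosis. Here is a minimal sketch for checking it, assuming the data lives under one of the directories in `nltk.data.path`. The helper names `find_bad_zips` and `check_nltk_data` are made up for this example and are not part of NLTK:

```python
import os
import zipfile


def find_bad_zips(root):
    """Return a sorted list of *.zip files under root that are not valid zip archives."""
    bad = []
    # os.walk yields nothing for a directory that does not exist,
    # so missing data directories are skipped silently.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".zip"):
                path = os.path.join(dirpath, name)
                if not zipfile.is_zipfile(path):
                    bad.append(path)
    return bad if not bad else sorted(bad)


def check_nltk_data():
    """Print every invalid archive found in NLTK's data search path."""
    import nltk  # imported lazily so find_bad_zips works without NLTK installed

    for data_dir in nltk.data.path:
        for path in find_bad_zips(data_dir):
            print("Corrupt archive: %s" % path)
```

If `check_nltk_data()` reports a file such as a `punkt.zip`, deleting it by hand and then running `nltk.download('punkt')` from a Python prompt should fetch a fresh copy, assuming your NLTK version ships the downloader.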

If this doesn't help, let me know, and I'll see if I can somehow recreate the problem on my end.

FWIW, I have an overdue TODO that involves creating a VirtualBox VM with all of the dependencies and packages installed and tested so that people can more easily try out the code without going through the configuration overhead.