pythonhacker/harvestman-crawler

import errors for module hashlib

GoogleCodeExporter opened this issue · 3 comments

What steps will reproduce the problem?
1. Run harvestman from the terminal

What is the expected output? What do you see instead?
Several import errors for module hashlib. I do have hashlib installed: when I open a
Python terminal and type "import hashlib", it works!

PROJECT ERROR: unmarshallable object                                            

Exception in thread Fetcher-6:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 184, in run
    if not self.__endflag: self.download(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 135, in download
    res = conn.save_url(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 872, in save_url
    return self.__save_url_file(urlObj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 922, in __save_url_file
    update, fileverified = dmgr.is_url_cache_uptodate(url, filename, self.get_content_length(), self.__data)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py", line 153, in is_url_cache_uptodate
    import hashilb
ImportError: No module named hashilb


Exception in thread crawler9:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/crawler.py", line 137, in run
    self.action()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/crawler.py", line 270, in action
    self.crawl_url()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/crawler.py", line 371, in crawl_url
    if url_obj.violates_rules():
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlparser.py", line 967, in violates_rules
    self.violatesrules = GetObject('ruleschecker').violates_basic_rules(self)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/rules.py", line 79, in violates_basic_rules
    if self.__apply_rep(urlObj):
  File "/usr/lib/python2.6/dist-packages/HarvestMan/rules.py", line 235, in __apply_rep
    ret = rp.read()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/robotparser.py", line 82, in read
    f = opener.open(self.url)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/robotparser.py", line 285, in open
    return conn.robot_urlopen(url)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 563, in robot_urlopen
    self.connect(url, None, False, 0)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 703, in connect
    self.__error['msg'] = errdescn
UnboundLocalError: local variable 'errdescn' referenced before assignment


Exception in thread Fetcher-3:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 184, in run
    if not self.__endflag: self.download(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 135, in download
    res = conn.save_url(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 872, in save_url
    return self.__save_url_file(urlObj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 922, in __save_url_file
    update, fileverified = dmgr.is_url_cache_uptodate(url, filename, self.get_content_length(), self.__data)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py", line 153, in is_url_cache_uptodate
    import hashilb
ImportError: No module named hashilb


Exception in thread Fetcher-7:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 184, in run
    if not self.__endflag: self.download(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 135, in download
    res = conn.save_url(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 872, in save_url
    return self.__save_url_file(urlObj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 922, in __save_url_file
    update, fileverified = dmgr.is_url_cache_uptodate(url, filename, self.get_content_length(), self.__data)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py", line 153, in is_url_cache_uptodate
    import hashilb
ImportError: No module named hashilb


Exception in thread Fetcher-4:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 184, in run
    if not self.__endflag: self.download(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 135, in download
    res = conn.save_url(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 872, in save_url
    return self.__save_url_file(urlObj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 922, in __save_url_file
    update, fileverified = dmgr.is_url_cache_uptodate(url, filename, self.get_content_length(), self.__data)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py", line 153, in is_url_cache_uptodate
    import hashilb
ImportError: No module named hashilb


Exception in thread Fetcher-10:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 184, in run
    if not self.__endflag: self.download(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 135, in download
    res = conn.save_url(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 872, in save_url
    return self.__save_url_file(urlObj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 922, in __save_url_file
    update, fileverified = dmgr.is_url_cache_uptodate(url, filename, self.get_content_length(), self.__data)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py", line 153, in is_url_cache_uptodate
    import hashilb
ImportError: No module named hashilb


Exception in thread Fetcher-9:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 184, in run
    if not self.__endflag: self.download(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 135, in download
    res = conn.save_url(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 872, in save_url
    return self.__save_url_file(urlObj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 922, in __save_url_file
    update, fileverified = dmgr.is_url_cache_uptodate(url, filename, self.get_content_length(), self.__data)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py", line 153, in is_url_cache_uptodate
    import hashilb
ImportError: No module named hashilb


Exception in thread Fetcher-8:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 184, in run
    if not self.__endflag: self.download(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 135, in download
    res = conn.save_url(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 872, in save_url
    return self.__save_url_file(urlObj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 922, in __save_url_file
    update, fileverified = dmgr.is_url_cache_uptodate(url, filename, self.get_content_length(), self.__data)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py", line 153, in is_url_cache_uptodate
    import hashilb
ImportError: No module named hashilb

Exception in thread Fetcher-1:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 184, in run
    if not self.__endflag: self.download(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlthread.py", line 135, in download
    res = conn.save_url(url_obj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 872, in save_url
    return self.__save_url_file(urlObj)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 922, in __save_url_file
    update, fileverified = dmgr.is_url_cache_uptodate(url, filename, self.get_content_length(), self.__data)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py", line 153, in is_url_cache_uptodate
    import hashilb
ImportError: No module named hashilb


What version of the product are you using? On what operating system?
version 1.4.6

I have attached my config file.

Please provide any additional information below.


Original issue reported on code.google.com by alok.wa...@gmail.com on 12 Apr 2010 at 3:51

You have hashlib, but do you have hashilb? Simple enough to fix. As the message
says, the typo is on line 153 of
/usr/lib/python2.6/dist-packages/HarvestMan/datamgr.py: the statement there reads
"import hashilb" and should be "import hashlib".

Original comment by mobilesc...@gmail.com on 22 Apr 2010 at 8:49
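For anyone hitting the same confusion, here is a minimal sketch of why "import hashlib" succeeds at the interpreter while the crawler still fails: the two names are different modules as far as Python is concerned. The helper name `module_available` is made up for illustration, and the snippet assumes Python 3.4 or later (for `importlib.util.find_spec`), not the Python 2.6 shown in the traceback.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be located."""
    # find_spec returns None when no top-level module of that name exists.
    return importlib.util.find_spec(name) is not None


# "hashlib" ships with the standard library; "hashilb" is the misspelling
# quoted in the traceback and is not expected to exist anywhere.
for name in ("hashlib", "hashilb"):
    print(name, module_available(name))
```

Checking the exact name quoted in the ImportError, rather than the name you expect to see, is usually the quickest way to spot a transposed-letter typo like this one.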

Oops, sorry about that. I kept reading it as hashlib instead of hashilb.

I fixed the typo, but I am now getting a different error. Should I start a new
thread for this?


Exception in thread crawler3:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 525, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/crawler.py", line 137, in run
    self.action()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/crawler.py", line 270, in action
    self.crawl_url()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/crawler.py", line 371, in crawl_url
    if url_obj.violates_rules():
  File "/usr/lib/python2.6/dist-packages/HarvestMan/urlparser.py", line 967, in violates_rules
    self.violatesrules = GetObject('ruleschecker').violates_basic_rules(self)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/rules.py", line 79, in violates_basic_rules
    if self.__apply_rep(urlObj):
  File "/usr/lib/python2.6/dist-packages/HarvestMan/rules.py", line 235, in __apply_rep
    ret = rp.read()
  File "/usr/lib/python2.6/dist-packages/HarvestMan/robotparser.py", line 82, in read
    f = opener.open(self.url)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/robotparser.py", line 285, in open
    return conn.robot_urlopen(url)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 563, in robot_urlopen
    self.connect(url, None, False, 0)
  File "/usr/lib/python2.6/dist-packages/HarvestMan/connector.py", line 703, in connect
    self.__error['msg'] = errdescn
UnboundLocalError: local variable 'errdescn' referenced before assignment

Original comment by alo...@gmail.com on 22 Apr 2010 at 9:12
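The UnboundLocalError above is the usual symptom of a local variable that is assigned on only some code paths and then read unconditionally. The sketch below reproduces that general pattern; it is not HarvestMan's actual connect() code, which is not shown in this thread. The function names `connect_buggy` and `connect_fixed` and the choice of exception types are assumptions made purely for illustration.

```python
def connect_buggy(fail_with):
    """Hypothetical stand-in: errdescn is set on only one error path."""
    error = {}
    try:
        raise fail_with
    except IOError as e:
        errdescn = str(e)   # assigned here...
    except ValueError:
        pass                # ...but never on this path
    error['msg'] = errdescn  # UnboundLocalError if ValueError was raised
    return error


def connect_fixed(fail_with):
    """Same flow, with a default so every path leaves errdescn defined."""
    error = {}
    errdescn = ''           # default value assigned before the try block
    try:
        raise fail_with
    except IOError as e:
        errdescn = str(e)
    except ValueError:
        pass
    error['msg'] = errdescn
    return error


print(connect_buggy(IOError("boom")))
print(connect_fixed(ValueError("bad url")))
try:
    connect_buggy(ValueError("bad url"))
except UnboundLocalError as exc:
    print("buggy version failed:", exc)
```

Giving the variable a default before the try block is one common fix; assigning it in every except branch works as well.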

These look like old errors: hashlib is no longer used anywhere in datamgr.py, and
the errdescn name errors are all fixed.

Original comment by abpil...@gmail.com on 9 Dec 2010 at 1:22

  • Changed state: Fixed