TypeError: expected string or buffer
Opened this issue · 13 comments
Sometimes it runs, sometimes it doesn't.
[14:22:38] INFO::email_crawler - Crawling http://www.google.com.au/search?q=electrician&start=0
[14:22:39] ERROR::email_crawler - Exception at url: http://www.google.com.au/search?q=electrician&start=0
HTTP Error 503: Service Unavailable
[14:22:39] ERROR::email_crawler - EXCEPTION: expected string or buffer
+1 Same here!
+1 Same here. Could you please suggest a fix for this? Thank you
+1 Same problem
python email_crawler.py "intext:gmail filetype:csv"
[10:14:12] INFO::email_crawler - ----------------------------------------
[10:14:12] INFO::email_crawler - Keywords to Google for: intext:gmail filetype:csv
[10:14:12] INFO::email_crawler - ----------------------------------------
[10:14:12] INFO::email_crawler - Crawling http://www.google.com/search?q=intext%3Agmail+filetype%3Acsv&start=0
[10:14:14] INFO::email_crawler - Crawling http://www.google.com/search?q=intext%3Agmail+filetype%3Acsv&start=10
...
[10:14:59] ERROR::email_crawler - Exception at url: http://www.google.com/search?q=intext%3Agmail+filetype%3Acsv&start=390
HTTP Error 503: Service Unavailable
[10:14:59] ERROR::email_crawler - EXCEPTION: expected string or buffer
Traceback (most recent call last):
  File "email_crawler.py", line 212, in <module>
    crawl(arg)
  File "email_crawler.py", line 65, in crawl
    for url in google_url_regex.findall(data):
TypeError: expected string or buffer
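The traceback points at the real issue: when the request comes back 503, the page content ends up as None (or never gets assigned), and `google_url_regex.findall(data)` then blows up with this TypeError. A minimal defensive sketch, assuming the crawl loop works roughly the way the traceback suggests (the `fetch` callable and the regex pattern below are placeholders, not the project's actual code):

```python
import re

# Placeholder pattern for extracting result links from a Google results page.
google_url_regex = re.compile(r'/url\?q=(http[^&]+)&')

def crawl_pages(pages, fetch):
    """fetch(url) -> HTML string, or None if the request failed."""
    for url in pages:
        data = fetch(url)
        if not data:
            # Skip pages Google refused (e.g. HTTP 503) instead of
            # passing None to the regex and crashing with TypeError.
            continue
        for link in google_url_regex.findall(data):
            yield link
```

With a guard like this, blocked pages are simply skipped instead of killing the whole run.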
same problem
Issue still not resolved, same here with the latest version cloned from git on my Linux machine.
I still have a problem with "TypeError: expected string or buffer" . Can anyone help?
Have the same issue as well
Here is a solution to your problem:
- Open the file email_crawler.py (if you are using the terminal, use nano email_crawler.py to edit the file).
- Go to line 24, which says MAX_SEARCH_RESULTS = 500, and change it to MAX_SEARCH_RESULTS = 100.
The reason this works is that the script crawls 500 pages of Google results, so Google treats the requests as spam from a scraping bot and blocks them accordingly.
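If you want a bit more headroom than just lowering the constant, pausing between result pages also helps stay under Google's spam heuristic. A rough sketch, assuming the crawler walks the result pages in steps of 10 (the generator below is a simplified stand-in, not the repo's actual loop):

```python
import random
import time
try:
    from urllib.parse import quote_plus   # Python 3
except ImportError:
    from urllib import quote_plus         # Python 2, which this project targets

# Stand-in for the constant on line 24 of email_crawler.py
MAX_SEARCH_RESULTS = 100   # was 500; fewer result pages looks less spam-like

def google_result_pages(keywords):
    """Yield Google search URLs 10 results at a time, pausing between pages."""
    for start in range(0, MAX_SEARCH_RESULTS, 10):
        yield "http://www.google.com/search?q=%s&start=%d" % (quote_plus(keywords), start)
        time.sleep(random.uniform(5, 15))  # be polite; reduces 503 responses
```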
I've got it too, and what @kevingatera suggested didn't work.
It happens before it even gets the second page done, so it's not the script being blocked. The exact error I get is:
:~/python-email-crawler$ python email_crawler.py "ios developers"
[19:05:06] INFO::email_crawler - ----------------------------------------
[19:05:06] INFO::email_crawler - Keywords to Google for: ios developers
[19:05:06] INFO::email_crawler - ----------------------------------------
[19:05:06] INFO::email_crawler - Crawling http://www.google.com/search?q=ios+developers&start=0
[19:05:06] ERROR::email_crawler - Exception at url: http://www.google.com/search?q=ios+developers&start=0
HTTP Error 503: Service Unavailable
[19:05:06] ERROR::email_crawler - EXCEPTION: expected string or buffer
Traceback (most recent call last):
  File "email_crawler.py", line 212, in <module>
    crawl(arg)
  File "email_crawler.py", line 65, in crawl
    for url in google_url_regex.findall(data):
TypeError: expected string or buffer
@charlieporth1 What's happening is that Google blocks your IP almost as soon as they get your request. Using another computer/IP will work.
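One way to soften that, short of moving to another IP, is to back off and retry when Google answers 503 instead of immediately requesting the next page. A self-contained sketch of the idea (shown with Python 3's urllib for brevity; the project itself runs on Python 2's urllib2, and the function name here is hypothetical):

```python
import time
import urllib.error
import urllib.request

def fetch_with_backoff(url, retries=3, base_delay=30):
    """Fetch a URL, backing off and retrying when Google answers 503."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0"},  # the default urllib UA gets blocked quickly
    )
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(request) as response:
                return response.read().decode("utf-8", errors="replace")
        except urllib.error.HTTPError as err:
            if err.code != 503:
                raise
            # 503 means Google thinks we're a bot; wait longer on each retry.
            time.sleep(base_delay * (attempt + 1))
    return None  # caller must handle None instead of feeding it to a regex
```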
@kevingatera turns out I was using torify and that didn't help. You should include IP rotation similar to what's in here. I would help you if I knew more about Python.
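For the IP-rotation idea, the usual standard-library approach is urllib's ProxyHandler with a pool of proxies you supply yourself, rotated round-robin. A rough sketch (Python 3, placeholder proxy addresses, not tied to this repo's code):

```python
import itertools
import urllib.request

# You have to supply working proxies yourself; these are placeholders.
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]
_proxy_cycle = itertools.cycle(PROXIES)

def fetch_via_next_proxy(url):
    """Fetch a URL through the next proxy in the pool (round-robin)."""
    proxy = next(_proxy_cycle)
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    with opener.open(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```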