samwize/python-email-crawler

TypeError: expected string or buffer

Opened this issue · 13 comments

Sometimes it runs, sometimes it doesn't.

[14:22:38] INFO::email_crawler - Crawling http://www.google.com.au/search?q=electrician&start=0
[14:22:39] ERROR::email_crawler - Exception at url: http://www.google.com.au/search?q=electrician&start=0
HTTP Error 503: Service Unavailable
[14:22:39] ERROR::email_crawler - EXCEPTION: expected string or buffer 

+1 Same here!

+1 Same here. Could you please suggest a fix for this? Thank you

+1 Same problem

python email_crawler.py "intext:gmail filetype:csv"
[10:14:12] INFO::email_crawler - ----------------------------------------
[10:14:12] INFO::email_crawler - Keywords to Google for: intext:gmail filetype:csv
[10:14:12] INFO::email_crawler - ----------------------------------------
[10:14:12] INFO::email_crawler - Crawling http://www.google.com/search?q=intext%3Agmail+filetype%3Acsv&start=0
[10:14:14] INFO::email_crawler - Crawling http://www.google.com/search?q=intext%3Agmail+filetype%3Acsv&start=10
...
[10:14:59] ERROR::email_crawler - Exception at url: http://www.google.com/search?q=intext%3Agmail+filetype%3Acsv&start=390
HTTP Error 503: Service Unavailable
[10:14:59] ERROR::email_crawler - EXCEPTION: expected string or buffer 
Traceback (most recent call last):
  File "email_crawler.py", line 212, in <module> 
    crawl(arg)
  File "email_crawler.py", line 65, in crawl
    for url in google_url_regex.findall(data):
TypeError: expected string or buffer
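
For context, the TypeError is a follow-on failure: the HTTP 503 makes the page download return nothing, so the regex on line 65 receives None instead of a string. A minimal guard, sketched in Python 2 to match the traceback (fetch(), crawl_one() and the regex pattern are assumptions, not the project's actual code):

import re
import urllib2

# Assumed pattern -- the project's real google_url_regex may differ.
google_url_regex = re.compile(r'/url\?q=(http[^&]+)&')

def fetch(url):
    """Download a Google results page; return None on an HTTP error
    such as the 503 in the log above."""
    try:
        return urllib2.urlopen(url).read()
    except urllib2.HTTPError as e:
        print 'HTTP Error %d at %s' % (e.code, url)
        return None

def crawl_one(url):
    data = fetch(url)
    if data is None:
        # Without this guard, findall(None) raises
        # "TypeError: expected string or buffer" on Python 2.
        return []
    return google_url_regex.findall(data)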

same problem

This issue should be resolved by merge #7

Issue still not resolved; same here with the latest version cloned from git on my Linux machine

mrkkr commented

I still have a problem with "TypeError: expected string or buffer" . Can anyone help?

Have the same issue as well

Here is a solution to your problem:

  1. Open the file email_crawler.py
    (If you are using the terminal, use nano email_crawler.py to edit the file)
  2. Go to line 24, which says MAX_SEARCH_RESULTS = 500, and change it to MAX_SEARCH_RESULTS = 100

Note that the reason behind this is that the script crawls 500 pages of Google results, so Google treats the requests as spam from a script scraping its search engine and blocks them accordingly.
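
For reference, the fix is just that one constant near the top of email_crawler.py. The pause between result pages sketched below is not part of the original script, only a suggested way to look less like automated scraping:

import random
import time
import urllib

MAX_SEARCH_RESULTS = 100   # was 500; fewer result pages looks less spam-like
RESULTS_PER_PAGE = 10      # Google pages through results with start=0, 10, 20, ...

def result_page_urls(keywords):
    """Yield the Google result-page URLs the crawler would visit, pausing
    between them.  Sketch only; the real crawl loop lives in crawl()."""
    query = urllib.urlencode({'q': keywords})
    for start in range(0, MAX_SEARCH_RESULTS, RESULTS_PER_PAGE):
        yield 'http://www.google.com/search?%s&start=%d' % (query, start)
        time.sleep(random.uniform(2.0, 5.0))   # space requests out to dodge the 503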

I've got it too, and what @kevingatera suggested didn't work.
It happens before it even gets the second page done, so it's not the script being blocked.
The exact error I get is:

:~/python-email-crawler$ python email_crawler.py "ios developers"
[19:05:06] INFO::email_crawler - ----------------------------------------
[19:05:06] INFO::email_crawler - Keywords to Google for: ios developers
[19:05:06] INFO::email_crawler - ----------------------------------------
[19:05:06] INFO::email_crawler - Crawling http://www.google.com/search?q=ios+developers&start=0
[19:05:06] ERROR::email_crawler - Exception at url: http://www.google.com/search?q=ios+developers&start=0
HTTP Error 503: Service Unavailable
[19:05:06] ERROR::email_crawler - EXCEPTION: expected string or buffer
Traceback (most recent call last):
  File "email_crawler.py", line 212, in <module>
    crawl(arg)
  File "email_crawler.py", line 65, in crawl
    for url in google_url_regex.findall(data):
TypeError: expected string or buffer

@charlieporth1 What's happening is that Google blocks your IP almost as soon as they get your request. Using another computer/IP will work.

@kevingatera turns out I was using torify and that didn't help. You should include IP rotation similar to what's in here. I would help you if I knew more about Python.
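
For anyone who wants to try the IP-rotation idea, here is a rough sketch using urllib2's ProxyHandler. The proxy addresses are placeholders and none of this exists in the project today:

import random
import urllib2

# Placeholder proxies -- replace with HTTP proxies you actually control
# or are permitted to use.
PROXIES = [
    '203.0.113.10:8080',
    '203.0.113.11:8080',
    '203.0.113.12:3128',
]

def open_via_random_proxy(url):
    """Fetch a URL through a randomly chosen proxy so successive Google
    requests do not all originate from one IP.  Sketch only; the real
    crawler would still need its own headers and error handling."""
    proxy = random.choice(PROXIES)
    opener = urllib2.build_opener(urllib2.ProxyHandler({'http': proxy}))
    opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
    return opener.open(url, timeout=30).read()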