Error when attempting to scrape Tweets
This is the output when I attempt to scrape Tweets:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/common/service.py", line 76, in start
    stdin=PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'chromedriver': 'chromedriver'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./scrape.py", line 204, in <module>
    user.scrape(begin, end, args.by, args.delay)
  File "./scrape.py", line 83, in scrape
    self.__find_tweets(start, end, by, loading_delay)
  File "./scrape.py", line 101, in __find_tweets
    with webdriver.Chrome() as driver: # options are Chrome(), Firefox(), Safari()
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/common/service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home
```
Thanks for opening a new issue. I'll use the information from the previously opened issue:

> I installed all the requirements, but for the Chrome webdriver, which resulted in a bunch of errors. But I have vanilla Google Chrome, and un-Googled Chromium installed, as well as vanilla Firefox. Using Linux Mint 19.3.

I'm looking specifically at "I installed all the requirements, but for the Chrome webdriver", as this is where the issue stems from, because line 101 uses `webdriver.Chrome()`.
Selenium needs to be able to see `chromedriver` in your `$PATH` environment variable, which it currently cannot.
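Under the hood, Selenium launches the driver binary with `subprocess`, which resolves the name with the same `$PATH` lookup that `shutil.which` performs. You can reproduce the check yourself with the standard library (the `find_driver` helper below is just for illustration, not part of Selenium or this project):

```python
import shutil

def find_driver(name="chromedriver"):
    """Resolve a webdriver binary the same way a $PATH lookup would."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            f"{name!r} is not on $PATH -- install it somewhere on your "
            "PATH, or configure Selenium with an explicit driver path"
        )
    return path
```

If this raises, `webdriver.Chrome()` will fail with the same `FileNotFoundError` shown in the traceback above.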
Because you mentioned chromedriver caused issues, go ahead and use Firefox instead. On Mac I did this by installing Firefox (`brew cask install firefox`), installing geckodriver (`brew install geckodriver`), and then changing line 101 to:

```python
with webdriver.Firefox() as driver:  # options are Chrome(), Firefox(), Safari()
```
It then worked perfectly for me.
Because you're using Linux, this link might help more for installing Geckodriver. To check that `geckodriver` is visible on your path, open a new shell after installing it and type `which geckodriver`.
Thanks for the response. `brew` did not work for installing geckodriver, but searching around I found a webpage with instructions on how to install geckodriver and chromedriver. Both the Chrome and Firefox browser windows now auto-launch as intended.
## Geckodriver
```sh
wget https://github.com/mozilla/geckodriver/releases/download/v0.23.0/geckodriver-v0.23.0-linux64.tar.gz
sudo sh -c 'tar -x geckodriver -zf geckodriver-v0.23.0-linux64.tar.gz -O > /usr/bin/geckodriver'
sudo chmod +x /usr/bin/geckodriver
rm geckodriver-v0.23.0-linux64.tar.gz
```
## Chromedriver
```sh
wget https://chromedriver.storage.googleapis.com/2.29/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo chmod +x chromedriver
sudo mv chromedriver /usr/bin/
rm chromedriver_linux64.zip
```
While it did refresh every two weeks through a selected Twitter account, I was given this output after a while:
```
  File "scrape.py", line 204, in <module>
    user.scrape(begin, end, args.by, args.delay)
  File "scrape.py", line 83, in scrape
    self.__find_tweets(start, end, by, loading_delay)
  File "scrape.py", line 124, in __find_tweets
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: call function result missing 'value'
  (Session info: chrome=79.0.3945.117)
  (Driver info: chromedriver=2.29.461571 (8a88bbe0775e2a23afda0ceaf2ef7ee74e822cc5),platform=Linux 5.0.0-37-generic x86_64)
```
Running the script begins with `[ found 0 existing tweets ]`, despite then operating as normal. Not sure if that's correct either.
To clarify, the "found 0 existing tweets" part just means that you haven't run the scraper on that account before, so it didn't find any tweets stored from previous runs. Once it finishes running it will say "found X new tweets", and the next time the scraper is run on that account, it will say "found X existing tweets".
In regards to that error, I’m clueless 😅.
A search makes it seem as though it’s a chromedriver version issue?
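For what it's worth, the `chrome=79.0.3945.117` / `chromedriver=2.29.461571` pair in the traceback does suggest a mismatch: since ChromeDriver 70, the driver's major version tracks the Chrome major version, while each older 2.x release only supported a narrow range of Chrome versions (2.29 targeted roughly Chrome 56–58). A small sketch of the sanity check under the modern scheme (`majors_match` is a helper I'm inventing for illustration):

```python
def majors_match(browser_version: str, driver_version: str) -> bool:
    """True when the major versions agree (ChromeDriver >= 70 scheme)."""
    return browser_version.split(".")[0] == driver_version.split(".")[0]

# Versions from the traceback: 79 vs 2, so the pairing is invalid.
print(majors_match("79.0.3945.117", "2.29.461571"))  # prints False
```

Downloading the ChromeDriver whose major version matches the installed Chrome (79.x here) should clear this particular error.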
I updated Chromedriver and Geckodriver to the latest versions for GNU/Linux. I'm still having trouble with Chromedriver, but Geckodriver worked flawlessly. Thanks for your help.
It looks like the result has the Tweets out of order. It would also be nice to pull the date posted and the number of retweets and likes, and to have options to export to CSV, TXT, or HTML.
Sorry, just saw this. That metadata is optional! You can modify the `METADATA_LIST` parameter in the code to include those fields. The suggestion for different export options is a good one; I'll look into that in the future. Thanks for the feedback!