MatthewWolff/TwitterScraper

Error when attempting to scrape Tweets

Closed this issue · 5 comments

This is the output when I attempt to scrape Tweets:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/common/service.py", line 76, in start
    stdin=PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1364, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'chromedriver': 'chromedriver'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./scrape.py", line 204, in <module>
    user.scrape(begin, end, args.by, args.delay)
  File "./scrape.py", line 83, in scrape
    self.__find_tweets(start, end, by, loading_delay)
  File "./scrape.py", line 101, in __find_tweets
    with webdriver.Chrome() as driver:  # options are Chrome(), Firefox(), Safari()
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/common/service.py", line 83, in start
    os.path.basename(self.path), self.start_error_message)
selenium.common.exceptions.WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://sites.google.com/a/chromium.org/chromedriver/home

Thanks for opening a new issue. I'll use the information from the previously opened issue:

I installed all the requirements, but for the Chrome webdriver, which resulted in a bunch of errors. But I have vanilla Google Chrome, and un-Googled Chromium installed, as well as vanilla Firefox. Using Linux Mint 19.3.

I'm looking specifically at "I installed all the requirements, but for the Chrome webdriver", as this is where the issue is stemming from. Because line 101 calls webdriver.Chrome(), Selenium needs to be able to find chromedriver in your $PATH environment variable, which it currently cannot.
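As a quick sanity check (a minimal sketch, not part of the scraper itself), you can ask Python the same question Selenium does before it launches the browser, using the standard library's shutil.which:

```python
import shutil

def driver_on_path(name: str) -> bool:
    """Return True if the named webdriver binary is visible on $PATH."""
    return shutil.which(name) is not None

# Selenium raises WebDriverException in exactly the case where this is False:
if not driver_on_path("chromedriver"):
    print("chromedriver is not on $PATH")
```

If this prints the warning, installing chromedriver (or switching the script to a driver you do have) is the fix.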

Because you mentioned chromedriver caused issues, go ahead and use Firefox instead. On Mac I did this by installing Firefox (brew cask install firefox), installing geckodriver (brew install geckodriver), and then changing line 101 to:

with webdriver.Firefox() as driver: # options are Chrome(), Firefox(), Safari()

It then worked perfectly for me.

Because you're using Linux, this link might help more for installing geckodriver. To check that geckodriver is visible on your $PATH, open a new shell after installing it and run which geckodriver.

Thanks for the response. brew did not work for installing geckodriver, but after searching around I found a webpage with instructions on how to install both geckodriver and chromedriver. Both the Chrome and Firefox browser windows now auto-launch as intended.

## Geckodriver
wget https://github.com/mozilla/geckodriver/releases/download/v0.23.0/geckodriver-v0.23.0-linux64.tar.gz
sudo sh -c 'tar -x geckodriver -zf geckodriver-v0.23.0-linux64.tar.gz -O > /usr/bin/geckodriver'
sudo chmod +x /usr/bin/geckodriver
rm geckodriver-v0.23.0-linux64.tar.gz

## Chromedriver
wget https://chromedriver.storage.googleapis.com/2.29/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo chmod +x chromedriver
sudo mv chromedriver /usr/bin/
rm chromedriver_linux64.zip

While it did step through the selected Twitter account two weeks at a time, I was given this output after a while:

  File "scrape.py", line 204, in <module>
    user.scrape(begin, end, args.by, args.delay)
  File "scrape.py", line 83, in scrape
    self.__find_tweets(start, end, by, loading_delay)
  File "scrape.py", line 124, in __find_tweets
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.6/dist-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: call function result missing 'value'
  (Session info: chrome=79.0.3945.117)
  (Driver info: chromedriver=2.29.461571 (8a88bbe0775e2a23afda0ceaf2ef7ee74e822cc5),platform=Linux 5.0.0-37-generic x86_64)

Running the script begins with [ found 0 existing tweets ] even though it then operates as normal. Not sure if that's correct either.

To clarify, the "found 0 existing tweets" part just means that you haven't run the scraper on that account before, so it didn't find any stored tweets from previous runs. Once it finishes running it will say "found X new tweets", and the next time the scraper is run on that account, it will say "found X existing tweets".
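For illustration only (the function name and storage format here are hypothetical, not the scraper's actual code), the logic amounts to counting whatever a previous run saved for that account:

```python
import json
import os

def count_existing_tweets(path: str) -> int:
    """Hypothetical sketch: count tweets stored by a previous run (0 on a first run)."""
    if not os.path.exists(path):
        return 0  # no prior run for this account -> "found 0 existing tweets"
    with open(path) as f:
        return len(json.load(f))

print(f"[ found {count_existing_tweets('tweets.json')} existing tweets ]")
```

So a 0 on the first run is expected, not an error.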

Regarding that error, I'm clueless 😅.
A search makes it seem as though it's a chromedriver version issue?

https://stackoverflow.com/questions/49162667/unknown-error-call-function-result-missing-value-for-selenium-send-keys-even

I updated chromedriver and geckodriver to the latest versions for GNU/Linux. I'm still having trouble with chromedriver, but geckodriver worked flawlessly. Thanks for your help.

It looks like the result has the Tweets out of order. It would be nice to also pull the date posted, the retweet and like counts, and have options to export to CSV, TXT, or HTML.

Sorry, just saw this. Those metadata are optional! You can modify the METADATA_LIST parameter in the code to include them. The suggestion for different export options is a good one; I'll look into that in the future. Thanks for the feedback!