How to click a button with scrapy-selenium?
Houssemaster opened this issue · 6 comments
Hello, I want to perform some actions after getting the response from a page, such as clicking, hovering, scrolling, etc.
Requests have an additional meta key, named driver, containing the Selenium driver that processed the request. You can perform those actions with it, for example:
import scrapy
from scrapy_selenium import SeleniumRequest

class WhateverSpider(scrapy.Spider):
    def start_requests(self):
        urls = ['https://www.google.com']
        for url in urls:
            yield SeleniumRequest(
                url=url,
                callback=self.parse,
                wait_time=10)

    def parse(self, response):
        driver = response.request.meta['driver']
        # Do some stuff..

        # Click a button.
        button = driver.get_element_by_xpath('//*[@id="clickable-button-foo"]')
        button.click()

        # Do more stuff
Hello, I think your solution solves part of the problem. However, there is still an issue with this snippet, because downloading requests and parsing responses happen asynchronously in Scrapy. It is therefore possible that Scrapy invokes driver.get(another_url) in the middleware's process_request method before it reaches the line driver.get_element_by_xpath('//*[@id="clickable-button-foo"]'), which means that by the time that line runs, the page source may already have changed.
But there is another solution: you could use the request option wait_until to perform the action, like this:
from selenium.webdriver.common.by import By

def some_action(driver):
    if wait_until_conditions:  # placeholder for whatever condition you need
        driver.find_element(By.CLASS_NAME, 'klass')
        ...  # do whatever else you need with the driver
    return True

SeleniumRequest(
    url='http://xxx.ofg',
    wait_until=some_action
)
# If you forget to return True in the wait_until callback, this code will run again and again.
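Putting it together, here is a minimal sketch of that approach (the URL, element id, and spider name are placeholders): the click happens inside the wait_until callback, which, as far as I can tell, the middleware runs before it captures the page source, so the race with the parse callback goes away:

import scrapy
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.common.by import By

def click_button(driver):
    # Placeholder element id; adapt it to the page you are scraping.
    # If the button is not rendered yet, find_element raises NoSuchElementException,
    # which WebDriverWait ignores by default and simply retries.
    driver.find_element(By.XPATH, '//*[@id="clickable-button-foo"]').click()
    return True  # a falsy return makes the wait keep polling until wait_time expires

class ClickSpider(scrapy.Spider):
    name = 'click'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://example.com',
            wait_time=10,
            wait_until=click_button,
            callback=self.parse,
        )

    def parse(self, response):
        # The response body here is the page source captured after click_button
        # returned True, so the post-click page is what gets parsed.
        self.logger.info('Parsed %s', response.url)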
I have the same requirement.
You can check this repo until the pull request is accepted.
You are right. There is only one driver, so response.request.meta['driver'] is dealing with the current URL, which may be different from response.url. See #22
Any solution to this?
Change get_element_by_xpath to find_element_by_xpath.
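For reference, a standalone sketch of that correction (the URL and element id are placeholders, and Firefox is just one possible driver); note that the find_element_by_* helpers were deprecated and eventually removed in Selenium 4, where driver.find_element(By.XPATH, ...) is the supported form:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('https://example.com')

# Selenium 3 style (no longer available in current Selenium 4 releases):
# button = driver.find_element_by_xpath('//*[@id="clickable-button-foo"]')

# Works on both Selenium 3 and Selenium 4:
button = driver.find_element(By.XPATH, '//*[@id="clickable-button-foo"]')
button.click()

driver.quit()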