How to click a button with scrapy-selenium?
Houssemaster opened this issue · 6 comments
Hello, I want to perform some actions after getting the response from a page, such as clicking, hovering, scrolling, etc.
Requests have an additional meta key, named driver, containing the Selenium driver that processed the request. You can perform those actions with it, for example:
import scrapy
from scrapy_selenium import SeleniumRequest

class WhateverSpider(scrapy.Spider):
    def start_requests(self):
        urls = ['https://www.google.com']
        for url in urls:
            yield SeleniumRequest(
                url=url,
                callback=self.parse,
                wait_time=10)

    def parse(self, response):
        driver = response.request.meta['driver']
        # Do some stuff..

        # Click a button.
        button = driver.get_element_by_xpath('//*[@id="clickable-button-foo"]')
        button.click()

        # Do more stuff
Hello, I think your solution solves part of the problem. However, there is still an issue with this snippet, because downloading requests and parsing responses happen asynchronously in Scrapy. It is therefore possible that Scrapy invokes driver.get(another_url) in the middleware's process_request method before it reaches the line driver.get_element_by_xpath('//*[@id="clickable-button-foo"]'), which means that by the time that line runs, the page source may already have changed.
But there is another solution: you could use the request option wait_until to perform the action, like this:
from selenium.webdriver.common.by import By

def some_action(driver):
    if wait_until_conditions:  # placeholder for whatever condition you need
        driver.find_element(By.CLASS_NAME, 'klass')
        ...  # do whatever else you need with the driver
    return True

SeleniumRequest(
    url='http://xxx.ofg',
    wait_until=some_action
)
# If you forget to return True in the wait_until callback, this code will run again and again.
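Putting it together, here is a minimal sketch of that approach (the URL, element id, and spider name are placeholders): the click happens inside the wait_until callback, which, as far as I can tell, the middleware runs before it captures the page source, so the race with the parse callback goes away:

import scrapy
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.common.by import By

def click_button(driver):
    # Placeholder element id; adapt it to the page you are scraping.
    # If the button is not rendered yet, find_element raises NoSuchElementException,
    # which WebDriverWait ignores by default and simply retries.
    driver.find_element(By.XPATH, '//*[@id="clickable-button-foo"]').click()
    return True  # a falsy return makes the wait keep polling until wait_time expires

class ClickSpider(scrapy.Spider):
    name = 'click'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://example.com',
            wait_time=10,
            wait_until=click_button,
            callback=self.parse,
        )

    def parse(self, response):
        # The response body here is the page source captured after click_button
        # returned True, so the post-click page is what gets parsed.
        self.logger.info('Parsed %s', response.url)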
I have the same requirement.
You can check this repo until the pull request is accepted.
You are right. There is only one driver, so response.request.meta['driver'] is dealing with the current URL, which may be different from response.url. See #22
Any solution to this?
Change get_element_by_xpath to find_element_by_xpath.
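For reference, a standalone sketch of that correction (the URL and element id are placeholders, and Firefox is just one possible driver); note that the find_element_by_* helpers were deprecated and eventually removed in Selenium 4, where driver.find_element(By.XPATH, ...) is the supported form:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('https://example.com')

# Selenium 3 style (no longer available in current Selenium 4 releases):
# button = driver.find_element_by_xpath('//*[@id="clickable-button-foo"]')

# Works on both Selenium 3 and Selenium 4:
button = driver.find_element(By.XPATH, '//*[@id="clickable-button-foo"]')
button.click()

driver.quit()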