Anorov/cloudflare-scrape

cfscrape behaves exactly like the plain requests module and does not get past Cloudflare's protection. See the code below.

Mcklmo opened this issue.

Before creating an issue, first upgrade cfscrape with pip install -U cfscrape and see if you're still experiencing the problem. Please also confirm your Node version (node --version or nodejs --version) is version 10 or higher.

Make sure the website you're having issues with is actually using anti-bot protection by Cloudflare and not a competitor like Imperva Incapsula or Sucuri. And if you're using an anonymizing proxy, a VPN, or Tor, Cloudflare often flags those IPs and may block you or present you with a captcha as a result.
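One quick way to tell whether Cloudflare actually sits in front of a site is to inspect the response headers: Cloudflare-proxied responses normally include a `CF-RAY` header and `Server: cloudflare`. The helper below is a minimal heuristic sketch built on that assumption; `looks_like_cloudflare` is a hypothetical name, not part of cfscrape.

```python
from typing import Mapping


def looks_like_cloudflare(headers: Mapping[str, str]) -> bool:
    """Heuristic check: do these response headers suggest Cloudflare is in front?

    Assumption: Cloudflare-proxied responses carry a CF-RAY header and/or
    a "Server: cloudflare" header. Header names are compared case-insensitively.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    server = normalized.get("server", "").strip().lower()
    return "cf-ray" in normalized or server == "cloudflare"
```

With requests (or a cfscrape scraper, which is a requests session), pass `resp.headers` from the response object. A `False` result points towards a different vendor or no bot protection at all.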

Please confirm the following statements and check the boxes before creating an issue:

  • I've upgraded cfscrape with pip install -U cfscrape
  • I'm using Node version 10 or higher
  • The site protection I'm having issues with is from Cloudflare
  • I'm not using Tor, a VPN, or an anonymizing proxy
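The Node requirement from the checklist can also be verified from Python. This is a sketch that assumes `node --version` prints something like `v18.12.1` and falls back to `nodejs`, as the template suggests; the function names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_node_major(version_output: str) -> Optional[int]:
    """Extract the major version from output such as 'v18.12.1'."""
    match = re.search(r"v?(\d+)\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def installed_node_major() -> Optional[int]:
    """Run `node --version`, falling back to `nodejs --version`; None if neither exists."""
    for exe in ("node", "nodejs"):
        path = shutil.which(exe)
        if path is None:
            continue
        result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
        return parse_node_major(result.stdout)
    return None


def node_is_supported(minimum: int = 10) -> bool:
    """True if an installed Node.js reports a major version >= minimum."""
    major = installed_node_major()
    return major is not None and major >= minimum
```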

Python version number

Run python --version and paste the output below:
Python 3.11.2

cfscrape version number

Run pip show cfscrape and paste the output below:
Name: cfscrape
Version: 2.1.1
Summary: A simple Python module to bypass Cloudflare's anti-bot page. See https://github.com/Anorov/cloudflare-scrape for more information.
Home-page: https://github.com/Anorov/cloudflare-scrape
Author: Anorov
Author-email: anorov.vorona@gmail.com
License: UNKNOWN
Location: C:\Users\mh98\AppData\Local\Programs\Python\Python311\Lib\site-packages
Requires: requests
Required-by:
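The Python and cfscrape versions requested above can be collected in one step with the standard library. A minimal sketch, assuming the distribution name is `cfscrape` as shown by `pip show`; the helper names are hypothetical.

```python
import platform
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def environment_report() -> str:
    """Format the Python and cfscrape versions for pasting into an issue."""
    cfscrape_version = installed_version("cfscrape") or "not installed"
    return f"Python {platform.python_version()}\ncfscrape {cfscrape_version}"
```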

Code snippet involved with the issue

from bs4 import BeautifulSoup
import cfscrape

url = 'https://www.cdp.net/en/responses?queries%5Bname%5D=nike'

# Create the scraper once: it is a requests session, so any Cloudflare
# clearance cookies it obtains are reused instead of discarded each loop.
scraper = cfscrape.create_scraper()

cnt = 0
# send requests until the scraper protection kicks in
while True:
    cnt += 1
    print(cnt)

    # scrape
    res = scraper.get(url)
    soup = BeautifulSoup(res.content, 'html.parser')
    table = soup.find('table', class_='sortable_table')

    # if protection is activated, the table will not be found. Exit loop.
    # takes approx. 40 requests
    if table is None:
        print('scraper protection kicked in, HTTP status', res.status_code)
        break
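The loop above infers a block only from the missing table. A complementary check looks at the status code and headers before parsing. This sketch assumes that Cloudflare marks challenge responses with a `cf-mitigated: challenge` header and that 403, 429 and 503 are the statuses worth treating as blocks; both are assumptions to confirm against the real responses, and `classify_response` is a hypothetical helper.

```python
from typing import Mapping

# Assumption: statuses commonly returned when a client is refused, rate
# limited, or challenged. Adjust for the site under test.
BLOCK_STATUSES = frozenset({403, 429, 503})


def classify_response(status_code: int, headers: Mapping[str, str]) -> str:
    """Return 'challenge', 'blocked' or 'ok' from the status code and headers only."""
    normalized = {name.lower(): value.strip().lower() for name, value in headers.items()}
    # Assumed Cloudflare marker for challenge pages.
    if normalized.get("cf-mitigated") == "challenge":
        return "challenge"
    if status_code in BLOCK_STATUSES:
        return "blocked"
    return "ok"
```

Calling `classify_response(res.status_code, res.headers)` inside the loop would show whether the ~40th request was served a Cloudflare challenge or a plain error/rate-limit response, which is the distinction this issue hinges on.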

Complete exception and traceback

(If the problem doesn't involve an exception being raised, leave this blank)


URL of the Cloudflare-protected page

https://www.cdp.net/en/responses?queries%5Bname%5D=nike
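The query string in this URL is simply `queries[name]=nike` percent-encoded (`%5B` is `[`, `%5D` is `]`). A minimal sketch for building and decoding it with the standard library; `responses_url` and `decoded_query` are hypothetical helpers, and whether the site accepts `+` for spaces in other company names is an untested assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://www.cdp.net/en/responses"


def responses_url(company_name: str) -> str:
    """Build the search URL; urlencode percent-encodes the brackets in the key."""
    return f"{BASE}?{urlencode({'queries[name]': company_name})}"


def decoded_query(url: str) -> dict:
    """Decode the query string back into {key: [values]}."""
    return parse_qs(urlsplit(url).query)
```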

URL of Pastebin/Gist with HTML source of protected page

https://gist.github.com/Mcklmo/7a840a9a8c0360dd5ad04cfe4a3d1b7d