CFSession

A Python script that uses undetected-chromedriver to collect session cookies from a Cloudflare IUAM-protected site.

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

How it works

It relies on a modified Selenium (undetected-chromedriver) to stay hidden from sites that block Selenium-based sessions. Once the browser passes the IUAM or CAPTCHA verification, the library immediately saves the session token so that the site can then be accessed with the requests library.

The library wraps the requests library.
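The cookie hand-off described above can be sketched roughly as follows. This is not CFSession's actual implementation. It only assumes that the browser returns cookies in the list-of-dicts shape produced by Selenium's `driver.get_cookies()` and that `requests` is installed. The helper names `cookies_to_dict` and `build_requests_session`, and the example cookie values, are hypothetical.

```python
def cookies_to_dict(browser_cookies):
    """Collapse Selenium-style cookie dicts into a simple name -> value mapping."""
    return {cookie["name"]: cookie["value"] for cookie in browser_cookies}


def build_requests_session(browser_cookies, user_agent):
    """Return a requests.Session carrying the browser's cookies and User-Agent.

    Hypothetical helper: `requests` is imported lazily so the pure helper
    above can be used without it.
    """
    import requests

    session = requests.Session()
    # Assumption: the clearance cookie is only honoured when the User-Agent
    # matches the browser that solved the challenge.
    session.headers["User-Agent"] = user_agent
    for cookie in browser_cookies:
        session.cookies.set(
            cookie["name"],
            cookie["value"],
            domain=cookie.get("domain", ""),
            path=cookie.get("path", "/"),
        )
    return session


# Made-up data in the shape returned by Selenium's driver.get_cookies().
example_cookies = [
    {"name": "cf_clearance", "value": "abc123", "domain": ".example.com", "path": "/"},
    {"name": "session_id", "value": "xyz", "domain": "example.com", "path": "/"},
]
print(cookies_to_dict(example_cookies))
```

In practice you would not need either helper, since cfSession is meant to do this step for you. The sketch only shows why a single browser run can be enough for the later requests calls.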

Supported request types:

  • GET
  • POST
  • HEAD
  • PUT
  • PATCH
  • DELETE
  • OPTIONS

Read the wiki for more extensive details.

Usage:

Normal Usage:

import CFSession

if __name__ == "__main__": 
    session = CFSession.cfSession()
    res = session.get("https://nowsecure.nl")  # a Cloudflare-protected site
    print(res.content)

    #Context Manager
    with CFSession.cfSession() as session:
        res = session.get("https://nowsecure.nl")
        print(res.content)

Enable headless mode:

session = CFSession.cfSession(headless_mode=True)

How to choose chrome version:

CFSession accepts *args and **kwargs and simply passes them on to uc.Chrome().

from CFSession import cfSession

if __name__ == "__main__": 
    session = cfSession(version_main=95) #pick chrome version 95

You can pass any other uc.Chrome() option in the same way.
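As a rough illustration of this kind of argument forwarding, here is a minimal, self-contained sketch. `FakeChrome` and `ForwardingSession` are hypothetical stand-ins, not the real `uc.Chrome` or `cfSession` classes. The point is only that the wrapper consumes its own keyword (`headless_mode` here) and hands everything else to the driver unchanged.

```python
class FakeChrome:
    """Stand-in for uc.Chrome(); it only records the arguments it receives."""

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs


class ForwardingSession:
    """Hypothetical wrapper that keeps its own options and forwards the rest."""

    def __init__(self, *args, headless_mode=False, **kwargs):
        self.headless_mode = headless_mode
        # Everything the wrapper does not consume goes to the driver untouched.
        self.driver = FakeChrome(*args, **kwargs)


session = ForwardingSession(version_main=95, headless_mode=True)
print(session.driver.kwargs)  # {'version_main': 95}
```

Because the wrapper does not inspect the forwarded arguments, any keyword that the underlying driver understands should work without the wrapper needing to know about it.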

How to modify chrome options:

V1.3.0 supports a much easier way of modifying ChromeOptions:

from CFSession import Options, cfSession
import undetected_chromedriver as uc

options = Options()
options.chrome_options = uc.ChromeOptions()
session = cfSession(options=options)

Error correcting issues:

Sometimes a uc.Chrome() object has to be refreshed because of errors (e.g. network errors). The program tries to recover by retrying until a specified number of attempts is reached, and each retry requires recreating the object. This means the ChromeOptions must be recreated as well, because Selenium does not allow a ChromeOptions object to be reused.

By default, the program resets all user settings to its own preferred defaults. If you have preferred settings of your own, you can bypass those defaults by setting ignore_defaults=True on Options:

options = Options(ignore_defaults=True)
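The retry behaviour described in this section can be sketched as follows. This is a simplified model under stated assumptions, not CFSession's code: `FakeOptions`, `start_browser`, and `start_with_retries` are hypothetical names, and the network failure is simulated. It shows why a fresh options object is built for every attempt rather than reusing the previous one.

```python
class FakeOptions:
    """Stand-in for a ChromeOptions object that may only be used once."""

    def __init__(self):
        self.used = False


def start_browser(options, attempt, fail_times):
    """Pretend to launch a browser; fail for the first `fail_times` attempts."""
    if options.used:
        raise RuntimeError("options object cannot be reused")
    options.used = True
    if attempt <= fail_times:
        raise ConnectionError(f"simulated network error on attempt {attempt}")
    return f"browser started on attempt {attempt}"


def start_with_retries(make_options, max_attempts=3, fail_times=0):
    """Retry the launch, building a fresh options object for every attempt."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        options = make_options()  # never reuse the previous attempt's options
        try:
            return start_browser(options, attempt, fail_times)
        except ConnectionError as error:
            last_error = error
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


print(start_with_retries(FakeOptions, max_attempts=3, fail_times=2))
# browser started on attempt 3
```

Passing a factory (`make_options`) instead of a ready-made object is one way to guarantee a clean options instance per attempt. It also hints at why user settings may be reset on retry unless the library is told to keep them.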

Installation:

python3 -m pip install CFSession

or

pip3 install CFSession

Questions:

Why not just scrape fully on selenium?

  • Some applications rely on the requests library to scrape websites, while Selenium is the sensible option for getting past JavaScript challenges. This library uses Selenium as the solver for those challenges. If that succeeds, the session cookies are collected so you can access the site just as you would with requests, without the 405 or 500 responses.
  • Scraping with a full web browser is also fairly CPU intensive, so it makes sense to use the requests library for much lighter operation. The plan is to run the browser only once to collect the cookies and then use the requests library to scrape the website.

Is this just a requests wrapper? No, it is an extension of the requests library that tries to simplify the process of bypassing Cloudflare IUAM.

You can access the underlying requests.Session object directly through the cfSession.session attribute:

from CFSession import cfSession

session = cfSession()
session.session  # <--- a requests.Session object

Disclaimer:

This library was created for educational purposes only. Responsibility for any rules, laws, or terms of service broken rests solely with the user.