j-andrews7/kenpompy

Cloudflare Challenge Page

Closed this issue · 7 comments

With the season around the corner, I was reviewing the changes I made over the summer for #92, hoping to finally tidy them up, push them, and open the PR. However, I seem to be hitting the Cloudflare challenge page during login. I'm not sure if it's just me or if kenpom changed something over the offseason. The challenge I'm hitting appears to be the JavaScript challenge.

Since MechanicalSoup can't execute JavaScript, I'm not sure there will be a workaround other than moving to something like Selenium or maybe trying cloudscraper.

Yeah, Selenium is definitely not an elegant or lightweight solution, and it's potentially overkill. I don't know much about cloudscraper (I just came across it while googling), but I can do some preliminary experimenting and digging to see if it could be a solution. From what I can tell, it was built as a drop-in replacement for the requests library, and it supports both a pure-Python backend that replicates the JavaScript execution for the challenge and external JS engines like Node.

Oh, that'd be awesome. In the meantime, I mocked out a rewrite of the login function using cloudscraper and it worked: I was able to get through and hit the fanmatch page.

OK, after a little trial and error, it's a relatively simple drop-in fix. The utils file with cloudscraper looks like this:

import cloudscraper

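def login(email, password):
    """Log in to kenpom.com and return the authenticated cloudscraper session."""

    # Create a cloudscraper session (used like a requests session)
    scraper = cloudscraper.create_scraper()

    # Load the index page first so the Cloudflare challenge is handled before logging in
    scraper.get('https://kenpom.com/index.php')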

    form_data = {
        'email': email,
        'password': password,
        'submit': 'Login!',
    }

    scraper.post(
        'https://kenpom.com/handlers/login_handler.php',
        data=form_data,
        allow_redirects=True,
    )

    # Check if login was successful
    home_page = scraper.get('https://kenpom.com/')
    if 'Logout' not in home_page.text:
        raise Exception('Logging in failed - check your credentials.')

    return scraper

I still get the same response page error but further navigation works fine and shows me as logged in.
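Since the login check above relies on a follow-up GET of the home page, that step could be pulled into its own function so it can be exercised without touching the network. This is only a rough sketch: `LoginError`, `verify_login`, `FakeResponse`, and `FakeSession` are hypothetical names that do not exist in kenpompy or cloudscraper, and the only assumption about the session is that it has a requests-style `.get(url)` returning an object with `.text`.

```python
class LoginError(Exception):
    """Hypothetical exception: the session does not look authenticated."""


def verify_login(session, home_url="https://kenpom.com/"):
    """Fetch the home page and confirm that a 'Logout' link is present.

    `session` is assumed to be any requests-like object whose .get(url)
    returns a response exposing .text (for example, a cloudscraper session).
    """
    home_page = session.get(home_url)
    if "Logout" not in home_page.text:
        raise LoginError("Logging in failed - check your credentials.")
    return session


# Offline stand-ins so the check can be tried without network access.
class FakeResponse:
    def __init__(self, text):
        self.text = text


class FakeSession:
    def __init__(self, text):
        self._text = text

    def get(self, url):
        # Ignore the URL and return the canned page body.
        return FakeResponse(self._text)


logged_in = FakeSession("<a href='/logout'>Logout</a>")
assert verify_login(logged_in) is logged_in
```

Raising a dedicated exception instead of a bare `Exception` would let callers distinguish a failed login from other errors, though whether that is worth the extra class is a judgment call.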

For the FanMatch class I just had to change this:

browser.open(self.url)
fm = browser.get_current_page()

To this:

# Requires: from bs4 import BeautifulSoup
response = browser.get(self.url)
fm = BeautifulSoup(response.content, "html.parser")

And the rest of the functionality can remain as is. I haven't dug much into the other classes in the library, since I mainly use the FanMatch functionality, but I assume it'd be a similarly simple refactor: take a CloudScraper object instead of a StatefulBrowser and pass the returned HTML into a BeautifulSoup instance.
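If the other classes do follow the same pattern, the two-line change could be wrapped in one small helper that each of them calls. The sketch below assumes only that the session has a requests-style `.get(url)` returning an object with `.content`; `fetch_soup`, `FakeResponse`, `FakeSession`, and the example URL are hypothetical and not part of kenpompy.

```python
from bs4 import BeautifulSoup


def fetch_soup(session, url):
    """Hypothetical helper: GET `url` and parse the body with html.parser."""
    response = session.get(url)
    return BeautifulSoup(response.content, "html.parser")


# Offline stand-ins so the helper can be tried without network access.
class FakeResponse:
    def __init__(self, content):
        self.content = content


class FakeSession:
    def get(self, url):
        # Return a canned HTML body regardless of the URL.
        return FakeResponse(b"<table><tr><td>Duke</td></tr></table>")


soup = fetch_soup(FakeSession(), "https://example.invalid/page")
assert soup.find("td").get_text() == "Duke"
```

With something like this in place, the FanMatch change would reduce to `fm = fetch_soup(browser, self.url)`, and the same call could be reused wherever a class currently pairs `browser.open(...)` with `browser.get_current_page()`.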

Yeah, I can open one. I'll open one today.