Cloudflare Challenge Page

With the season around the corner, I was reviewing the changes I made over the summer for #92, hoping to finally tidy them up, push them, and open the PR. However, I seem to be hitting the Cloudflare challenge page during login. I'm not sure if it's just me or if kenpom changed something over the offseason. The challenge I'm hitting seems to be the JavaScript challenge.

Since mechanicalsoup can't execute JavaScript, I'm not sure there's a workaround other than moving to something like selenium, or maybe trying cloudscraper.
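
One way to tell a challenge page apart from an ordinary failed login is to look at the response headers: Cloudflare documents that challenge page responses carry a `cf-mitigated` header set to `challenge`. The sketch below assumes a requests-style response whose `headers` attribute behaves like a mapping; the function name `is_cloudflare_challenge` is made up for illustration and is not part of kenpompy.

```python
from typing import Mapping


def is_cloudflare_challenge(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers mark a Cloudflare challenge page."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    # Cloudflare sets "cf-mitigated: challenge" on challenge page responses.
    return normalized.get("cf-mitigated", "").strip().lower() == "challenge"
```

With a requests-style response object this would be called as `is_cloudflare_challenge(response.headers)`, which could help confirm whether the login failure is a challenge rather than bad credentials.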

Ugh, yeah, I was kind of worried about continually running into this given the issues that really started cropping up last season. Selenium is a pretty heavy solution.


Yeah, selenium is definitely not an elegant or lightweight solution, and potentially overkill. I don't really know much about cloudscraper (I just came across it while googling), but I can do some preliminary experimenting and digging to see if it could be a solution. From what I can tell, it was built as a drop-in replacement for the requests library. It supports both a pure Python backend that replicates the JavaScript execution for the challenge and external JS engines like node.

I may also try to reach out to Ken directly about this. He has answered in the past, so who knows, maybe he'll be able and willing to help us out.


Oh, that'd be awesome. In the meantime, I mocked up a rewrite of the login function using cloudscraper and it worked: I was able to get through and hit the FanMatch page.

OK, after a little trial and error, it's a relatively simple drop-in fix. The utils file with cloudscraper looks like this:

import cloudscraper

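def login(email, password):
    """Log in to kenpom.com and return the logged-in cloudscraper session."""
    # Create a Cloudscraper session
    scraper = cloudscraper.create_scraper()

    # Load the site once before posting the login form
    scraper.get('https://kenpom.com/index.php')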

    form_data = {
        'email': email,
        'password': password,
        'submit': 'Login!',
    }

    scraper.post(
        'https://kenpom.com/handlers/login_handler.php',
        data=form_data,
        allow_redirects=True,
    )

    # Check if login was successful
    home_page = scraper.get('https://kenpom.com/')
    if 'Logout' not in home_page.text:
        raise Exception('Logging in failed - check your credentials.')

    return scraper

I still get the same response page error but further navigation works fine and shows me as logged in.

For the FanMatch class I just had to change this:

kenpompy/kenpompy/FanMatch.py, lines 51 to 52 in e28f4b0:

browser.open(self.url)
fm = browser.get_current_page()

To this:

response = browser.get(self.url)
fm = BeautifulSoup(response.content, "html.parser")

The rest of the functionality can remain as is. I haven't dug much into the other classes in the library, since I mainly just use the FanMatch functionality, but I assume it'd be a similarly simple refactor: take a CloudScraper object instead of a StatefulBrowser, then dump the returned HTML into a BeautifulSoup instance.
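
For the other classes, that refactor might be captured in one small helper so each class changes in a single place. This is only a sketch, assuming `browser` is any requests-style session (such as the scraper returned by `login`) and that BeautifulSoup is installed; `fetch_soup` is a hypothetical name, not an existing kenpompy function.

```python
from bs4 import BeautifulSoup


def fetch_soup(browser, url):
    """Fetch url with a requests-style session and return the parsed page.

    Stands in for the MechanicalSoup pair browser.open(url) followed by
    browser.get_current_page().
    """
    response = browser.get(url)
    # Parse the raw bytes so BeautifulSoup can detect the encoding itself.
    return BeautifulSoup(response.content, "html.parser")
```

Inside a class such as FanMatch, the two replaced lines would then collapse to `fm = fetch_soup(browser, self.url)`.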

Any chance you wanna pop in a PR even if it's not quite all the way there?


Yeah, I can open one. I'll open one today.