Cloudflare Challenge Page
With the season around the corner, I was reviewing the changes I made over the summer for #92, hoping to finally tidy them up, push them, and open the PR. However, I seem to be hitting the Cloudflare challenge page during login. I'm not sure if it's just me or if kenpom changed something over the offseason. It appears to be the JavaScript challenge.
Since MechanicalSoup can't execute JavaScript, I'm not sure there's a workaround short of moving to something like Selenium or maybe trying cloudscraper.
Yeah, Selenium is definitely not an elegant or lightweight solution, and it's potentially overkill. I don't know much about cloudscraper (I just came across it while googling), but I can do some preliminary digging and see whether it could be a solution. From what I can tell, it was built as a drop-in replacement for the requests library and supports both a pure-Python backend that replicates the JavaScript execution for the challenge and external JS engines like Node.
Oh, that'd be awesome. In the meantime, I mocked out a rewrite of the login function using cloudscraper and it worked: I was able to get through and hit the FanMatch page.
OK, after a little trial and error, it's a relatively simple drop-in fix. The utils file with cloudscraper looks like this:
import cloudscraper


def login(email, password):
    # Create a cloudscraper session
    scraper = cloudscraper.create_scraper()
    scraper.get('https://kenpom.com/index.php')

    form_data = {
        'email': email,
        'password': password,
        'submit': 'Login!',
    }
    scraper.post(
        'https://kenpom.com/handlers/login_handler.php',
        data=form_data,
        allow_redirects=True,
    )

    # Check if login was successful
    home_page = scraper.get('https://kenpom.com/')
    if 'Logout' not in home_page.text:
        raise Exception('Logging in failed - check your credentials.')

    return scraper
I still get the same response page error but further navigation works fine and shows me as logged in.
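Since the real test of a login is whether later pages show the session as logged in, that check could be pulled out into its own helper. This is only a rough sketch: `verify_login` and `LoginError` are made-up names, not part of the library, and it assumes the session object has a requests-style `.get(url)` that returns something with a `.text` attribute (which a cloudscraper session should, since it is meant as a requests replacement).

```python
class LoginError(Exception):
    """Hypothetical exception for a failed login check."""


def verify_login(session, home_url="https://kenpom.com/", marker="Logout"):
    """Fetch home_url with the session and confirm the marker text appears.

    `session` is assumed to be any object with a requests-style .get(url)
    method whose return value has a .text attribute.
    """
    page = session.get(home_url)
    if marker not in page.text:
        raise LoginError("Logging in failed - check your credentials.")
    # Hand the session back so the caller can keep using it
    return session
```

A dedicated exception type would let calling code catch login failures specifically instead of a bare `Exception`.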
For the FanMatch class I just had to change this:
Lines 51 to 52 in e28f4b0
To this:
response = browser.get(self.url)
fm = BeautifulSoup(response.content, "html.parser")
And the rest of the functionality can remain as is. I haven't dug too much into the other classes in the library since I mainly just use the FanMatch functionality, but I assume it'd be a similarly simple refactor: take a CloudScraper object instead of a StatefulBrowser, then dump the returned HTML into a BeautifulSoup instance.
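As a rough sketch of what that refactor might look like if the pattern were shared across classes, a small helper could do the fetch and parse in one place. `fetch_soup` is a hypothetical name, not something in the library, and it assumes `browser` is a requests-style session (such as the one returned by `login` above) whose responses expose `.content`.

```python
from bs4 import BeautifulSoup


def fetch_soup(browser, url):
    """Fetch url with a requests-style session and parse the HTML.

    `browser` is assumed to expose .get(url) returning a response
    with a .content attribute, as a cloudscraper session does.
    """
    response = browser.get(url)
    return BeautifulSoup(response.content, "html.parser")
```

Each class could then call `fetch_soup(browser, self.url)` in place of the two lines shown for FanMatch.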
Yeah, I can open one. I'll open one today.