j-andrews7/kenpompy

cannot login to kenpom.com using login function from kenpompy.utils

Closed this issue · 4 comments

Hi Jared,

I unable to login to kenpom.com using your suggested login method:
from kenpompy.utils import login
browser = login(your_email, your_password)

After pip installing kenpompy and running the above code, I am seeing the attached error. I believe the problem is that the MechanicalSoup stateful_browser.py is not finding any forms on the kenpom site. I have written in a form counter into the stateful_browser.py select_form() function, and you can see the form count results at the top of the output attached.

Do you have any suggestions to resolve this issue?

I am really excited to use your kenpompy project! I am looking to develop a predictive model for the upcoming ncaab season, and your project had the exact kenpom.com web scrapper functions that I need!

Thank you in advance!
Josh

jupyter_Screenshot_2022-10-07 071556

I can replicate this. Not really sure what's going on, the form is definitely still there and all.

This will take some additional digging with beautifulsoup, I'll try to get to it in the next few weeks.

esqew commented

Unfortunately it appears recently KenPom has gone behind CloudFlare DDOS protection. Attempting a login with MechanicalSoup produces this response body:

<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]--><!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]--><!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]--><!--[if gt IE 8]><!--><html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Attention Required! | Cloudflare</title>
<meta charset="utf-8"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
<meta content="noindex, nofollow" name="robots"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<link href="/cdn-cgi/styles/cf.errors.css" id="cf_styles-css" rel="stylesheet"/>
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" /><![endif]-->
<style>body{margin:0;padding:0}</style>
<!--[if gte IE 10]><!-->
<script>
  if (!navigator.cookieEnabled) {
    window.addEventListener('DOMContentLoaded', function () {
      var cookieEl = document.getElementById('cookie-alert');
      cookieEl.style.display = 'block';
    })
  }
</script>
<!--<![endif]-->
</head>
<body>
<div id="cf-wrapper">
<div class="cf-alert cf-alert-error cf-cookie-error" data-translate="enable_cookies" id="cookie-alert">Please enable cookies.</div>
<div class="cf-error-details-wrapper" id="cf-error-details">
<div class="cf-wrapper cf-header cf-error-overview">
<h1 data-translate="block_headline">Sorry, you have been blocked</h1>
<h2 class="cf-subheadline"><span data-translate="unable_to_access">You are unable to access</span> kenpom.com</h2>
</div><!-- /.header -->
<div class="cf-section cf-highlight">
<div class="cf-wrapper">
<div class="cf-screenshot-container cf-screenshot-full">
<span class="cf-no-screenshot error"></span>
</div>
</div>
</div><!-- /.captcha-container -->
<div class="cf-section cf-wrapper">
<div class="cf-columns two">
<div class="cf-column">
<h2 data-translate="blocked_why_headline">Why have I been blocked?</h2>
<p data-translate="blocked_why_detail">This website is using a security service to protect itself from online attacks. The action you just performed triggered the security solution. There are several actions that could trigger this block including submitting a certain word or phrase, a SQL command or malformed data.</p>
</div>
<div class="cf-column">
<h2 data-translate="blocked_resolve_headline">What can I do to resolve this?</h2>
<p data-translate="blocked_resolve_detail">You can email the site owner to let them know you were blocked. Please include what you were doing when this page came up and the Cloudflare Ray ID found at the bottom of this page.</p>
</div>
</div>
</div><!-- /.section -->
<div class="cf-error-footer cf-wrapper w-240 lg:w-full py-10 sm:py-4 sm:px-8 mx-auto text-center sm:text-left border-solid border-0 border-t border-gray-300">
<p class="text-13">
<span class="cf-footer-item sm:block sm:mb-1">Cloudflare Ray ID: <strong class="font-semibold">7581eb2e8e0c0ced</strong></span>
<span class="cf-footer-separator sm:hidden">•</span>
<span class="cf-footer-item hidden sm:block sm:mb-1" id="cf-footer-item-ip">
      Your IP:
      <button class="cf-footer-ip-reveal-btn" id="cf-footer-ip-reveal" type="button">Click to reveal</button>
<span class="hidden" id="cf-footer-ip">[redacted]</span>
<span class="cf-footer-separator sm:hidden">•</span>
</span>
<span class="cf-footer-item sm:block sm:mb-1"><span>Performance &amp; security by</span> <a href="https://www.cloudflare.com/5xx-error-landing" id="brand_link" rel="noopener noreferrer" target="_blank">Cloudflare</a></span>
</p>
<script>(function(){function d(){var b=a.getElementById("cf-footer-item-ip"),c=a.getElementById("cf-footer-ip-reveal");b&&"classList"in b&&(b.classList.remove("hidden"),c.addEventListener("click",function(){c.classList.add("hidden");a.getElementById("cf-footer-ip").classList.remove("hidden")}))}var a=document;document.addEventListener&&a.addEventListener("DOMContentLoaded",d)})();</script>
</div><!-- /.error-footer -->
</div><!-- /#cf-error-details -->
</div><!-- /#cf-wrapper -->
<script>
  window._cf_translation = {};
  
  
</script>
</body>
</html>
esqew commented

Setting MechanicalSoup to use an explicit User-Agent string seems to resolve this, at least for now (#25 raised)... although if the sensitivity on this mechanism is turned up at any time more evasive maneuvers may be required.