hoyishian/footballwebscraper

Why do you use "sleep()" in your fbref_scout_extraction.py script before every "read_html()"?

dkcracks opened this issue · 2 comments

Big fan of your repos. Just a question regarding using "sleep()".

Thanks!

Thanks for the comment! When I first started on the project, fbref did not have a mechanism in place to prevent people from scraping data from their website.

More recently, fbref seems to limit the amount of traffic coming from a single IP address within a given period (which I think is meant to prevent web scraping or other kinds of attacks). I tried a variety of techniques, including calling sleep() before each read_html() to spread the requests out over each minute. I have not found a way around the limit itself. If you have any suggestions, I am happy to hear them!
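For readers wondering what the sleep()-before-every-request pattern looks like, here is a minimal sketch. It is not the code from fbref_scout_extraction.py: the helper name `fetch_with_delay`, its parameters, and the 6-second default are all assumptions made for illustration, and the delay is a placeholder rather than a known fbref limit.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def fetch_with_delay(
    urls: Iterable[str],
    fetch: Callable[[str], T],
    delay_seconds: float = 6.0,  # placeholder value, not a documented fbref limit
    sleep: Callable[[float], None] = time.sleep,
) -> List[T]:
    """Call fetch(url) for each URL, pausing before every call.

    Hypothetical helper: pausing before each request spreads the
    traffic from one IP address out over time.
    """
    results: List[T] = []
    for url in urls:
        sleep(delay_seconds)  # wait first, then make the request
        results.append(fetch(url))
    return results
```

With pandas installed, this could be called as `fetch_with_delay(urls, pandas.read_html)`, which would return one list of DataFrames per URL. The `sleep` parameter exists only so the pause can be swapped out when testing. A fixed delay lowers the request rate but does not guarantee that requests stay under whatever limit the site enforces.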

@hoyishian thanks for your answer and the repo. You can try using widgets