Why do you use "sleep()" in your fbref_scout_extraction.py script before every "read_html()"?
dkcracks opened this issue · 2 comments
Big fan of your repos. Just a question regarding using "sleep()".
Thanks!
Thanks for the comment! When I first started on the project, fbref did not have a mechanism in place to prevent people from scraping data from their website.
In recent updates, fbref seems to limit the number of requests coming from a single IP address within a given time window (I think this is meant to prevent web scraping and other kinds of attacks). I tried a variety of techniques, including calling sleep() between requests to spread out how many pages are scraped per minute. I have not found a way around this issue yet. If you have any suggestions, I am happy to hear them!
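To illustrate what spreading out requests with sleep() looks like, here is a minimal sketch. It is not the script's actual code. It assumes the pages are fetched with pandas.read_html. The names Throttle and read_tables_politely are hypothetical, and so is any interval you pick, because the rate fbref actually tolerates is not stated here. A throttle only lowers the request frequency. It will not lift a block that is already in place.

```python
import time
from typing import Callable, List, Optional, Sequence


class Throttle:
    """Hypothetical helper: keep successive requests at least `min_interval` seconds apart."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleeper: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock      # injectable so the pacing can be tested without waiting
        self._sleep = sleeper
        self._last: Optional[float] = None  # time the previous request was let through

    def wait(self) -> float:
        """Sleep only as long as needed to honour min_interval; return the seconds slept."""
        slept = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            remaining = self.min_interval - elapsed
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


def read_tables_politely(
    urls: Sequence[str],
    throttle: Throttle,
    reader: Optional[Callable[[str], list]] = None,
) -> List[list]:
    """Fetch each URL in turn, pausing via `throttle` before every request."""
    if reader is None:
        import pandas as pd  # assumption: pages are parsed with pandas.read_html
        reader = pd.read_html
    results = []
    for url in urls:
        throttle.wait()  # the sleep() before every read_html() call
        results.append(reader(url))
    return results
```

The clock and sleep functions are parameters so the pacing can be checked without touching the network. In real use the defaults (time.monotonic and time.sleep) apply. For example, read_tables_politely(urls, Throttle(6.0)) starts at most one request every six seconds. Unlike a fixed sleep(), the throttle subtracts the time already spent parsing the previous page.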
@hoyishian thanks for your answer and the repo. You can try using widgets