In the last year of my university degree in "Statistical Sciences of Economy and Business", I had the pleasure of helping one of my uni friends with his thesis.
As he had a very deep passion for sports, he decided to merge his hobby with statistics. Fortunately for him, sport and statistics are in fact very close to each other, especially in the field of betting.
He wanted to evaluate how precise betting odds are. His plan was to compare the Asian Handicap (AH) and Over/Under (O/U) lines offered for bets before a match with the final result of the match itself.
Asian handicap betting is a form of betting in which teams are handicapped according to their form, so that a stronger team must win by more points for a bet on it to win.
The over/under is a prediction of the combined score of the two teams; bettors then bet on whether the actual combined score will be above or below that number.
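The two bet types can be made concrete with a small Python sketch that settles a bet once the final score is known. It only handles half-point lines such as -3.5 or 215.5, where a tie against the line (a push) is impossible; whole and quarter lines, which can refund all or part of the stake, are deliberately left out. The function names and the convention that the handicap is quoted for the home team are assumptions made for illustration.

```python
def ah_home_covers(home_points: int, away_points: int, handicap: float) -> bool:
    """Return True if a bet on the home team at the given Asian Handicap wins.

    The handicap is added to the home team's score, so -3.5 means the home
    team must win by 4 or more points. Only half-point lines are supported.
    """
    if handicap % 1 != 0.5:
        raise ValueError("only half-point lines such as -3.5 are supported")
    return home_points + handicap > away_points


def total_goes_over(home_points: int, away_points: int, line: float) -> bool:
    """Return True if the combined score is above a half-point Over/Under line."""
    if line % 1 != 0.5:
        raise ValueError("only half-point lines such as 215.5 are supported")
    return home_points + away_points > line


# Hypothetical final score: home 110, away 105.
print(ah_home_covers(110, 105, -3.5))   # True: 110 - 3.5 = 106.5 > 105
print(total_goes_over(110, 105, 215.5))  # False: 215 is under 215.5
```

Python's `%` returns a non-negative remainder for a positive divisor, so `-3.5 % 1` is `0.5` and the same check works for negative handicaps.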
For every match, each possible AH and O/U line is given two odds: one reflects the probability that the actual AH or O/U will be lower than the line, the other the probability that it will be higher. The line at which the two odds are equal (or very close to one another) is therefore the one the betting market treats as its best prediction.
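Finding that balanced line is a small search over the offered lines. The sketch below assumes the odds for each line have already been collected into a dictionary that maps the line to its pair of decimal odds; the name `balanced_line` and that input format are hypothetical, and apart from the 1.90/1.90 pair at -3.5 quoted in the example that follows, the odds are made up.

```python
def balanced_line(odds_by_line: dict[float, tuple[float, float]]) -> tuple[float, float]:
    """Return (line, gap) for the line whose two odds are closest together.

    odds_by_line maps each offered line to its pair of decimal odds.
    On ties, min() keeps the first line in insertion order.
    """
    if not odds_by_line:
        raise ValueError("no lines offered")
    line, (odds_a, odds_b) = min(
        odds_by_line.items(),
        key=lambda item: abs(item[1][0] - item[1][1]),
    )
    return line, abs(odds_a - odds_b)


ah_odds = {-2.5: (1.80, 2.00), -3.5: (1.90, 1.90), -4.5: (2.02, 1.78)}
print(balanced_line(ah_odds))  # → (-3.5, 0.0)
```

The returned gap is the same kind of quantity as the diff.AH and diff.OU variables of the final dataset: a gap of 0 means the market is perfectly balanced on that line.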
I'll give you an example:
In Game 1 of the 2022 NBA Finals between the Golden State Warriors and the Boston Celtics, the two odds reach the same value (1.90) at an AH of -3.5. This means that, for the betting market, GSW winning by more than 3.5 points is exactly as likely as GSW winning by less than 3.5 points (or losing). In other words, the market's best prediction is that GSW will beat Boston by 3.5 points.
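The 1.90 figures are decimal odds, and the reciprocal of a decimal odd is its implied probability. A short sketch shows why equal odds mean "equally probable": the two raw implied probabilities add up to slightly more than 1 (the bookmaker's margin), and rescaling them to sum to 1 gives 50% each. The proportional rescaling used here is only one of several ways to remove the margin, and the function name is hypothetical.

```python
def implied_probabilities(odds_a: float, odds_b: float) -> tuple[float, float, float]:
    """Convert a pair of decimal odds into normalised probabilities.

    Returns (p_a, p_b, overround), where overround is how much the raw
    implied probabilities 1/odds exceed 1 (the bookmaker's margin).
    """
    raw_a, raw_b = 1 / odds_a, 1 / odds_b
    total = raw_a + raw_b
    return raw_a / total, raw_b / total, total - 1


p_over, p_under, margin = implied_probabilities(1.90, 1.90)
print(p_over, p_under, round(margin, 3))  # 0.5 0.5 0.053
```

Unequal odds, say 1.80 and 2.00, would instead tilt the normalised probabilities towards the side with the lower odd.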
For the same match, these are the Over/Under odds:
What my friend wanted to do was compare the AH and O/U lines at which the two odds reach the same value with the actual point difference (AH) and the actual combined score (O/U).
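For a single match, that comparison boils down to two subtractions. In this sketch the AH line is assumed to be quoted for the home team, so an AH of -3.5 corresponds to a predicted home margin of +3.5; the function name and the scores in the example are hypothetical.

```python
def line_errors(home_points: int, away_points: int,
                ah_line: float, ou_line: float) -> tuple[float, float]:
    """Return (ah_error, ou_error) for one match.

    ah_line is the balanced Asian Handicap quoted for the home team, so the
    market's predicted home margin is -ah_line. A positive error means the
    actual result was above the market's prediction.
    """
    predicted_margin = -ah_line
    actual_margin = home_points - away_points
    actual_total = home_points + away_points
    return actual_margin - predicted_margin, actual_total - ou_line


# Hypothetical match: home wins 110-100, balanced lines AH -3.5 and O/U 212.5.
print(line_errors(110, 100, -3.5, 212.5))  # → (6.5, -2.5)
```

Averaging these errors (or their absolute values) over many matches is one way to measure how precise the odds are.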
But how was I involved in this research?
It didn't take long for my friend to realise that there wasn't any dataset he could use for his research. At the time, I had just finished freeCodeCamp's Scientific Computing with Python and Data Analysis with Python certifications, so I was eager to put the Python skills I had learned to real use.
The scraper uses Selenium and Beautiful Soup to:
- Access oddsportal.com
- Change the time zone
- Loop through the seasons
  - Loop through the pages
    - For every page, use the function scrape_links to scrape, for every match:
      - The teams involved
      - The final score
      - The date
      - The category (Pre-Season, Regular-Season, Play-In, Play-Off)
      - The link to the odds for the match
  - For every link just scraped, use the function scrape_odds to:
    - Open the Asian Handicap section, scrape the odds and find the AH for which the two odds are closest to one another
    - Open the Over/Under section, scrape the odds and find the O/U for which the two odds are closest to one another
    - Scrape the score of every quarter
- Use the function correction to recheck the links that led to missing values
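The control flow of the list above can be sketched without any browser code. This is not the actual scraper: the function names scrape_links, scrape_odds and correction come from the list, but their signatures, the dictionary-per-match record format, the page count helper and the retry limit are all assumptions. The scraping functions are passed in as parameters, so in a real run they would wrap the Selenium page loads and Beautiful Soup parsing, while here they can be replaced by fakes.

```python
from typing import Callable, Optional

# Hypothetical record format: one dict per match, missing values stored as None.
Record = dict[str, Optional[object]]


def run_pipeline(
    seasons: list[str],
    pages_per_season: Callable[[str], int],
    scrape_links: Callable[[str, int], list[Record]],
    scrape_odds: Callable[[str], Record],
    max_retries: int = 2,
) -> list[Record]:
    """Collect match records season by season, then retry incomplete ones."""
    records: list[Record] = []
    for season in seasons:
        season_records: list[Record] = []
        for page in range(1, pages_per_season(season) + 1):
            # Teams, score, date, category and odds link for every match on the page.
            season_records.extend(scrape_links(season, page))
        for record in season_records:
            # Balanced AH and O/U lines plus quarter scores for every link.
            record.update(scrape_odds(record["Link"]))
        records.extend(season_records)
    return correction(records, scrape_odds, max_retries)


def correction(records: list[Record],
               scrape_odds: Callable[[str], Record],
               max_retries: int) -> list[Record]:
    """Re-scrape the links whose records still contain missing (None) values."""
    for _ in range(max_retries):
        incomplete = [r for r in records if any(v is None for v in r.values())]
        if not incomplete:
            break
        for record in incomplete:
            fresh = scrape_odds(record["Link"])
            # Only fill the gaps; keep values that were already scraped.
            for key, value in fresh.items():
                if record.get(key) is None and value is not None:
                    record[key] = value
    return records
```

Keeping the retry pass separate from the main loop means a page that fails to load once (a common issue with dynamic sites) costs a second request for that link only, instead of a rerun of the whole season.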
The resulting dataset contains 15,868 observations of 12 variables:
- Names: the teams involved in the match
- Link: the link to the odds of the match
- Score: the final score
- Score.q: the score at the end of every quarter
- OverTime: Boolean, True if the match went to overtime
- Score.OT: the score at the start of overtime
- Date: the date of the match
- Category: one of Pre-Season, Regular-Season, Play-In, Play-Off
- AH: the Asian Handicap for which the two odds are closest to one another
- diff.AH: the difference between those two closest odds in the AH section
- O/U: the Over/Under for which the two odds are closest to one another
- diff.OU: the difference between those two closest odds in the O/U section
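As an illustration of how these variables could feed the comparison my friend had in mind, here is a standard-library sketch that computes the mean absolute AH and O/U error per Category. It assumes a storage format that the list above does not specify: Score as a "home:away" string, AH quoted for the home team, and numeric AH, O/U, diff.AH and diff.OU. The 0.1 threshold for discarding loosely balanced lines is an arbitrary, hypothetical choice, and the example rows are made up.

```python
from collections import defaultdict
from statistics import mean


def mean_abs_errors(rows: list[dict], max_gap: float = 0.1) -> dict[str, tuple[float, float]]:
    """Mean absolute AH and O/U errors per Category.

    Rows where either balanced line has a gap (diff.AH or diff.OU) larger
    than max_gap are skipped, since the market was not really balanced there.
    """
    errors: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for row in rows:
        if row["diff.AH"] > max_gap or row["diff.OU"] > max_gap:
            continue
        home, away = (int(x) for x in row["Score"].split(":"))
        ah_error = (home - away) - (-row["AH"])  # actual minus predicted margin
        ou_error = (home + away) - row["O/U"]    # actual minus predicted total
        errors[row["Category"]].append((abs(ah_error), abs(ou_error)))
    return {
        category: (mean(e[0] for e in pairs), mean(e[1] for e in pairs))
        for category, pairs in errors.items()
    }


rows = [  # made-up rows for illustration only
    {"Score": "110:100", "AH": -3.5, "O/U": 212.5, "diff.AH": 0.0, "diff.OU": 0.02, "Category": "Play-Off"},
    {"Score": "100:104", "AH": 1.5, "O/U": 200.5, "diff.AH": 0.04, "diff.OU": 0.0, "Category": "Play-Off"},
    {"Score": "90:80", "AH": -5.5, "O/U": 180.5, "diff.AH": 0.5, "diff.OU": 0.0, "Category": "Regular-Season"},
]
print(mean_abs_errors(rows))  # → {'Play-Off': (4.5, 3.0)}
```

The Regular-Season row is dropped because its diff.AH of 0.5 exceeds the threshold, so only the Play-Off category appears in the result.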