This is a web scraper to get data from UFC Stats, built using Scrapy. Scraped data are organized as follows:
All completed UFC fights:
- fight_info table: contains fight/match-up level metadata.
- fighter_stats table: contains fighter-level data of fighters' career summary statistics.
- fight_stats table: contains fighter-level performance data within each match-up.
Upcoming fights:
- upcoming table: contains match-up level information of all the upcoming fights in the next UFC event, according to this page: http://ufcstats.com/statistics/events/completed.
Let me know if you've used the crawler or data to make something cool 👋
Logs are written to standard output in JSON format.
make build # Builds the docker container
make ufcFights # Run the ufcFights crawler
make ufcFighters # Run the ufcFighters crawler
make upcoming # Run the upcoming crawler
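Since the logs arrive on standard output as JSON, they can be filtered with a few lines of Python. The sketch below is only an illustration: it assumes one JSON object per line, and `parse_json_logs` is a hypothetical helper, not part of this repo. The log field names are not documented here, so the helper does not rely on any of them.

```python
import json
from typing import Iterable, Iterator


def parse_json_logs(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per line that parses as a JSON object; skip anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON, e.g. a message from make or Docker
        if isinstance(record, dict):
            yield record
```

To use it on a live run, pipe the crawler output into a small script that iterates over `parse_json_logs(sys.stdin)` and prints whichever keys the records turn out to contain.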
- Python 3
- Scrapy
Install required packages.
pip install -r requirements.txt
If you have trouble installing Scrapy, see the install section in Scrapy documentation at https://docs.scrapy.org/en/latest/intro/install.html for more details.
Clone or fork the repo. Or download a local copy. Then crawl away.
Note: in the current version, running the spider will crawl the entire site, so it will take some time.
Call scrapy crawl spider_name to start the crawler. There are 3 spiders you can run:
scrapy crawl ufcFights
The ufcFights spider will return:
- the fight_info table as a .csv file, saved in the data/fight_info directory.
- the fight_stats table as a .jl file (newline-delimited JSON), saved in the data/fight_stats directory. One line per fight.
If you prefer other output formats, you can modify the respective feed export pipelines in pipelines.py, or file an issue and let me know.
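A .jl file holds one JSON document per line, so it can be read without loading the whole file at once. The following is a minimal sketch using only the standard library. `read_jsonl` is a hypothetical helper name, and the keys inside each record depend on the spider's output, so none are assumed here.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path) -> Iterator[dict]:
    """Yield one parsed JSON value per non-empty line of a .jl file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line_number, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report which line is malformed instead of failing silently.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
```

For example, `fights = list(read_jsonl(some_path))` gives one entry per fight, where `some_path` points at a file under data/fight_stats.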
scrapy crawl ufcFighters
The ufcFighters spider will return the fighter_stats table as a .csv file, saved in the data/fighter_stats directory.
scrapy crawl upcoming
The upcoming spider will return the upcoming table as a .csv file, saved in the data/upcoming directory.
All output files are named with a timestamp, and each table is stored in its own folder.
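Because each run writes a new timestamped file, downstream code usually wants the most recent one. The sketch below picks it by file modification time rather than by parsing the file name, since the exact timestamp format is not stated here. `latest_output` and `load_csv` are hypothetical helpers, not part of this repo.

```python
import csv
from pathlib import Path
from typing import Optional


def latest_output(folder, suffix: str = ".csv") -> Optional[Path]:
    """Return the most recently modified file with the given suffix, or None."""
    candidates = [p for p in Path(folder).glob(f"*{suffix}") if p.is_file()]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def load_csv(path) -> list:
    """Read a CSV file into a list of dicts keyed by the header row."""
    with Path(path).open(newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))
```

For instance, `latest_output("data/fighter_stats")` returns the newest fighter_stats file, or None if the spider has not been run yet. Pass `suffix=".jl"` for the data/fight_stats folder.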
- Add a spider to scrape upcoming fights
- Add options to limit the spider's scope, e.g. only scrape the new matches rather than the entire site.