This project is a sub-module of the Multiplayer Football Draft Simulator.
A web crawler that scrapes all football players' information from Sofifa and exports it to JSON format. The obtained data is then cleaned and analyzed.
- Crawler: Built on Scrapy using Python 3
- Analytics: IPynb notebook using Python 3
The data is further exported to the Football Draft Backend, which serves it from an endpoint.
- Install project dependencies
pip install -r requirements.txt
- Run the crawler with ./fifa-crawler as the current directory (this is the main Scrapy crawler directory)
Make sure to change the filenames to read from and write to appropriately:
players_url.json --> scraping URLs
players_stats_raw.json --> scraping player stats
- First run the URL spider (to get all player URLs)
scrapy crawl players_url
- Once that completes successfully, run the stats spider (to get the player statistics from the URLs collected above)
scrapy crawl players_stats
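
The two-step run above can also be scripted so the stats spider only starts after the URL spider has finished cleanly. This is a minimal sketch, not part of the project: `build_commands` and `run_spiders` are hypothetical helper names, and it assumes the `scrapy` executable is on your PATH and that the script is started from the repository root.

```python
import subprocess

# Spider names are taken from the commands above. Order matters because
# the stats spider reads the URLs produced by the URL spider.
SPIDERS = ["players_url", "players_stats"]


def build_commands(spiders=SPIDERS):
    """Return one `scrapy crawl <name>` command per spider, in order."""
    return [["scrapy", "crawl", name] for name in spiders]


def run_spiders(crawler_dir="./fifa-crawler", runner=subprocess.run):
    """Run each spider in turn from the crawler directory.

    `runner` is injectable only so the function can be exercised
    without Scrapy installed; by default it is subprocess.run.
    """
    for command in build_commands():
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the stats spider never runs after a failed URL crawl.
        runner(command, cwd=crawler_dir, check=True)
```

Calling `run_spiders()` is then equivalent to running the two `scrapy crawl` commands by hand, one after the other.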
- Update the crawler periodically to reflect changes on the Sofifa platform.
- Add analysis projects on the crawled data.
- Update the crawler to also scrape team data (it currently scrapes player data only)
- Improve speed of the crawler
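
For the speed item, Scrapy's built-in settings may be a reasonable place to start. The fragment below is only a sketch of what could go into the project's `settings.py`. The setting names are standard Scrapy settings, but every value is an illustrative assumption that has not been measured against Sofifa, and the project may already set some of them.

```python
# Hypothetical additions to the Scrapy project's settings.py.
# All values are illustrative and should be tuned against the target site.

# Upper bound on requests Scrapy keeps in flight across all domains.
CONCURRENT_REQUESTS = 32

# Upper bound on simultaneous requests to a single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 16

# Fixed minimum delay (seconds) between requests to the same site.
DOWNLOAD_DELAY = 0

# Let Scrapy adapt the delay to observed latency instead of guessing one.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_TARGET_CONCURRENCY = 8.0

# Cache responses on disk so repeated development runs skip the network.
HTTPCACHE_ENABLED = True
```

Higher concurrency puts more load on the site being scraped, so any change here is worth testing on a small batch of URLs first.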
We love your input! We want to make contributing to this project as easy and transparent as possible, whether it's:
- Reporting a bug
- Discussing the current state of the code
- Submitting a fix
- Proposing new features
- Fork the repo and clone it on your machine.
- Add an upstream link to the main branch in your cloned repo
git remote add upstream https://github.com/sauravhiremath/fifa-stats-crawler.git
- Keep your cloned repo up to date by pulling from upstream (this also helps avoid merge conflicts when committing new changes)
git pull upstream master
- Create your feature branch
git checkout -b <feature-name>
- Commit all the changes
git commit -am "Meaningful commit message"
- Push the changes for review
git push origin <feature-name>
- Create a PR from our repo on GitHub.
- Code should be properly commented to ensure its readability.
- If you've added code that should be tested, add tests as comments.
- In Python, use docstrings to provide tests.
- Make sure your code is properly formatted.
- Issue that pull request!
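
The docstring-test guideline above could look like the following sketch. `clean_player_name` is a hypothetical helper invented for illustration and is not part of this repository. The `>>>` lines are ordinary doctest examples that double as tests.

```python
def clean_player_name(raw_name):
    """Collapse extra whitespace in a scraped player name.

    The examples below are executed by doctest and act as tests.

    >>> clean_player_name("  Jane   Doe ")
    'Jane Doe'
    >>> clean_player_name("Doe")
    'Doe'
    """
    # split() with no arguments drops leading, trailing, and repeated
    # whitespace; join() puts single spaces back between the parts.
    return " ".join(raw_name.split())
```

If this function were saved in a file, `python -m doctest -v <file>.py` would run the examples and report any mismatch.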
When creating an issue, make sure it is not already present. Furthermore, provide a proper description of the changes. If you are suggesting any code improvements, provide thorough details about them.
Great issue suggestions tend to have:
- A quick summary of the changes.
- In case of a bug, steps to reproduce
  - Be specific!
  - Give sample code if you can.
- What you expected would happen
- What actually happens
- Notes (possibly including why you think this might be happening, or stuff you tried that didn't work)
A more detailed step-by-step guide with pictures for creating a pull request can be found here