Game Score Scrapping is a Python-based project that extracts and compiles game review scores from gaming websites (GameSpot, IGN, and Metacritic). Its goal is to turn a list of game titles stored in a PDF file into a dataset containing each game's scores and links to the sources of those scores.
- Introduction
- Installation
- Usage
- Features
- Dependencies
- Configuration
- Documentation
- Examples
- Troubleshooting
- Contributors
- License
- Clone the repository:
git clone https://github.com/thaoquynh0603/game-score-scrapping.git
- Navigate to the project directory:
cd game-score-scrapping
- Install the required dependencies:
pip install -r requirements.txt
- Place the input PDF (containing game titles) in the project directory.
- Run the convert_pdf.ipynb notebook to convert the PDF into a dataset.
- Execute the scraping scripts (gamespot.py, ign.py, metacritic.py) to fetch game scores.
- Combine the results using combine.py.
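How convert_pdf.ipynb extracts text from the PDF isn't specified in this README. As a minimal sketch of the cleanup that typically follows extraction, assuming the PDF text is already available as a string with one title per line, the titles could be normalized and de-duplicated like this. The function name `titles_from_text` and the rule that digit-only lines are page numbers are hypothetical.

```python
import re


def titles_from_text(raw_text: str) -> list:
    """Turn text extracted from a PDF into an ordered list of unique game titles.

    Hypothetical helper: assumes one title per line and treats lines made up
    only of digits as page numbers to skip.
    """
    titles = []
    seen = set()
    for line in raw_text.splitlines():
        # Collapse runs of whitespace left over from the PDF layout.
        title = re.sub(r"\s+", " ", line).strip()
        # Skip blank lines and bare page numbers.
        if not title or title.isdigit():
            continue
        # Case-insensitive de-duplication, keeping the first spelling seen.
        key = title.casefold()
        if key not in seen:
            seen.add(key)
            titles.append(title)
    return titles


sample = "Celeste\n  Hades \n\n12\nhades\nStardew   Valley\n"
print(titles_from_text(sample))  # ['Celeste', 'Hades', 'Stardew Valley']
```

The resulting list can then be wrapped in a DataFrame with `pd.DataFrame({"title": titles})`.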
- Convert a PDF of game titles into a pandas DataFrame.
- Scrape game scores from GameSpot, IGN, and Metacritic.
- Generate a final dataset with game titles, scores, and source links.
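The scraping scripts themselves aren't reproduced here. As a minimal sketch of the parsing step, the snippet below pulls a score out of already-downloaded HTML and pairs it with its source link. It swaps in the standard library's `html.parser` for BeautifulSoup so it runs without extra packages, and the markup and class name (`review-score`) are hypothetical placeholders, not the real structure of GameSpot, IGN, or Metacritic pages.

```python
from html.parser import HTMLParser
from typing import Optional


class ScoreParser(HTMLParser):
    """Capture the first non-blank text inside an element whose class list contains `score_class`.

    Text is only captured if it appears before the next closing tag after the match.
    """

    def __init__(self, score_class: str) -> None:
        super().__init__()
        self.score_class = score_class
        self._inside = False
        self.score: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.score is None and self.score_class in classes:
            self._inside = True

    def handle_endtag(self, tag):
        # Stop looking once any element closes without having yielded text.
        self._inside = False

    def handle_data(self, data):
        if self._inside and data.strip():
            self.score = data.strip()
            self._inside = False


def extract_score(html: str, score_class: str, source_url: str) -> dict:
    """Return the scraped score (None if not found) together with its source link."""
    parser = ScoreParser(score_class)
    parser.feed(html)
    parser.close()
    return {"score": parser.score, "source_link": source_url}


page = '<div class="review-score"><span>8</span>/10</div>'
print(extract_score(page, "review-score", "https://example.com/review"))
# {'score': '8', 'source_link': 'https://example.com/review'}
```

In a real scraper the HTML would typically come from `requests.get(url, timeout=10).text`, and with BeautifulSoup4 the equivalent lookup is `soup.find(class_="review-score")`.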
- Python 3.x
- pandas
- BeautifulSoup4
- Requests
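Before running the notebook or scripts, a quick check can confirm that the listed packages are importable. This is a small sketch built on the standard library's `importlib.util.find_spec`; the helper names are hypothetical. Note that BeautifulSoup4 is installed under that name but imported as `bs4`.

```python
import importlib.util

# pip package name -> module name used in `import` statements.
REQUIRED_MODULES = {"pandas": "pandas", "beautifulsoup4": "bs4", "requests": "requests"}


def missing_dependencies(required=REQUIRED_MODULES):
    """Return the package names whose modules cannot be found on the current Python path."""
    return [pkg for pkg, module in required.items() if importlib.util.find_spec(module) is None]


def dependency_report(required=REQUIRED_MODULES) -> str:
    """Summarize which required packages are missing, if any."""
    missing = missing_dependencies(required)
    if missing:
        return "Missing: " + ", ".join(missing) + " (try: pip install -r requirements.txt)"
    return "All dependencies found."
```

Calling `print(dependency_report())` from the project's environment shows at a glance whether anything still needs installing.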
- Ensure the input PDF is named as the scripts expect (the example input is Creative-Arcades-6296.pdf).
- If a website changes its page structure, update the corresponding scraping script to match.
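One way to make such updates less painful is to keep file names and site-specific lookups in a single place. The sketch below is hypothetical: the file names are taken from this README's examples, but the `score_class` values are placeholders rather than the sites' real markup, and `check_config` is an illustrative helper, not part of the repository.

```python
from pathlib import Path

# Hypothetical central configuration. File names come from this README's examples;
# the score_class values are placeholders, not the sites' actual markup.
CONFIG = {
    "input_pdf": "Creative-Arcades-6296.pdf",
    "sites": {
        "gamespot": {"output_csv": "finalgamespot.csv", "score_class": "gamespot-score"},
        "ign": {"output_csv": "finalign.csv", "score_class": "ign-score"},
        "metacritic": {"output_csv": "finalmetacritic.csv", "score_class": "metacritic-score"},
    },
}


def check_config(config: dict, project_dir: Path) -> list:
    """Return a list of human-readable problems; an empty list means the config looks usable."""
    problems = []
    pdf_path = Path(project_dir) / config["input_pdf"]
    if not pdf_path.is_file():
        problems.append(f"Input PDF not found: {pdf_path}")
    for site, settings in config["sites"].items():
        # An empty lookup leaves the scraper with nothing to search for on that site.
        if not settings.get("score_class"):
            problems.append(f"No score lookup configured for {site}")
    return problems
```

With this layout, a change in one site's markup means editing one `score_class` entry rather than hunting through the script.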
- convert_pdf.ipynb: Notebook for converting the PDF to a dataset.
- gamespot.py, ign.py, metacritic.py: Scripts for scraping game scores.
- combine.py: Script to merge the scraped data into a single dataset.
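The merge logic in combine.py isn't described in this README. As a rough sketch of what merging could look like, assuming each per-site CSV has `title`, `score`, and `source_link` columns (an assumption, not the repository's confirmed schema), rows can be matched to the PDF's titles case-insensitively and written out with the standard `csv` module instead of pandas. Both function names are hypothetical.

```python
import csv


def combine_results(titles, site_rows):
    """Merge per-site results into one row per title, in the order of `titles`.

    site_rows maps a site name to an iterable of dicts with (assumed) keys
    "title", "score", and "source_link". Titles are matched case-insensitively;
    if a site lists the same title twice, the last row wins. Titles with no
    result for a site get empty cells for that site.
    """
    lookup = {
        site: {row["title"].strip().casefold(): row for row in rows}
        for site, rows in site_rows.items()
    }
    combined = []
    for title in titles:
        record = {"title": title}
        for site, by_title in lookup.items():
            row = by_title.get(title.strip().casefold(), {})
            record[f"{site}_score"] = row.get("score", "")
            record[f"{site}_link"] = row.get("source_link", "")
        combined.append(record)
    return combined


def write_combined_csv(rows, out_file):
    """Write merged rows to an open text file (e.g. data.csv opened with newline="")."""
    if not rows:
        return
    writer = csv.DictWriter(out_file, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
```

With pandas, a similar result could come from successive `DataFrame.merge(..., on="title", how="left")` calls, one per site.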
- Input PDF: Creative-Arcades-6296.pdf
- Output CSVs: finalgamespot.csv, finalign.csv, finalmetacritic.csv, data.csv
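To sanity-check the outputs, a short script can count how many titles actually received a score from each site. This sketch assumes a combined CSV with one row per title and one `<site>_score` column per site; those column names are hypothetical, since the README doesn't document the layout of data.csv.

```python
import csv


def score_coverage(csv_file) -> dict:
    """Count, for each column ending in "_score", how many rows have a non-empty value."""
    reader = csv.DictReader(csv_file)
    # fieldnames is None for an empty file.
    score_columns = [name for name in reader.fieldnames or [] if name.endswith("_score")]
    counts = {column: 0 for column in score_columns}
    for row in reader:
        for column in score_columns:
            if (row.get(column) or "").strip():
                counts[column] += 1
    return counts
```

For example, `with open("data.csv", newline="", encoding="utf-8") as f: print(score_coverage(f))` would show which site returned the fewest matches, which is often the first sign that its page structure has changed.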
- Ensure all dependencies are installed.
- If scraping fails, check whether the website's page structure has changed.
- Verify the format of the input PDF.
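A quick first check on the input is whether the file is a PDF at all: PDF files begin with the `%PDF-` signature. The helper below (a hypothetical name) checks only that signature; it can't tell whether the titles are laid out the way convert_pdf.ipynb expects.

```python
def looks_like_pdf(path) -> bool:
    """Return True if the file starts with the "%PDF-" signature.

    Only the first five bytes are read; the rest of the file is not validated.
    """
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"
```

Running `looks_like_pdf("Creative-Arcades-6296.pdf")` before the notebook catches a mis-saved or renamed non-PDF file early.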