This project is a Python application designed to scrape data from Wikipedia regarding the political leaders of various countries. It leverages an external API to obtain a list of countries and their past political leaders, then extracts and sanitizes their short bios from Wikipedia. The scraped data is then saved to a JSON file for further processing.
- Python 3.x
- BeautifulSoup
- Requests
- Clone the repository:
git clone https://github.com/JaggarYussef/wiki-scraper.git
- Navigate to the project directory:
cd wiki-scraper
- Install the dependencies:
pip install -r requirements.txt
- Run the main.py script to start the scraping process:
python main.py
- The script calls the API to retrieve a list of countries and their leaders, scrapes the Wikipedia pages for the leaders' bios, and saves the data to a JSON file named leaders_data.json in the project directory.
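The sanitize-and-save part of that workflow can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names sanitize_bio and save_leaders, the regular expressions, and the shape of the data dictionary are all assumptions. It uses only the standard library, so the API call and the BeautifulSoup parsing step are left out.

```python
import json
import re


def sanitize_bio(text: str) -> str:
    """Clean a raw first-paragraph string (hypothetical helper, not from the repo)."""
    # Drop bracketed residue such as [1] or [citation needed].
    text = re.sub(r"\[[^\]]*\]", "", text)
    # Collapse runs of whitespace, including newlines, into single spaces.
    text = re.sub(r"\s+", " ", text)
    # Remove any space left dangling before punctuation.
    text = re.sub(r"\s+([,.;:])", r"\1", text)
    return text.strip()


def save_leaders(leaders_per_country: dict, path: str = "leaders_data.json") -> None:
    """Write the scraped data to a JSON file (hypothetical helper, not from the repo)."""
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps accented names readable in the output file.
        json.dump(leaders_per_country, fh, ensure_ascii=False, indent=2)
```

Under these assumptions, a caller would pass each scraped paragraph through sanitize_bio, collect the results in a dictionary keyed by country, and hand that dictionary to save_leaders once at the end.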
- main.py: The main script that orchestrates the scraping process.
- src/: Directory containing the source code for the Wikipedia scraper.
- wiki_scraper.py: Module containing the Scraper class responsible for scraping Wikipedia data.
- requirements.txt: File listing the project dependencies.
- leaders_data.json: Output JSON file containing the scraped data.