Wikipedia Scraper

Overview

The Wikipedia Scraper is a Python script that allows you to scrape content from Wikipedia pages and save it to a Word document. This script is particularly useful for researchers, students, or anyone who wants to collect information from Wikipedia articles for offline use.

Requirements

Before running the script, make sure you have the following Python libraries installed:

  • requests: To make HTTP GET requests to Wikipedia pages.
  • BeautifulSoup: To parse HTML content.
  • docx: To create and save Word documents.

You can install these libraries using pip:

pip install requests beautifulsoup4 python-docx

## Usage
Clone this repository to your local machine or download the wikipedia_scraper.py file.

Open a terminal or command prompt and navigate to the directory containing wikipedia_scraper.py.

Run the script using Python:

python wikipedia_scraper.py

The script will start scraping content from the Main Page of English Wikipedia by default. You can change the url variable in the script to specify a different Wikipedia page if needed.

The scraped content will be saved to a Word document named wikipedia_scraped.docx in the same directory as the script.

Customization
You can customize the script to suit your needs:

Change the url variable to scrape content from a different Wikipedia page.
Modify the script to save the output with a different filename or in a different format.
Adjust the parsing logic to extract specific information from Wikipedia pages.
Note
Please be mindful of Wikipedia's terms of use and consider the legality and ethical implications of scraping content from the site. Make sure to respect copyright and licensing restrictions.

License
This code is provided under the MIT License.

Feel free to use, modify, and distribute this code as needed. If you find it helpful or have suggestions for improvements, please contribute or provide feedback.

Happy scraping!


This README file provides an overview of the code, lists requirements, explains how to use and customize the script, and includes important notes and a license section. You can customize it further to suit your specific project needs.