Wikipedia-Scraper

A Python-based Wikipedia scraper that extracts and processes content from Wikipedia pages for data analysis and research purposes.

Primary language: Jupyter Notebook. License: MIT.

Wikipedia Scraper

Wikipedia Scraper is a Python-based web scraping tool built to retrieve information from Wikipedia pages. The project aims to offer an effective way to collect data from Wikipedia articles for a range of purposes.

Features

  • Easy-to-Use: Simple Python script for quick integration into your projects.
  • Customizable: Adjustable settings for extracting specific information based on your requirements.
  • Robust: Handles various Wikipedia page structures and formats.
  • Data Output: Extracted data can be saved in different formats (e.g., JSON, CSV) for further analysis.
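As an illustration of the data output feature, here is a minimal sketch of how extracted records could be serialized to JSON and CSV using only the standard library. The function names `to_json` and `to_csv` and the record fields (`title`, `summary`) are assumptions made for this example and are not taken from the project's code.

```python
import csv
import io
import json


def to_json(records):
    """Serialize a list of dict records to a pretty-printed JSON string."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records):
    """Serialize a list of dict records to CSV text with a header row."""
    if not records:
        return ""
    buffer = io.StringIO()
    # Column order follows the keys of the first record (an assumption:
    # every record is expected to share the same keys).
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

To save the result, write the returned string to a file, for example `open("articles.csv", "w", encoding="utf-8", newline="").write(to_csv(records))`. Returning strings instead of writing files directly keeps the serializers easy to test and reuse.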

Prerequisites

  • Python 3.8

  • Dependencies:

  • wikipedia: Python library for accessing and parsing Wikipedia data.

    • Install: pip install wikipedia
  • Requests: HTTP library for making web requests.

    • Install: pip install requests
  • Beautiful Soup: HTML parsing library for pulling data out of HTML and XML files.

    • Install: pip install beautifulsoup4
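Once the dependencies are installed, a basic scrape might look like the sketch below. It uses only Requests and Beautiful Soup (the `wikipedia` package is left out here), and it is not the project's actual implementation: the helper names `fetch_html` and `extract_paragraphs`, the `HEADERS` value, and the example URL are all assumptions chosen for illustration.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical identifier; Wikimedia asks clients to send a descriptive User-Agent.
HEADERS = {"User-Agent": "wikipedia-scraper-example/0.1 (contact: you@example.com)"}


def fetch_html(url):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text


def extract_paragraphs(html):
    """Return the non-empty paragraph texts found in an HTML document."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for p in soup.find_all("p"):
        # Collapse runs of whitespace left behind by inline tags and line breaks.
        text = " ".join(p.get_text().split())
        if text:
            paragraphs.append(text)
    return paragraphs
```

A typical call would be `extract_paragraphs(fetch_html("https://en.wikipedia.org/wiki/Web_scraping"))`. Keeping the download and the parsing in separate functions lets the parser run on saved HTML without any network access.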

Acknowledgements

  • Requests: The Requests library made it easy to handle HTTP requests in our scraper.

  • Beautiful Soup: Special thanks to Beautiful Soup for providing a powerful tool for HTML parsing in Python.

  • Wikipedia Package: Gratitude to the developers of the Wikipedia package, which simplified the process of accessing and parsing Wikipedia data in our project.

Contact

Email: miteshgupta2711@gmail.com

LinkedIn: https://www.linkedin.com/in/mitesh-gupta/

Twitter: https://twitter.com/mg_mitesh