Wikipedia-Scraper-Bot

A Wikipedia scraper bot written in Python. Developed with Python 3.6.0 using the Spyder IDE.

Necessary Modules

  • BeautifulSoup
  • Requests
  • Html2Text
  • Validators
  • OS
  • Regex
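
Before running the bot, it can help to confirm that the modules above are importable. The following is a minimal sketch, not part of the repository: the import names (`bs4` for BeautifulSoup, `re` for Regex, and so on) are assumptions about how the listed modules are imported, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names assumed for the modules listed above (not confirmed by the repository).
REQUIRED_MODULES = ["bs4", "requests", "html2text", "validators", "os", "re"]


def missing_modules(names):
    """Return the module names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

Any name reported as missing would need to be installed before running Main.py.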

Installation

First, install Python 3.6.0.

Clone the repository to your desktop, then run Main.py from CMD or a terminal with the command python Main.py

To do

  • Create a very basic Wikipedia scraper that scrapes the title and the first few paragraphs. The user provides a URL, and the scraped text is stored in a text file in an output folder.

  • Create a separate file for the downloaded text

  • Divide the code among various modules

  • Clean the scraped data

  • Add comments to the code

  • Handle the exceptions that could occur

  • Make a Reddit bot out of it

  • The structure will look like this: Main.py, Scraper.py, Cleaner.py, TxtToFile.py, and a downloadedTxt folder
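
The scrape, clean, and save steps from the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's code: it uses the standard-library `html.parser` in place of BeautifulSoup so that it runs without extra installs, it assumes the page title sits in the first `<h1>` element, and it assumes that cleaning means dropping citation markers such as [1]. The names `TitleAndParagraphs`, `clean_text`, and `save_text` are hypothetical.

```python
import os
import re
from html.parser import HTMLParser


def clean_text(text):
    """Drop citation markers such as [1] and collapse runs of whitespace."""
    text = re.sub(r"\[\d+\]", "", text)
    return " ".join(text.split())


class TitleAndParagraphs(HTMLParser):
    """Collect the first <h1> and the first few non-empty <p> elements."""

    def __init__(self, max_paragraphs=3):
        super().__init__()
        self.max_paragraphs = max_paragraphs
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag being captured: "h1", "p", or None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            return  # already inside a captured element, e.g. <a> inside <p>
        wants_title = tag == "h1" and not self.title
        wants_paragraph = tag == "p" and len(self.paragraphs) < self.max_paragraphs
        if wants_title or wants_paragraph:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = clean_text("".join(self._buffer))
        if tag == "h1":
            self.title = text
        elif text:  # skip empty paragraphs
            self.paragraphs.append(text)
        self._current = None


def save_text(title, paragraphs, folder="downloadedTxt"):
    """Write the title and paragraphs to <folder>/<title>.txt and return the path."""
    os.makedirs(folder, exist_ok=True)
    safe_name = re.sub(r"[^\w\-]+", "_", title).strip("_") or "untitled"
    path = os.path.join(folder, safe_name + ".txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(title + "\n\n" + "\n\n".join(paragraphs) + "\n")
    return path
```

In use, the HTML of a page fetched from the provided URL (for example with Requests) would be passed to `parser.feed(...)`, and `parser.title` and `parser.paragraphs` would then be handed to `save_text`.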

The scraper doesn't work correctly on pages (such as Illuminati) that contain quoted text. This needs to be fixed.