Wikipedia Table Web Scraper
A simple web scraper that uses Beautiful Soup to read Wikipedia table data into .csv and .json files.
Description
Created to scrape U.S. county centroid location data from Wikipedia for another project. I later generalized it so that it can scrape many other Wikipedia tables.
Installation
Set up your virtual environment and activate it:
python3 -m venv env
source env/bin/activate
Install packages:
pip install -r requirements.txt
Usage
This scraper does not use machine learning to detect table structure, so you may need to inspect the HTML source of the page you want to scrape and adjust the Python code to fit your use case (a sketch of the approach follows the options list below). To run the script:
python3 wiki_scraper.py [OPTIONS]
-h,--help: show the help menu
-v,--verbose: print verbose output
-c,--clean: remove special characters from the scraped data
-o,--output <type>: output data format (csv or json); defaults to csv
-n,--name <name>: base name of the output file(s), without the file extension (an integer is appended if multiple files are written)
-url,--url <address>: URL of the Wikipedia page containing the table; defaults to https://en.wikipedia.org/wiki/User:Michael_J/County_table
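For example, to scrape the default county table and write cleaned JSON output to counties.json (exact flag syntax may vary slightly; check --help):

python3 wiki_scraper.py --clean --output json --name counties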
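Under the hood, configuring the script for a new page mostly means pointing Beautiful Soup at the right table. Below is a minimal sketch of the approach, not the script's actual code: the URL is the default from above, while the "wikitable" class, the output file name, and the row handling are assumptions you would adapt to your page.

import csv

import requests
from bs4 import BeautifulSoup

# Default page from this README; replace with your own table's URL.
URL = "https://en.wikipedia.org/wiki/User:Michael_J/County_table"

response = requests.get(URL)
response.raise_for_status()
soup = BeautifulSoup(response.text, "html.parser")

# Most Wikipedia data tables carry the "wikitable" class (an assumption;
# adjust the selector after inspecting your page's HTML source).
table = soup.find("table", class_="wikitable")

# Flatten every row, treating header and data cells alike.
rows = []
for tr in table.find_all("tr"):
    cells = tr.find_all(["th", "td"])
    rows.append([cell.get_text(strip=True) for cell in cells])

# Hypothetical output file name; the real script derives it from --name.
with open("table.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

Swapping the selector or the row-handling logic is typically all the per-page configuration a new table requires.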