
wtd

Wiki table downloader

Description

A toy CLI tool for scraping tables from Wikipedia and loading them into SQLite databases.
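
As a rough illustration of what such a tool has to do, here is a minimal sketch of the general approach: fetch a page, parse the first wikitable, and insert its rows into SQLite. It assumes the reqwest (with the "blocking" feature), scraper, and rusqlite crates; it is not the actual wtd source.

    use rusqlite::{params, Connection};
    use scraper::{Html, Selector};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let url = std::env::args().nth(1).ok_or("usage: sketch <url>")?;
        let html = reqwest::blocking::get(url.as_str())?.text()?;
        let doc = Html::parse_document(&html);

        let table_sel = Selector::parse("table.wikitable").unwrap();
        let row_sel = Selector::parse("tr").unwrap();
        let cell_sel = Selector::parse("th, td").unwrap();

        let table = doc.select(&table_sel).next().ok_or("no table found")?;
        let mut rows = table.select(&row_sel);

        // Treat the first row's cells as column names.
        let header = rows.next().ok_or("empty table")?;
        let cols: Vec<String> = header
            .select(&cell_sel)
            .map(|c| c.text().collect::<String>().trim().replace('"', ""))
            .collect();

        // Create a table whose columns mirror the header row.
        let conn = Connection::open("out.db")?;
        let defs: Vec<String> = cols.iter().map(|c| format!("\"{}\" TEXT", c)).collect();
        conn.execute(
            &format!("CREATE TABLE IF NOT EXISTS scraped ({})", defs.join(", ")),
            params![],
        )?;

        // Insert each body row, skipping rows whose shape doesn't match
        // the header (e.g. because of rowspan/colspan).
        let placeholders = vec!["?"; cols.len()].join(", ");
        for row in rows {
            let cells: Vec<String> = row
                .select(&cell_sel)
                .map(|c| c.text().collect::<String>().trim().to_string())
                .collect();
            if cells.len() == cols.len() {
                conn.execute(
                    &format!("INSERT INTO scraped VALUES ({})", placeholders),
                    rusqlite::params_from_iter(cells.iter()),
                )?;
            }
        }
        Ok(())
    }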

Running

wtd 0.1.0

USAGE:
    wtd <url> [file-name]

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

ARGS:
    <url>          USAGE: wtd https://example.com
    <file-name>    USAGE: wtd https://example.com myDataBase.db
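
The help text above resembles structopt/clap derive output. Purely as an assumption about the interface (the real source may differ), a definition that would produce roughly this help could look like:

    // Assumed sketch: a structopt definition yielding approximately the
    // help text shown above. Field names and doc comments are guesses
    // based on that output, not the real wtd source.
    use structopt::StructOpt;

    #[derive(StructOpt)]
    #[structopt(name = "wtd", about = "Wiki table downloader")]
    struct Opt {
        /// USAGE: wtd https://example.com
        url: String,
        /// USAGE: wtd https://example.com myDataBase.db
        file_name: Option<String>, // rendered as <file-name> in the help
    }

    fn main() {
        let opt = Opt::from_args();
        println!("scraping {} into {:?}", opt.url, opt.file_name);
    }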

Development

  • Ensure you have sqlite3 installed, then try running ./test.sh, which will build, test, and insert a few tables into a database

Still in development

This project is missing many features. It is likely to work only on the simplest Wikipedia pages (those with a single table and no tables nested inside other tables), since many pages use layouts and formatting that make scraping the data difficult.

  • Replacing most of the std::process::exit() calls with custom errors (see the sketch after this list)
  • Handling pages that have multiple tables
  • Getting table titles from captions or the closest header
  • Handling tables nested inside other tables
  • More tests
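
For the first item, a hedged sketch of what a custom error type could look like; the type and variant names are hypothetical, not from the wtd source.

    // Hypothetical sketch: a custom error enum that callers can return
    // instead of calling std::process::exit(), letting main() report the
    // failure once. Variant names are illustrative, not from wtd.
    use std::fmt;

    #[derive(Debug)]
    enum WtdError {
        Fetch(String), // network failure for the given URL
        NoTable,       // the page contained no parseable table
        Db(String),    // wraps a database error message
    }

    impl fmt::Display for WtdError {
        fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            match self {
                WtdError::Fetch(url) => write!(f, "failed to fetch {}", url),
                WtdError::NoTable => write!(f, "no table found on the page"),
                WtdError::Db(msg) => write!(f, "database error: {}", msg),
            }
        }
    }

    impl std::error::Error for WtdError {}

    // Functions that previously exited the process can return Result instead:
    fn find_table(html: &str) -> Result<(), WtdError> {
        if !html.contains("<table") {
            return Err(WtdError::NoTable); // was: std::process::exit(1)
        }
        Ok(())
    }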