/yahoo-groups-backup

A python script to backup the contents of private Yahoo! groups.

Primary LanguagePythonThe UnlicenseUnlicense

yahoo-groups-backup

A python script to backup the contents of Yahoo! groups, be they private or public.

Setup/Requirements

The project requires Python 3, Mongo, and a computer with a GUI as Selenium is used for the scraping (to be able to handle private groups).

virtualenv is recommended.

git clone https://github.com/csaftoiu/yahoo-groups-backup.git
cd yahoo-groups-backup
pip install -r requirements.txt

Example

To scrape an entire site, say the concatenative group:

./yahoo-groups-backup.py scrape_messages concatenative

This will shove all the messages into a Mongo database (default localhost:27017), into the database of the same name as the group.

To scrape the files as well (though this group has no files):

./yahoo-groups-backup.py scrape_files concatenative

To dump the site as a human-friendly, fully static (i.e. viewable from the file system) website:

./yahoo-groups-backup.py dump_site concatenative concatenative_static_site

Then simply open concatenative_static_site/index.html and browse!

Full Usage

Yahoo! Groups backup scraper.

Usage:
  yahoo-groups-backup.py scrape_messages [options] <group_name>
  yahoo-groups-backup.py scrape_files [options] <group_name>
  yahoo-groups-backup.py dump_site [options] <group_name> <root_dir>
  yahoo-groups-backup.py -h | --help

Commands:
  scrape_messages     Scrape all group messages that are not scraped yet and insert it into the mongo database
  scrape_files        Scrape the group files and insert them into the mongo database
  dump_site           Dump the entire backup as a static website at the given root
                      directory

Options:
  -h --help                                   Show this screen
  -d --delay=<delay>                          Delay before scraping each message, to avoid rate limiting.
                                              Delays by a gaussian distribution with average <delay> and
                                              standard deviation <delay>/2. [default: 1]
  -c --config=<config_file>                   Use config file, if it exists [default: settings.yaml]
                                              Command-line settings override the config file settings
                                              "--mongo-host=foo" converts to "mongo-host: foo" in the
                                              config file.
  [(--login=<login> --password=<password>)]   Specify Yahoo! Groups login (required for private groups)
  --mongo-host=<hostname>                     Hostname for mongo database [default: localhost]
  --mongo-port=<port>                         Port for mongo database [default: 27017]