A collection of tools to help with the impending Google Reader shutdown.
Comprehensive archive of a Google Reader account.
Unlike Google Reader's Takeout support, provides a complete archive of a Reader account's data. This includes:
- All your read items
- All your starred items
- All your tagged items
- All your shared items
- All the shared items from the people you were following
- All the comments on shared items
- All your liked items
- All items you've kept unread, emailed, read on your phone, clicked on or otherwise interacted with
- All items that have appeared in one of your subscriptions
- All items that were recommended to you
- All items in the (English) "Explore" section
- All the profiles of the people you were following before the sharepocalypse
- All your preferences
To use it:
bin/reader_archive --output_directory=~/Downloads/reader_archive
A browser window will appear asking you to authorize the app. Once you do, you'll be given a code to paste back into the terminal. (Alternatively, pass the --use_client_login flag to be prompted for your Google Account username and password instead.) See this wiki page for an explanation of the archive format. The intent is to be comprehensive, so that other tools can be built on top of the archive data.
The archiving process can take a while, depending on the size of your account and your internet connection. For an account with 300,000 read items, the process took about 10 minutes and generated 1 GB of data.
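For a rough idea of what to expect for a different account size, the reference figures above can be scaled. This is only a back-of-the-envelope sketch: it assumes time and disk usage grow linearly with the number of read items, which ignores connection speed and how much other data the account holds. The function name is hypothetical and not part of the tools.

```python
# Reference figures quoted above: 300,000 read items, about 10 minutes, about 1 GB.
REF_ITEMS = 300_000
REF_SECONDS = 10 * 60
REF_BYTES = 1_000_000_000  # treating 1 GB as 10**9 bytes


def estimate_archive(item_count):
    """Return (minutes, gigabytes), ASSUMING both scale linearly with item count."""
    scale = item_count / REF_ITEMS
    minutes = REF_SECONDS * scale / 60
    gigabytes = REF_BYTES * scale / 1e9
    return minutes, gigabytes


if __name__ == "__main__":
    minutes, gigabytes = estimate_archive(1_000_000)
    print(f"~{minutes:.0f} minutes, ~{gigabytes:.1f} GB")
```

Treat the result as an order-of-magnitude guide and leave generous headroom on disk.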
Browse an archived Google Reader account.
Takes an archive generated by reader_archive and provides a browsing UI for it.
To use it:
bin/reader_browser ~/Downloads/reader_archive
Then you can load http://localhost:8071/ in your web browser to see the contents of the archive.
Saves public feed data from Google Reader's feed archive.
Google Reader has (for the most part) a copy of all blog posts and other feed items published since its launch in late 2005 (assuming that at least one Reader user subscribed to the feed). This makes it an invaluable resource for recovering content from sites that have disappeared. It can also serve as a backup mechanism, and it enables new tools to be built on top of the data.
Presumably access to this data is also going away come July 2013, and thus this tool can be used to get one last shot at archiving feeds you might want to refer to later.
The easiest way to use it is to get the OPML file with all your Reader subscriptions, and run it like so:
bin/feed_archive \
--opml_file=~/Downloads/feeds.opml \
--output_directory=~/Downloads/feed_archive
The destination specified by --output_directory will be populated with one file per feed, named after its URL. The file contains all items that Reader ever saw in that feed, in the Atom format. Google Reader normally omits unknown (namespaced) elements in its API output, but the script attempts to use high-fidelity mode to reconstruct the original data as much as possible.
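As a quick sanity check on an archived feed, the entries can be listed with Python's standard library. This is a minimal sketch, assuming each output file is a single well-formed Atom document whose items are `entry` elements in the standard Atom namespace; the function name and sample data are made up for illustration and are not part of the tool.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def entry_titles(atom_text):
    """Return the title of every top-level entry in an Atom document."""
    root = ET.fromstring(atom_text)
    return [
        entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        for entry in root.findall("atom:entry", ATOM_NS)
    ]


# Illustrative sample, not real archive output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""

if __name__ == "__main__":
    for title in entry_titles(SAMPLE_FEED):
        print(title)
```

To run it against a real file, read the file's contents and pass them to `entry_titles`. Namespaced extension elements preserved by high-fidelity mode are simply ignored by this sketch.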
If you only want to archive specific feeds, you can pass feed URLs as command line arguments instead of using --opml_file:
bin/feed_archive \
--output_directory=~/Downloads/feed_archive \
http://googlereader.blogspot.com/atom.xml \
http://persistent.info/atom.xml \
...
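If the feeds you care about are a subset of your subscriptions, one way to build that argument list is to pull the URLs out of the OPML file and pick from them. A minimal sketch, assuming the export follows the common OPML convention of `outline` elements carrying an `xmlUrl` attribute (possibly nested inside folder outlines); the function name and sample data are hypothetical and not part of the tool.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text):
    """Return the xmlUrl of every outline element, in document order."""
    root = ET.fromstring(opml_text)
    # Subscriptions may sit inside folder outlines, so walk the whole tree.
    return [
        outline.attrib["xmlUrl"]
        for outline in root.iter("outline")
        if "xmlUrl" in outline.attrib
    ]


# Illustrative sample, not a real export.
SAMPLE_OPML = """<opml version="1.0">
  <body>
    <outline title="Blogs" text="Blogs">
      <outline type="rss" text="Official Google Reader Blog"
               xmlUrl="http://googlereader.blogspot.com/atom.xml"/>
    </outline>
    <outline type="rss" text="persistent.info"
             xmlUrl="http://persistent.info/atom.xml"/>
  </body>
</opml>"""

if __name__ == "__main__":
    # One URL per line, ready to filter and paste onto the command line.
    print("\n".join(feed_urls_from_opml(SAMPLE_OPML)))
```

Folder outlines without an `xmlUrl` are skipped, so only actual subscriptions are returned.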
The tool supports additional arguments for controlling how many items are fetched; see bin/feed_archive --help for more information.