1.  ArchiveBot

    <SketchCow> Coders, I have a question.
    <SketchCow> Or, a request, etc.
    <SketchCow> I spent some time with xmc discussing something we could
                do to make things easier around here.
    <SketchCow> What we came up with is a trigger for a bot, which can
                be triggered by people with ops.
    <SketchCow> You tell it a website. It crawls it. WARC. Uploads it to
                archive.org. Boom.
    <SketchCow> I can supply machine as needed.
    <SketchCow> Obviously there's some sanitation issues, and it is root
                all the way down or nothing.
    <SketchCow> I think that would help a lot for smaller sites
    <SketchCow> Sites where it's 100 pages or 1000 pages even, pretty
                simple.
    <SketchCow> And just being able to go "bot, get a sanity dump"


2.  More info

    For the user's guide, read the COMMANDS file.

    For a half-assed installation and operation guide, read INSTALL.

    For a polished installation guide, submit a pull request.


3.  License

    Copyright 2013 David Yip; made available under the MIT license.
    See LICENSE for details.


4.  Acknowledgments

    Thanks to Alard (@alard), who added WARC generation and Lua
    scripting to GNU Wget.  Wget+lua was the first web crawler used by
    ArchiveBot.

    Thanks to Christopher Foo (@chfoo) for wpull, ArchiveBot's current
    web crawler.

    Thanks to Ivan Kozik (@ivan) for maintaining ignore patterns and
    tracking down performance problems at scale.

    Other thanks go to the following projects:

    * Celluloid <http://celluloid.io/>
    * Cinch <https://github.com/cinchrb/cinch/>
    * CouchDB <http://couchdb.apache.org/>
    * Ember.js <http://emberjs.com/>
    * Redis <http://redis.io/>
    * Seesaw <https://github.com/ArchiveTeam/seesaw-kit>


5.  Special thanks

    Dragonette, Barnaby Bright, Vienna Teng, NONONO.

    The memory hole of the Web has gone too far.
    Don't look down, never look away; ArchiveBot's like the wind.


vim:ts=2:sw=2:tw=72:et