Database format for storing crawling traces

OCTA Format

OCTA (Online Crawling Trace Archive) is a database format for storing the complete history ("trace") of Web crawling sessions.

The format is designed for online, concurrent use: multiple parties can work with the data while the crawler is still running. It also aims to be open, portable, extensible, and tolerant of incomplete data, and to support large binary content, compression, deduplication, and arbitrary annotations.
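To make the deduplication and compression goals concrete, here is a minimal sketch of a content-addressed blob store. It is purely illustrative and does not reflect the actual OCTA layout: the `BlobStore` class, its methods, and the choice of SHA-256 and zlib are all assumptions made for this example.

```python
import hashlib
import zlib


class BlobStore:
    """Toy in-memory content-addressed store (hypothetical, not the OCTA layout)."""

    def __init__(self):
        # Maps the hex SHA-256 digest of the raw content to its compressed bytes.
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store data once per distinct content and return its key."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self._blobs:
            # Only new content is compressed and stored; repeats are deduplicated.
            self._blobs[key] = zlib.compress(data)
        return key

    def get(self, key: str) -> bytes:
        """Return the original, uncompressed content for a key."""
        return zlib.decompress(self._blobs[key])

    def __len__(self) -> int:
        return len(self._blobs)


store = BlobStore()
body = b"<html>hello</html>" * 100

# The same response body fetched twice yields the same key and one stored blob.
k1 = store.put(body)
k2 = store.put(body)
```

Keying blobs by a hash of their content means that a page fetched repeatedly during a crawl is stored only once, while each trace record can still refer to it by key.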

Details can be found in the OCTA Format document.

License: MIT