webrecorder/specs

[Use Case]: Researcher saves an article for later use

Issue closed · 1 comment

Describe a use case for WACZ format.

A researcher wants to save an interactive article they're browsing for later use in their research.
They’ve bookmarked the page, but want to ensure they have a copy in case it disappears, changes or becomes paywalled. They use the browser to create a web archive (saved in the browser as well) and then download a local copy of the article as a WACZ, just in case. They do not intend to share this article with anyone else.

Additional Requirements

  • List of entry pages to start browsing from
  • Full-text search index
  • Technical metadata about the web archive
  • User-defined descriptive metadata
  • Screenshots of key pages
  • Encryption of data
  • Proof of Authenticity (Signing and Verification)
  • Fast access to multiple WACZ files in aggregate
  • Crawl or capture logs

How will web archives be created for this use case?

  • Manually, using a browser to capture exact content as directed by the user.
  • Automatically, using a crawler to crawl desired content, either once or on a specified schedule.

Sensitive private content and access

  • No, this use case focuses on archiving publicly accessible data only, and the web archive can be made public.
  • No, this use case focuses on archiving publicly accessible data only, but the web archive is not intended to be public.
  • Yes, this use case involves archiving data that is not public, and the web archive should not be made public.

edsu commented

This was added to the current use cases document.