Make principal web archive capture optional?
matteocargnelutti opened this issue · 5 comments
Should it be possible to skip the web capture step?
Potential use case: capturing only the provenance summary, screenshot, PDF snapshot, and extracted video of a given web page?
Is the idea that it would cut down on the amount of storage?
I can't address your question, but wanted to say: Nice to see you here, @edsu!
Hi @edsu!
> Is the idea that it would cut down on the amount of storage?
It is more about supporting use cases that do not revolve around capturing HTTP exchanges in a WARC.
For example, some users might just want to make a PDF capture or screenshot of a web page using Scoop, and only care about that artifact.
But don't you need to do the HTTP exchanges to generate the screenshot?
@edsu Yes and no.
- Yes: the HTTP exchanges will pass through the proxy as Scoop navigates to the page to take the screenshot.
- No: if I am only interested in the screenshot, I don't need to record these HTTP exchanges, and I can also skip some intermediate steps, such as some of the browser behaviors.
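To illustrate the "yes and no" distinction, here is a minimal sketch of a capture pipeline where recording is optional. Every name in it (`CaptureOptions`, `recordExchanges`, `RecordingProxy`, `capture`, and the artifact file names) is hypothetical and is not Scoop's actual API. Exchanges always pass through the proxy, but they are only stored, and a WARC only produced, when recording is turned on. The simulated page load stands in for the real browser navigation.

```typescript
// Hypothetical sketch: names are illustrative, not Scoop's real API.
interface Exchange {
  url: string;
  status: number;
}

interface CaptureOptions {
  recordExchanges: boolean; // hypothetical flag: keep HTTP exchanges for a WARC
  screenshot: boolean; // hypothetical flag: produce a screenshot artifact
}

class RecordingProxy {
  readonly recorded: Exchange[] = [];
  private readonly record: boolean;

  constructor(record: boolean) {
    this.record = record;
  }

  // Every exchange passes through the proxy; it is only stored when recording is on.
  forward(exchange: Exchange): Exchange {
    if (this.record) {
      this.recorded.push(exchange);
    }
    return exchange;
  }
}

function capture(
  urls: string[],
  options: CaptureOptions,
): { exchanges: Exchange[]; artifacts: string[] } {
  const proxy = new RecordingProxy(options.recordExchanges);

  // Simulated page load: the browser fetches each resource through the proxy
  // whether or not we intend to archive it.
  for (const url of urls) {
    proxy.forward({ url, status: 200 });
  }

  const artifacts: string[] = [];
  if (options.screenshot) {
    artifacts.push("screenshot.png");
  }
  // The WARC is only written when exchanges were actually recorded.
  if (options.recordExchanges) {
    artifacts.push("archive.warc");
  }
  return { exchanges: proxy.recorded, artifacts };
}
```

With `recordExchanges: false` and `screenshot: true`, the page is still loaded through the proxy, but only the screenshot comes out and nothing is kept for a WARC. Steps that exist mainly to make the archive more complete (such as some browser behaviors) could be gated on the same flag.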