Add support for saving screenshots, page source, and other arbitrary files to unstructured storage providers
englehardt opened this issue · 1 comment
englehardt commented
Screenshots, page source, and other files collected in the browser manager process are currently written directly to disk. This worked when OpenWPM only saved data locally, but it will not work for the S3Aggregator. Instead, `BaseAggregator` should include a `save_file` method. In `LocalAggregator` we can implement that to save to disk, and in `S3Aggregator` we can upload to S3.
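A minimal sketch of the proposed split, assuming a synchronous `save_file(filename, content)` signature. The class names come from the issue, but the constructor parameters (`data_directory`, `s3_client`, `bucket`, `prefix`) and method bodies are hypothetical. `S3Aggregator` is assumed to receive a boto3 S3 client, or any object with a compatible `put_object(Bucket=..., Key=..., Body=...)` method.

```python
import os
from abc import ABC, abstractmethod


class BaseAggregator(ABC):
    """Hypothetical sketch: every aggregator knows how to persist a named file."""

    @abstractmethod
    def save_file(self, filename: str, content: bytes) -> None:
        """Store `content` under `filename` in this aggregator's backend."""


class LocalAggregator(BaseAggregator):
    def __init__(self, data_directory: str) -> None:
        self.data_directory = data_directory

    def save_file(self, filename: str, content: bytes) -> None:
        # Write the bytes below the configured directory, creating subfolders as needed.
        path = os.path.join(self.data_directory, filename)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(content)


class S3Aggregator(BaseAggregator):
    def __init__(self, s3_client, bucket: str, prefix: str) -> None:
        # Assumption: s3_client is a boto3 S3 client or anything exposing put_object.
        self.s3_client = s3_client
        self.bucket = bucket
        self.prefix = prefix

    def save_file(self, filename: str, content: bytes) -> None:
        # Upload the bytes as a single object keyed by prefix/filename.
        key = f"{self.prefix}/{filename}"
        self.s3_client.put_object(Bucket=self.bucket, Key=key, Body=content)
```

The browser manager would then call something like `aggregator.save_file("screenshots/visit1.png", png_bytes)` without knowing whether the bytes end up on disk or in S3.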
vringar commented
Updating this comment as #753 removed everything mentioned in the original issue.
Observations:
- UnstructuredStorageProviders already have an interface suitable for storing a bunch of bytes under a user-defined name
- The base path for storing is specified at time of object instantiation
- => There is no longer any need for a `data_directory` in the manager params, similar to `database_name` being removed in #753
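To illustrate the observations above, here is a sketch using a simplified, synchronous stand-in for the provider interface. The method name `store_blob`, the `LocalFileProvider` class and the `save_screenshot` helper are hypothetical, and the real OpenWPM interface may differ. The point is that the storage location is fixed when the provider is constructed, so callers only choose a name and no directory needs to travel through the manager params.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class UnstructuredStorageProvider(ABC):
    """Simplified stand-in: store a blob of bytes under a caller-chosen name."""

    @abstractmethod
    def store_blob(self, filename: str, blob: bytes) -> None:
        ...


class LocalFileProvider(UnstructuredStorageProvider):
    """Hypothetical provider whose base path is fixed when the object is created."""

    def __init__(self, base_path: Path) -> None:
        self.base_path = base_path

    def store_blob(self, filename: str, blob: bytes) -> None:
        # Resolve the name against the base path chosen at construction time.
        target = self.base_path / filename
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(blob)


def save_screenshot(provider: UnstructuredStorageProvider, visit_id: int, png: bytes) -> None:
    # Hypothetical caller: it picks a name only; the provider decides where it lands.
    provider.store_blob(f"screenshots/{visit_id}.png", png)
```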
Paths forward:
- Add a second UnstructuredStorageProvider to the StorageController that is responsible for saving unstructured platform data
- Expand the UnstructuredStorageProvider interface with a second method that is responsible for saving unstructured platform data
I prefer option 1 because it is inherently more flexible: for example, screenshots could be saved to the cloud while web content is saved to disk.
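A rough sketch of option 1, assuming a simplified synchronous provider interface. `InMemoryProvider`, `store_blob`, `store_content`, `store_platform_file` and the constructor arguments are all hypothetical names chosen for illustration, not OpenWPM's actual `StorageController` API. The controller holds two independently configured providers and routes each kind of data to its own one.

```python
from abc import ABC, abstractmethod
from typing import Dict


class UnstructuredStorageProvider(ABC):
    @abstractmethod
    def store_blob(self, filename: str, blob: bytes) -> None:
        ...


class InMemoryProvider(UnstructuredStorageProvider):
    """Hypothetical stand-in for a disk- or cloud-backed provider."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def store_blob(self, filename: str, blob: bytes) -> None:
        self.blobs[filename] = blob


class StorageController:
    """Sketch of option 1: two independently configured unstructured providers."""

    def __init__(
        self,
        content_provider: UnstructuredStorageProvider,
        platform_provider: UnstructuredStorageProvider,
    ) -> None:
        self.content_provider = content_provider  # e.g. saved web content
        self.platform_provider = platform_provider  # e.g. screenshots, page source

    def store_content(self, name: str, content: bytes) -> None:
        # Web content goes to the first provider.
        self.content_provider.store_blob(name, content)

    def store_platform_file(self, filename: str, data: bytes) -> None:
        # Platform data goes to the second provider, which can use a different backend.
        self.platform_provider.store_blob(filename, data)
```

Passing a cloud-backed provider as `platform_provider` and a local one as `content_provider` gives the split described above. Option 2 would instead add a second method to every provider, which ties both kinds of data to the same backend.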