openwpm/OpenWPM

Add support for saving screenshots, page source, and other arbitrary files to unstructured storage providers

englehardt opened this issue · 1 comments

Screenshots, page source, and other files collected in the browser manager process are currently written directly to disk. This worked when OpenWPM only saved data locally, but will not work for the S3Aggregator. Instead, BaseAggregator should include a save_file method. In LocalAggregator we can implement that to save to disk, and in S3Aggregator we can upload to S3.
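
A minimal sketch of what this proposal could look like, assuming a simplified synchronous interface. The class names come from the issue text, but the constructor parameters (`data_directory`, `client`, `bucket`, `prefix`) and the method bodies are hypothetical and do not reflect OpenWPM's actual code. The S3 variant takes an injected client and assumes it exposes a boto3-style `put_object(Bucket=..., Key=..., Body=...)` call.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class BaseAggregator(ABC):
    """Hypothetical base class: only the proposed save_file hook is shown."""

    @abstractmethod
    def save_file(self, name: str, content: bytes) -> None:
        """Persist `content` under the relative name `name`."""


class LocalAggregator(BaseAggregator):
    """Writes files below a local directory (assumed constructor argument)."""

    def __init__(self, data_directory: Path) -> None:
        self.data_directory = data_directory

    def save_file(self, name: str, content: bytes) -> None:
        target = self.data_directory / name
        # Create intermediate folders such as "screenshots/" on demand.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)


class S3Aggregator(BaseAggregator):
    """Uploads files through an injected client with a boto3-style put_object."""

    def __init__(self, client: Any, bucket: str, prefix: str) -> None:
        self.client = client
        self.bucket = bucket
        self.prefix = prefix

    def save_file(self, name: str, content: bytes) -> None:
        key = f"{self.prefix}/{name}"
        self.client.put_object(Bucket=self.bucket, Key=key, Body=content)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        aggregator = LocalAggregator(Path(tmp))
        aggregator.save_file("screenshots/example.png", b"fake image bytes")
        print(sorted(p.name for p in (Path(tmp) / "screenshots").iterdir()))
```

With this shape, code in the browser manager would call `save_file` without knowing whether the bytes end up on disk or in a bucket.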

Updating this comment as #753 removed everything mentioned in the original issue.
Observations:

  • UnstructuredStorageProviders already have an interface suitable for storing a bunch of bytes under a user-defined name
  • The base path for storing is specified at time of object instantiation
  • => There is no longer any need for a data_directory in the manager params, similar to how database_name was removed in #753

Paths forward:

  1. Add a second UnstructuredStorageProvider to the StorageController that is responsible for saving unstructured platform data
  2. Expand the UnstructuredStorageProvider interface with a second method that is responsible for saving unstructured platform data

I prefer option 1 as it is inherently more flexible: for example, screenshots could be saved to the cloud while web content is just saved to disk.
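
Option 1 could be sketched roughly as follows. This is a self-contained illustration, not OpenWPM's actual code: the interface here is a simplified, synchronous stand-in with a single `store_blob` method, and `LocalDirProvider`, `InMemoryProvider`, `platform_provider`, `save_content`, and `save_platform_file` are hypothetical names chosen for the example. The in-memory provider stands in for a cloud-backed one.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict


class UnstructuredStorageProvider(ABC):
    """Simplified stand-in: stores bytes under a user-defined name."""

    @abstractmethod
    def store_blob(self, filename: str, blob: bytes) -> None:
        ...


class LocalDirProvider(UnstructuredStorageProvider):
    """Hypothetical provider; the base path is fixed at instantiation."""

    def __init__(self, base_path: Path) -> None:
        self.base_path = base_path
        self.base_path.mkdir(parents=True, exist_ok=True)

    def store_blob(self, filename: str, blob: bytes) -> None:
        (self.base_path / filename).write_bytes(blob)


class InMemoryProvider(UnstructuredStorageProvider):
    """Hypothetical stand-in for a cloud-backed provider."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def store_blob(self, filename: str, blob: bytes) -> None:
        self.blobs[filename] = blob


class StorageController:
    """Holds two independent providers, one per kind of unstructured data."""

    def __init__(
        self,
        content_provider: UnstructuredStorageProvider,
        platform_provider: UnstructuredStorageProvider,
    ) -> None:
        self.content_provider = content_provider
        self.platform_provider = platform_provider

    def save_content(self, name: str, blob: bytes) -> None:
        # Web content collected during the crawl.
        self.content_provider.store_blob(name, blob)

    def save_platform_file(self, name: str, blob: bytes) -> None:
        # Screenshots, page source, and similar platform data.
        self.platform_provider.store_blob(name, blob)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cloud = InMemoryProvider()
        controller = StorageController(LocalDirProvider(Path(tmp)), cloud)
        controller.save_content("response_body.bin", b"web content")
        controller.save_platform_file("screenshot.png", b"image bytes")
        print(sorted(cloud.blobs))
```

Because each provider is configured separately, no `data_directory` parameter is needed here: every provider already knows its own destination.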