Breakthrough-Energy/PowerSimData

Blob storage integration

Closed this issue · 1 comment

This is an overview of the work related to fully utilizing blob storage as a filesystem. It can be broken into the following steps:

  • implement a PyFilesystem extension library which translates Azure's blob SDK to the fs API
  • publish this package (at first, just using GitHub) so it can be used by PowerSimData and downstream packages
  • implement a BlobDataAccess class that lets any PowerSimData user load scenarios from blob storage when available, and lets users with credentials upload scenario data (either from a script or as part of the scenario lifecycle). This option was not picked.
  • consider adding a layer that combines multiple DataAccess classes based on some priority. For example, we might have a CustomDataAccess class that takes a combination of data sources and whether they are writable, then applies the semantics of this interface to each underlying filesystem. E.g. if we pass as input to this layer {'local': ['read', 'write'], 'blob': ['read'], 'ssh': ['read']} then we would prioritize reads in the order given, and write results only locally. Note: this is just an idea, which may or may not pan out, but serves as an example of possible standardization.
  • combine the SSH and blob filesystems using MultiFS
  • upload scenarios to blob storage
    • 599
    • 603
    • etc
  • before enabling writes to blob storage from within the standard scenario workflow, do the following (not needed at this time):
    • determine what protection to apply to existing blobs (should deletion be prevented?)
    • determine how to synchronize writes (can't have concurrent updates to a ScenarioList.csv or equivalent sqlite db)
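
The priority-layer idea from the list above can be sketched in a few lines. This is only an illustration, not PowerSimData code: `InMemoryStore`, `PriorityDataAccess`, and the file names are all hypothetical, and the in-memory stores stand in for real local, blob, and SSH backends. PyFilesystem's MultiFS provides similar semantics (prioritized reads, a single designated write filesystem), which is why it is listed as the way to combine the SSH and blob filesystems.

```python
class InMemoryStore:
    """Hypothetical stand-in for a real backend (local disk, blob, SSH)."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def exists(self, path):
        return path in self.files

    def read(self, path):
        return self.files[path]

    def write(self, path, data):
        self.files[path] = data


class PriorityDataAccess:
    """Hypothetical layer: read from sources in the order given, write only to writable ones."""

    def __init__(self, stores, permissions):
        # dicts preserve insertion order, so the order of `permissions` is the read priority
        self.readers = [stores[n] for n, modes in permissions.items() if "read" in modes]
        self.writers = [stores[n] for n, modes in permissions.items() if "write" in modes]

    def read(self, path):
        # Return the first copy found, walking the sources in priority order
        for store in self.readers:
            if store.exists(path):
                return store.read(path)
        raise FileNotFoundError(path)

    def write(self, path, data):
        if not self.writers:
            raise PermissionError("no writable data source configured")
        for store in self.writers:
            store.write(path, data)


# Made-up file names, used only to show the lookup order
stores = {
    "local": InMemoryStore(),
    "blob": InMemoryStore({"599_grid.pkl": b"grid from blob"}),
    "ssh": InMemoryStore({"599_grid.pkl": b"grid from ssh", "603_grid.pkl": b"grid from ssh"}),
}
access = PriorityDataAccess(
    stores, {"local": ["read", "write"], "blob": ["read"], "ssh": ["read"]}
)

print(access.read("599_grid.pkl"))  # found in blob before ssh is consulted
print(access.read("603_grid.pkl"))  # only present on ssh
access.write("603_result.csv", b"output")  # lands in the local store only
```

With this input, reads are tried against local, then blob, then ssh, and results are written only locally, matching the example configuration in the list.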

Blob versioning might help with writes:

When blob versioning is enabled for a storage account, all write operations on block blobs trigger the creation of a new version, with the exception of the Put Block operation.

https://docs.microsoft.com/en-us/azure/storage/blobs/versioning-overview

See also this discussion of managing concurrency:
https://docs.microsoft.com/en-us/azure/storage/blobs/concurrency-manage?tabs=dotnet
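
One approach covered in that concurrency discussion is optimistic concurrency: read the blob together with its ETag, then make the write conditional on the ETag being unchanged. The sketch below simulates this in memory to show the read-modify-write loop that a shared ScenarioList.csv would need. `VersionedBlob`, `PreconditionFailed`, and `append_row` are hypothetical names, not part of the Azure SDK or PowerSimData; against the real service, a failed precondition surfaces as an HTTP 412 response.

```python
import uuid


class PreconditionFailed(Exception):
    """Raised when the caller's ETag no longer matches the stored one."""


class VersionedBlob:
    """Hypothetical in-memory stand-in for a blob that carries an ETag."""

    def __init__(self, content=""):
        self.content = content
        self.etag = uuid.uuid4().hex

    def download(self):
        return self.content, self.etag

    def upload(self, content, if_match):
        # Conditional write: reject if another writer got in since `if_match` was read
        if if_match != self.etag:
            raise PreconditionFailed("blob changed since it was read")
        self.content = content
        self.etag = uuid.uuid4().hex  # every successful write yields a new ETag


def append_row(blob, row, max_attempts=5):
    """Read-modify-write that re-reads and retries when a conflict is detected."""
    for _ in range(max_attempts):
        content, etag = blob.download()
        try:
            blob.upload(content + row + "\n", if_match=etag)
            return
        except PreconditionFailed:
            continue  # someone else wrote first; start over from fresh content
    raise RuntimeError("gave up after repeated conflicts")


scenario_list = VersionedBlob("id,name\n")
_, stale_etag = scenario_list.download()  # a second writer reads here...
append_row(scenario_list, "599,base")     # ...and the first writer updates the blob

try:
    scenario_list.upload("overwritten\n", if_match=stale_etag)
    stale_write_accepted = True
except PreconditionFailed:
    stale_write_accepted = False  # the stale writer must re-read and retry
```

The alternative described in the same discussion is pessimistic concurrency, where a writer takes a lease on the blob before updating it; which of the two suits the scenario list better is left open here.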