googleapis/java-storage

Cloud Storage v2 API: add support for batch operations

martin-traverse opened this issue · 2 comments

Is your feature request related to a problem? Please describe.

The old storage API (the Storage class in the Java SDK) supports batch operations over HTTP. Our solution is built on the new StorageClient for gRPC operations, because we really wanted client streaming for uploads. The one thing we can't do using the new StorageClient is batch operations, which we use for recursive delete of objects under a prefix. We can list the objects using StorageClient, but the batch delete has to be sent using the old Storage API and only works over HTTP. Since the restriction also applies to the old API when it is created with the grpc() option, I assume this is because of a limitation in the gRPC API itself?

Describe the solution you'd like

Ideally, the new StorageClient API would support batch operations similar to those in the old Storage API. For our solution we only care about batch deletes at present, although more generally any batch capability that was needed in the old API will presumably still be needed in the new one.

What you want to happen

I'd like to see batch operations supported on StorageClient. I'm assuming there is a dependency on adding them in the underlying gRPC APIs, though that is just an assumption! Since I can do 95% of what I need with the new API, it is a shame to still have to create both clients. If I could use just the new client, I would only need to worry about one set of resources, handle one set of errors, and so on.
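For reference, individual deletes do already work through StorageClient; it's only the batching that's missing. Something like this (a minimal sketch; the bucket and object names are made up):

```java
import com.google.storage.v2.DeleteObjectRequest;
import com.google.storage.v2.StorageClient;

public class SingleDelete {

    public static void main(String[] args) throws Exception {

        try (StorageClient storageClient = StorageClient.create()) {

            // v2 API bucket names use the projects/_/buckets/{bucket} resource format
            DeleteObjectRequest request = DeleteObjectRequest.newBuilder()
                    .setBucket("projects/_/buckets/my-bucket")
                    .setObject("some/prefix/object-1")
                    .build();

            // One RPC per object - this is the call we'd like to be able to batch
            storageClient.deleteObject(request);
        }
    }
}
```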

Describe alternatives you've considered

For now I had to create a legacy Storage object alongside the new StorageClient, and I use the old API just for the batch delete operations (a sketch of the workaround is below). This does work, but it's not great as a long-term solution, and it means it's not possible to build a complete solution on the new API.
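In case it's useful context, this is roughly what our workaround looks like (a simplified sketch; bucket and prefix names are made up, and error handling is omitted):

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageBatch;
import com.google.cloud.storage.StorageOptions;
import com.google.storage.v2.ListObjectsRequest;
import com.google.storage.v2.StorageClient;

public class RecursiveDelete {

    public static void main(String[] args) throws Exception {

        String bucket = "my-bucket";
        String prefix = "some/prefix/";

        // New gRPC client, used for listing (and everything else in our plugin)
        try (StorageClient storageClient = StorageClient.create()) {

            // Legacy HTTP client, kept around only for batch deletes
            Storage legacyStorage = StorageOptions.getDefaultInstance().getService();

            ListObjectsRequest listRequest = ListObjectsRequest.newBuilder()
                    .setParent("projects/_/buckets/" + bucket)
                    .setPrefix(prefix)
                    .build();

            StorageBatch batch = legacyStorage.batch();

            // Fully qualified name, since the v2 proto message is also called Object
            for (com.google.storage.v2.Object object :
                    storageClient.listObjects(listRequest).iterateAll()) {

                batch.delete(BlobId.of(bucket, object.getName()));
            }

            // Sends all the deletes as a single HTTP batch request
            batch.submit();
        }
    }
}
```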

Additional context

Our product is an open source data and analytics platform: https://github.com/finos/tracdap

The core platform is built on gRPC and Apache Arrow, using Netty as the transport. Our storage plugin for GCP sits on top of the same resources (event loops, allocators, etc.). We use client / server streaming to transfer data in pipelines where the format and size of the data are not known in advance. Using the old Storage API would involve buffering and worker thread pools, which we've managed to avoid elsewhere. The new StorageClient is great for us, because we've already built streaming pipelines on gRPC, so we can just follow the same pattern (roughly the shape sketched below).
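For illustration, this is roughly the shape of a client-streaming upload through StorageClient (a simplified sketch; the names are made up, and in our real pipelines the chunks arrive asynchronously off the wire):

```java
import com.google.api.gax.rpc.ApiStreamObserver;
import com.google.protobuf.ByteString;
import com.google.storage.v2.ChecksummedData;
import com.google.storage.v2.StorageClient;
import com.google.storage.v2.WriteObjectRequest;
import com.google.storage.v2.WriteObjectResponse;
import com.google.storage.v2.WriteObjectSpec;

import java.util.concurrent.CountDownLatch;

public class StreamingUpload {

    public static void main(String[] args) throws Exception {

        try (StorageClient storageClient = StorageClient.create()) {

            CountDownLatch done = new CountDownLatch(1);

            // Receives the single WriteObjectResponse once the upload completes
            ApiStreamObserver<WriteObjectResponse> responseObserver =
                    new ApiStreamObserver<WriteObjectResponse>() {
                @Override public void onNext(WriteObjectResponse response) {
                    System.out.println("Upload complete: " + response.getResource().getName());
                }
                @Override public void onError(Throwable t) { t.printStackTrace(); done.countDown(); }
                @Override public void onCompleted() { done.countDown(); }
            };

            ApiStreamObserver<WriteObjectRequest> requestStream =
                    storageClient.writeObjectCallable().clientStreamingCall(responseObserver);

            byte[] chunk1 = "first chunk of data, ".getBytes();
            byte[] chunk2 = "second chunk of data".getBytes();

            // First message carries the object spec plus the first chunk
            requestStream.onNext(WriteObjectRequest.newBuilder()
                    .setWriteObjectSpec(WriteObjectSpec.newBuilder()
                            .setResource(com.google.storage.v2.Object.newBuilder()
                                    .setBucket("projects/_/buckets/my-bucket")
                                    .setName("some/prefix/streamed-object")))
                    .setWriteOffset(0)
                    .setChecksummedData(ChecksummedData.newBuilder()
                            .setContent(ByteString.copyFrom(chunk1)))
                    .build());

            // Later messages carry data only, at increasing write offsets
            requestStream.onNext(WriteObjectRequest.newBuilder()
                    .setWriteOffset(chunk1.length)
                    .setChecksummedData(ChecksummedData.newBuilder()
                            .setContent(ByteString.copyFrom(chunk2)))
                    .setFinishWrite(true)
                    .build());

            requestStream.onCompleted();

            // Wait for the response before closing the client
            done.await();
        }
    }
}
```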

I appreciate these APIs are very new; we started using them as soon as they became available! Still, the results have been good for us so far. If we could get rid of the need to use the old API at all, that would be ideal.

The Storage v2 API does not currently implement batch operations, and its position on the roadmap does not have a public date associated with it.

Hi Ben - thanks again for the quick response. We can carry on using the dual approach for now. Hopefully at some future point batch calls will be released in the v2 API, and we can simplify our code then.