Conversion from InputStream -> ByteBuffer on gRPC writes creates many byte[] allocations.
Hi team,
While investigating memory usage in the gRPC write path, I found that a significant share of allocations comes from the InputStream -> ByteBuffer conversion code in the GCS connector: https://storage.googleapis.com/anima-frank/large-writes-grpc/grpc_100_write_100MiB_t_4_profile.html
Note: The workload runs Fsbenchmark, uploading 10k 100MiB objects across 4 threads on an n2-standard-4 GCE instance using DirectPath.
~72% of allocations come from the InputStream -> ByteBuffer conversion, which creates two 2MiB byte[] arrays for every write:
- ByteString.readFrom: ~55% - creates a 2MiB byte[]
- ByteString.toByteArray(): ~18% - creates a 2MiB byte[]
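
For illustration, here is a minimal sketch of the allocation pattern using only JDK classes. It is not the connector's code and not the eventual fix: the class name `ChunkReadSketch`, both method names, and the `CHUNK_SIZE` constant are hypothetical, and the two arrays in `readWithCopies` only stand in for the ones the profile attributes to `ByteString.readFrom` and `ByteString.toByteArray()`. The second method shows one possible alternative, assuming the caller can own and reuse a single buffer across writes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.util.Arrays;

final class ChunkReadSketch {
    // Assumed chunk size, chosen to match the 2MiB arrays seen in the profile.
    static final int CHUNK_SIZE = 2 * 1024 * 1024;

    // Copying pattern: every call allocates two arrays of up to CHUNK_SIZE bytes.
    static ByteBuffer readWithCopies(InputStream in) throws IOException {
        byte[] staging = new byte[CHUNK_SIZE];          // allocation 1: read target
        int n = in.readNBytes(staging, 0, CHUNK_SIZE);  // Java 9+
        byte[] copy = Arrays.copyOf(staging, n);        // allocation 2: defensive copy
        return ByteBuffer.wrap(copy);
    }

    // Reuse pattern: fills a caller-owned buffer, so no large array is allocated per call.
    // (The JDK channel adapter still uses a small internal transfer array.)
    // Returns the number of bytes available in the buffer; 0 means end of stream.
    static int readIntoReusable(ReadableByteChannel channel, ByteBuffer buffer) throws IOException {
        buffer.clear();
        while (buffer.hasRemaining() && channel.read(buffer) >= 0) {
            // keep reading until the buffer is full or the stream ends
        }
        buffer.flip();
        return buffer.remaining();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[5 * CHUNK_SIZE / 2];     // stand-in payload
        ByteBuffer reusable = ByteBuffer.allocate(CHUNK_SIZE);
        try (ReadableByteChannel channel = Channels.newChannel(new ByteArrayInputStream(data))) {
            int n;
            while ((n = readIntoReusable(channel, reusable)) > 0) {
                System.out.println("chunk of " + n + " bytes");
            }
        }
    }
}
```

As a rough estimate, assuming each write carries one 2MiB chunk, a 100MiB object takes 50 writes, so the copying pattern allocates about 50 × 2 × 2MiB = 200MiB of short-lived arrays per object. The reuse pattern only works if the consumer is done with the buffer before the next read overwrites it.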
Separately, java-storage contributes 19% of allocations; I'm digging into this as well. My current suspicion is that java-storage creates a buffer per upload and not per message.
Update: I attempted a code change to work around this issue, but the overall wall time is not acceptable:
For a sequential write of 10k 100MiB objects, the existing implementation takes 2+ hours using gRPC DirectPath (DP), while my prototype version is still running after 9+ hours.
@arunkumarchacko could you investigate alternatives?
cc: @schannahalli, @danielduhh
This issue was fixed once we moved away from the pipe.