qubole/rubix

FileDownloadRequestChain can get ReadRequest of length > 2GB which causes integer overflow

Closed this issue · 1 comment

All RequestChains were implemented around the standard read method of InputStream, where the client passes the requested length as an int, so a single read can never exceed 2GB.

But FileDownloadRequestChain is created during the parallel warmup phase, in which we wait for readRequests on a file to accumulate for 10 sec (default), then merge the neighbouring reads and trigger FileDownloadRequestChain to download the data. These readRequests can come from several read calls on the InputStream and, once merged, can form a contiguous read of more than 2GB. The length variable in FileDownloadRequestChain then overflows, leading to NegativeArraySizeException; see trinodb/trino#3494.
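To make the overflow concrete, here is a minimal sketch in Java. It is not the rubix code: the `Range` class and `mergeNeighbours` helper are hypothetical stand-ins for read requests and the merge step. It assumes three contiguous 1 GiB reads, each of which fits in an int, and shows that the merged length only survives when it is kept in a long.

```java
import java.util.ArrayList;
import java.util.List;

public class MergedReadLength {
    // Hypothetical stand-in for a read request: a half-open byte range [start, end) in the file.
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        // Length kept as a long so ranges above 2GB stay representable.
        long length() {
            return end - start;
        }
    }

    // Merges ranges that touch or overlap. The input must be sorted by start offset.
    static List<Range> mergeNeighbours(List<Range> sorted) {
        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && merged.get(lastIndex).end >= r.start) {
                Range last = merged.remove(lastIndex);
                merged.add(new Range(last.start, Math.max(last.end, r.end)));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        List<Range> reads = new ArrayList<>();
        // Three contiguous 1 GiB reads, each individually within int range.
        for (int i = 0; i < 3; i++) {
            reads.add(new Range(i * gib, (i + 1) * gib));
        }
        Range merged = mergeNeighbours(reads).get(0);
        System.out.println(merged.length());       // 3221225472
        System.out.println((int) merged.length()); // -1073741824
    }
}
```

Passing the narrowed negative value to `new byte[...]` is what raises NegativeArraySizeException.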

Another issue to fix is the high memory consumption of FileDownloadRequestChain. Since the readRequests it handles can be large, we should not allocate a buffer of the same size to download the data in one shot. Instead, we should read the data into a smaller buffer over multiple iterations.
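A rough sketch of the chunked approach, assuming plain java.io streams rather than the actual rubix classes. The `copyRange` helper and its `bufferSize` parameter are hypothetical names used for illustration only. The remaining byte count is tracked as a long, and only the per-iteration read size is narrowed to an int.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedCopy {
    // Copies up to `length` bytes from `in` to `out` through a fixed-size buffer.
    // Returns the number of bytes actually copied, which is smaller than `length`
    // if the source ends early.
    static long copyRange(InputStream in, OutputStream out, long length, int bufferSize)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long remaining = length;
        while (remaining > 0) {
            // Safe narrowing: the value is bounded by buffer.length, which is an int.
            int toRead = (int) Math.min(buffer.length, remaining);
            int n = in.read(buffer, 0, toRead);
            if (n < 0) {
                break; // source ended before `length` bytes were available
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        return length - remaining;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyRange(new ByteArrayInputStream(data), sink, data.length, 4096);
        System.out.println(copied); // 10000
    }
}
```

With this shape, memory use is bounded by the buffer size regardless of how large the merged request is.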

Fixed by #368