FileDownloadRequestChain can get ReadRequest of length > 2GB which causes integer overflow
Closed this issue · 1 comments
All RequestChains were implemented around the standard read method of InputStream, where the client passes the length as an int, so a single read can never exceed 2GB.
FileDownloadRequestChain, however, is created during the parallel warmup phase. In that phase we wait for readRequests on a file to accumulate for 10 seconds (the default), merge neighbouring reads, and then trigger FileDownloadRequestChain to download the data. The accumulated readRequests can come from several read calls on the InputStream and, once merged, can form a contiguous read of more than 2GB. When that happens, the length variable in FileDownloadRequestChain overflows, leading to a NegativeArraySizeException; see trinodb/trino#3494.
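To illustrate the overflow described above, here is a minimal standalone sketch. It is not the project's code: the class `MergedReadLength` and its helpers are hypothetical names, and it assumes the merged range is described by `long` start and end offsets that get narrowed to an `int` length somewhere along the way.

```java
public class MergedReadLength {
    // Hypothetical helper: length of a merged contiguous range, kept as a long
    // so that ranges larger than Integer.MAX_VALUE are represented correctly.
    public static long mergedLength(long startOffset, long endOffset) {
        return Math.subtractExact(endOffset, startOffset);
    }

    // Hypothetical helper: converts a length to an array size, throwing
    // ArithmeticException instead of silently wrapping to a negative value.
    public static int toBufferSize(long length) {
        return Math.toIntExact(length);
    }

    public static void main(String[] args) {
        long start = 0L;
        long end = 3L * 1024 * 1024 * 1024; // 3 GiB contiguous range after merging

        // A plain narrowing cast wraps around and yields a negative int.
        int wrapped = (int) mergedLength(start, end);
        System.out.println(wrapped); // prints -1073741824

        try {
            byte[] buffer = new byte[wrapped];
            System.out.println(buffer.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("NegativeArraySizeException");
        }
    }
}
```

Keeping the length in a `long`, or at least converting with `Math.toIntExact`, turns the silent wrap-around into either a correct value or an explicit failure at the point of conversion.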
Another issue to fix is the high memory consumption of FileDownloadRequestChain. Since the readRequests it handles can be large, we should not allocate a buffer of the same size to download the data in one shot. Instead, we should read the data into a smaller buffer over multiple iterations.
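The chunked approach could look roughly like the sketch below. It is an illustration only, not the change made in the fix: `ChunkedDownload`, `copyRange`, and the buffer size are hypothetical, and plain `InputStream`/`OutputStream` stand in for the remote file and the local cache file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ChunkedDownload {
    // Copies up to `length` bytes from `in` to `out` through one reusable
    // buffer of `bufferSize` bytes. The remaining byte count is a long, so the
    // total may exceed 2GB while memory use stays bounded by the buffer size.
    // Returns the number of bytes actually copied.
    public static long copyRange(InputStream in, OutputStream out, long length, int bufferSize)
            throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long remaining = length;
        while (remaining > 0) {
            // Never ask for more than the buffer holds or than is still needed.
            int toRead = (int) Math.min(buffer.length, remaining);
            int read = in.read(buffer, 0, toRead);
            if (read < 0) {
                break; // source ended before the requested length was reached
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
        return length - remaining;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[10_000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyRange(new ByteArrayInputStream(source), sink, source.length, 1024);
        System.out.println(copied); // prints 10000
    }
}
```

The only per-request allocation here is the fixed-size buffer, so the memory cost no longer depends on how many readRequests were merged into one download.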
Fixed by #368