google-cloud-php-storage contains critical data corruption in \Google\Cloud\Storage\Connection\Rest->downloadObject()
indreka opened this issue · 0 comments
Environment details
- OS: Linux
- PHP version: 8.1.29
- Package name and version: google-cloud-php-storage:v1.33.1
Steps to reproduce
- try to fetch files from storage bucket when storage.googleapis.com is having intermittent issues and returns server error 500 or 503 and response body like "We encountered an internal error. Please try again." for first request and actual response for next request.
- \Google\Cloud\Storage\Connection\Rest->downloadObject returns mangled file content where the beginning of actual file data is replaced with text "We encountered an internal error. Please try again."
Code example
$content = $storageAdapter->read("path/to/someFile")["contents"];
Problematic code in downloadObject function is following:
$requestOptions['restRetryListener'] = function (
\Exception $e,
$retryAttempt,
&$arguments
) use (
$resultStream,
$requestedBytes,
$invocationId
) {
// if the exception has a response for us to use
if ($e instanceof RequestException && $e->hasResponse()) {
$msg = (string) $e->getResponse()->getBody();
$fetchedStream = Utils::streamFor($msg);
// add the partial response to our stream that we will return
Utils::copyToStream($fetchedStream, $resultStream);
// Start from the byte that was last fetched
$startByte = intval($requestedBytes['startByte']) + $resultStream->getSize();
$endByte = $requestedBytes['endByte'];
// modify the range headers to fetch the remaining data
$arguments[1]['headers']['Range'] = sprintf('bytes=%s-%s', $startByte, $endByte);
$arguments[0] = $this->modifyRequestForRetry($arguments[0], $retryAttempt, $invocationId);
}
};
It tries to handle all RequestException cases by remembering $e->getResponse()->getBody() and assuming that these were the correct bytes and then tries to resume fetching by asking the remaining data (skipping number of bytes already remembered).
As a result, whenever internal server occurs in a way that some content is returned (not just header but actual body), that said content is injected into the file.
We found out by getting intermittent image format exceptions when fetching files from storage bucket during partial outage and suddenly random images got their first 51 bytes replaced with "We encountered an internal error. Please try again.".
We were quite surprised that instead of throwing exception during storage.googleapis.com partial outage, that retry/resume logic started randomly replacing data like this.
Same problematic code exists in current latest 1.44 version:
I understand that the point of that code is to resume file download (useful when there is some network error when 990MB of 1000MB file is downloaded), but any non-network-errors should get ignored, not injected into file content silently.