googleapis/google-cloud-php

Doesn't return all rows, only the first "package"

ser-sergeev opened this issue · 6 comments

I don't know the exact size of "package", but when I'm reading more that 50k it doesn't return all result, only ~22k in my case.
I tried to read 500 million and I saw on a network monitoring that the data is coming but on a cycle I've got only first 22k.

On version google/cloud-spanner: 1.75.0 that problem doesn't exist.

It is breaking current working projects.

Environment details

  • OS: Ubuntu 22.04
  • PHP version: 8.2.18
  • Package name and version: google/cloud-spanner: 1.76.1

Steps to reproduce

  • make a table with 32 bytes columns with 100k rows or more
  • read more than 50000 rows
  • it will return only ~22k rows

Code example

<?php
require 'vendor/autoload.php';
use Google\Cloud\Spanner\SpannerClient;

$projectId = '';
$keyFilePath = '';
$spanner = new SpannerClient([
    'projectId' => $projectId,
    'keyFilePath' => $keyFilePath,
]);
$instanceId = '';
$instance = $spanner->instance($instanceId);
$databaseId = '';
$database = $instance->database($databaseId);

$sql = <<<SQL
SELECT hash
    FROM users
LIMIT 100000
SQL;

$i = 0;
$results = $database->execute($sql);
foreach ($results as $result) {
    $i++;
}
echo $i;

@taka-oyama сould you please look at this too

diff --git a/vendor/google/cloud-spanner/src/Result.php b/vendor/google/cloud-spanner/src/Result.php
--- a/vendor/google/cloud-spanner/src/Result.php
+++ b/vendor/google/cloud-spanner/src/Result.php
@@ -440,7 +440,7 @@
             $this->resumeToken = $result['resumeToken'];
         }
 
-        if (isset($result['metadata'])) {
+        if (isset($result['metadata']) && !isset($this->metadata)) {
             $this->columnNames = [];
             $this->columns = [];
             $this->columnCount = 0;

@saranshdhingra FYI

Hi @ser-sergeev
Thanks for filling the issue.

We have tried fetching about 1M rows and then iterating over the results(to invoke the generator) but we don't see any issues.

Thank you for supplying your code snippet. Can you also post the shema of the table you are querying on.

Thanks.

Hi @saranshdhingra

<?php
require 'vendor/autoload.php';
use Google\Cloud\Spanner\SpannerClient;
$spanner = new SpannerClient([
    'projectId' => '',
    'keyFilePath' => '',
]);
$instance = $spanner->instance('');
$database = $instance->database('');
$results = $database->execute('select hash from users');
$i = 0;
foreach ($results as $result) {
    $i++;
}
echo $i . PHP_EOL;
create table users
(
    hash  BYTES(32)
);

The problem 100% exists. It can be proved by code.
Here \Grpc\ServerStreamingCall::responses

    public function responses()
    {
        $batch = [OP_RECV_MESSAGE => true];
        if ($this->metadata === null) {
            $batch[OP_RECV_INITIAL_METADATA] = true;
        }
        $read_event = $this->call->startBatch($batch);
        if ($this->metadata === null) {
            $this->metadata = $read_event->metadata;
        }
        $response = $read_event->message;
        while ($response !== null) {
            yield $this->_deserializeResponse($response);
            $response = $this->call->startBatch([
                OP_RECV_MESSAGE => true,
            ])->message;
        }
    }

Here we have $batch[OP_RECV_INITIAL_METADATA] = true; It means that only first batch will have metadata, the next wouldn't be, and that is why here \Google\Cloud\Spanner\Result::rows

    public function rows($format = self::RETURN_ASSOCIATIVE)
    {
        $bufferedResults = [];
        $call = $this->call;
        $shouldRetry = false;
        $isResultsYielded = false;

        $valid = $this->createGenerator();

        while ($valid) {
            try {
                $result = $this->generator->current();
                $bufferedResults[] = $result;
                $this->setResultData($result, $format);

                $empty = false;
                if (!isset($result['values']) || $this->columnCount === 0) {
                    $empty = true;
                }

                $hasResumeToken = $this->isSetAndTrue($result, 'resumeToken');
                if ($hasResumeToken || count($bufferedResults) >= self::BUFFER_RESULT_LIMIT) {
                    $chunkedResult = null;
                    if (!$empty) {
                        list($yieldableRows, $chunkedResult) = $this->parseRowsFromBufferedResults($bufferedResults);

                        foreach ($yieldableRows as $row) {
                            $isResultsYielded = true;
                            yield $this->mapper->decodeValues($this->columns, $row, $format);
                        }
                    }
....
                }

                // retry without resume token when results have not yielded
                $shouldRetry = !$isResultsYielded || $hasResumeToken;
                $this->generator->next();
                $valid = $this->generator->valid();
            } catch (ServiceException $ex) {
....
            }
        }
...
    }

columnCount would be zero and on the generator wouldn't be any new rows, expect that came from first batch.

And my patch that I put above would fix it this issue. I use metadata from first batch for next batches without metadata.

This code runs at every new batches, and because of there wiuldn't be any metadata the code above would skip that batch.
\Google\Cloud\Spanner\Result::setResultData

        $this->stats = $result['stats'] ?? null;

        if ($this->isSetAndTrue($result, 'resumeToken')) {
            $this->resumeToken = $result['resumeToken'];
        }

        if (isset($result['metadata'])) {
            $this->columnNames = [];
            $this->columns = [];
            $this->columnCount = 0;
            $this->metadata = $result['metadata'];
            $this->columns = $result['metadata']['rowType']['fields'];

            foreach ($this->columns as $key => $column) {
                $this->columnNames[] = $this->isSetAndTrue($column, 'name')
                    ? $column['name']
                    : $key;
                $this->columnCount++;
            }

            if ($format === self::RETURN_ASSOCIATIVE
                && $this->columnCount !== count(array_unique($this->columnNames))
            ) {
                throw new \RuntimeException(
                    'Duplicate column names are not supported when returning' .
                    ' rows in the associative format. Please consider aliasing' .
                    ' your column names.'
                );
            }
        }

        $this->setSnapshotOrTransaction($result);

@saranshdhingra Do you have any plans to fix it?

Hi @ser-sergeev
I can replicate the issue now. Thanks a lot for the schema and detailed code snippets.

I can also confirm that your code changes in Result.php does fix the issue but I don't think this fixes the underlying cause.

This is also evident from the fact that the presence of metadata in the first call only has been true for a long time.

I think I have found a solution which is related to the changes made in 1.76.0. I'll just test it and release the fix if everything looks good.

The PR is merged, so we should expect this fix in the next release. Closing this issue for now, but please feel to reopen this if this issue or something similar persists.