zendframework/zend-db

Allowing columns with spaces, dash and numbers

Opened this issue · 7 comments

Hello, we have been using Zend-Db as the database abstraction layer for our Directus framework and there has been a significant bug where columns with dashes (e-mail), spaces first name and starting with numbers (3dImage) creates a malformed query.

All issues described here are experienced on MySQL using PDO.

This is related to #8 which is SqlServer.

Examples of Select:

SELECT `users`.`id` AS `id`, `users`.`3dImage` AS `3dImage`, `users`.`e``-``mail` AS `e-mail`, `users`.`first` `name` AS `first name`
FROM `users`

This was hot-fixed by changing:

return $isIdentifier
                ? $fromTable . $platform->quoteIdentifierInFragment($column)
                : $platform->quoteValue($column);

to this:

$startsWithNumber = preg_match('/^[0-9]+/', $column);
$hasSpaceOrDash = preg_match('/^(.+)(\s|-)+(.+)$/i', $column);
if ($isIdentifier) {
    if ($startsWithNumber || $hasSpaceOrDash) {
        $column = $platform->quoteIdentifier($column);
    } else {
        $column = $platform->quoteIdentifierInFragment($column);
    }
}

return $isIdentifier
                ? $fromTable . $column
                : $platform->quoteValue($column);

When inserting or updating with a column starting with number you get: A non well formed numeric value encountered error at line 293 in /src/Adapter/Driver/Pdo/Statement.php

This will try to use a parameter named :3dImage which results in: A non well formed numeric value encountered.

Both dash and space have the same error: Statement could not be executed (HY093 - - ).

I hot-fixed this by changing src/Sql/Insert.php#L161-L176 from:

$columns = [];
$values  = [];
foreach ($this->columns as $column=>$value) {
    $columns[] = $platform->quoteIdentifier($column);
    if (is_scalar($value) && $parameterContainer) {
        $values[] = $driver->formatParameterName($column);
        $parameterContainer->offsetSet($column, $value);
    } else {
        $values[] = $this->resolveColumnValue(
            $value,
            $platform,
            $driver,
            $parameterContainer
        );
    }
}

To this:

$columns = [];
$values  = [];
$cIndex = 0;
foreach ($this->columns as $column => $value) {
    $columns[] = $platform->quoteIdentifier($column);
    $parameterName = 'column' . $cIndex++;
    if (is_scalar($value) && $parameterContainer) {
        $values[] = $driver->formatParameterName($parameterName);
        $parameterContainer->offsetSet($parameterName, $value);
    } else {
        $values[] = $this->resolveColumnValue(
            $value,
            $platform,
            $driver,
            $parameterContainer
        );
    }
}

Also, src/Sql/Update.php#L143-L157 was change from:

$setSql = [];
foreach ($this->set as $column => $value) {
    $prefix = $platform->quoteIdentifier($column) . ' = ';
    if (is_scalar($value) && $parameterContainer) {
        $setSql[] = $prefix . $driver->formatParameterName($column);
        $parameterContainer->offsetSet($column, $value);
    } else {
        $setSql[] = $prefix . $this->resolveColumnValue(
            $value,
            $platform,
            $driver,
            $parameterContainer
        );
    }
}

To this:

$setSql = [];
$pIndex = 0;
foreach ($this->set as $column => $value) {
    $prefix = $platform->quoteIdentifier($column) . ' = ';
    if (is_scalar($value) && $parameterContainer) {
        $parameterName = 'set' . $pIndex++;
        $setSql[] = $prefix . $driver->formatParameterName($parameterName);
        $parameterContainer->offsetSet($parameterName, $value);
    } else {
        $setSql[] = $prefix . $this->resolveColumnValue(
            $value,
            $platform,
            $driver,
            $parameterContainer
        );
    }
}

What it does is stop using the column name as parameter. Instead it uses a constant column and a index value so instead of name, age, country it uses column0, column1, column2 and it prevents the use of spaces, dashes and numbers at the beginning of the column parameters.

We love Zend-DB and we have been thinking about creating a PR, but we need your input and feedback on this solution.

Thank you!

Welling Guzmán and The Directus Team

Great to see other projects loving Zend-DB - humbling to be used by such large repository. Thank you for detailed report and willing to assist.

PR reviews may not be very fast here, but for when they do, will be easier to make feedback if using the github's PR diff code review interface, and give everyone opportunity to clone your branch to try it locally in their IDEs, run tests against all database vendors, like that report for SQL server, etc.. So you are welcome to create one, and feedback can be gathered there :). I do not know if you have time to add tests to PR, but if you do not, I can volunteer (eventually) to add them.

Thank you @wellingguzman to this report.

I am upgrading an app from php5.4 to php7.1 and I upgraded Zend db from 2.2 to 2.8.2 version.
Our database contains many column names with dash.

I would be grateful if someone could inform us an approximate date of the next release fixing this issue if possible.

I addressed this issue, but only for Select part, in #249

@metalinspired Thank you very much for taking the time to try to solve this issue.

Thank you very much for the SELECT fix @metalinspired – hopefully we can get the rest fixed soon!

A thing to note is that Zend\Db is capable of using any character as column name but they break that capability by introducing "magic" that checks if user has entered column name in form of:

table.column as alias

You can see what is happening here.
Personally I dislike this behavior and would replace this with something like this:

$select->columns(['alias' => ['table' => 'column']]);

Expression "suffers" from same behavior.
I am willing to make these changes myself but I haven't been able to get the attention of anyone who has better understanding of Zend\Db than myself in order to discuss the changes and best approach.

Edit: I forgot to mention that mentioned behavior could be useful in Expression but IMHO introducing a new expression type, maybe TYPE_FRAGMENTED, for it would be a better choice and leaving TYPE_IDENTIFIER be "just quote whatever is inside" type.

This repository has been closed and moved to laminas/laminas-db; a new issue has been opened at laminas/laminas-db#84.