j4mie/idiorm

findMany() returns only the last record in a set

risingfish opened this issue · 55 comments

I cloned the database one week ago and the latest version of iodiorm would only return the last record in a set.

The example I used to prove it out was to create table called users and insert 16 users. Using the follow code I could only ever return the last record. A friend of mine working in the same code base confirmed the issue. I am on Linux and he is on OSX.

\ORM::configure('mysql:host=localhost;dbname=------');
\ORM::configure('username', '----');
\ORM::configure('password', '----');
\ORM::configure('logging', true);

$people = \ORM::forTable('users')->findMany();

I have a little more info on Stackoverflow and this was all done using the latest code from Git.

I can confirm this behaviour, the important caveat being that it only happens on a cloned DB.

In my case I used PhpMyAdmin to copy a mysql database.

findMany() only returns the last record

I have tried the following code with the latest Idiorm code in master:

include 'idiorm.php';
ORM::configure('mysql:host=127.0.0.1;dbname=my_db_name');
ORM::configure('username', 'my_username');
ORM::configure('password', 'my_password');
ORM::configure('logging', true);
$records = ORM::for_table('my_table_name')->findMany();
var_dump($records); // 185 records as an array - there are only 185 records in the table

I have also tried the same code on a table that contains 8,000 records and a total of 1.5MB of data - it works as expected.

Do you guys have any further details on the issue? Please could you mention PHP version, MySQL version, etc. Any logging information? Have you tried running the queries in an isolated manner like I have above rather than inside your code base?

It is also working as anticipated for me with the ORM configured to return result sets:

include 'idiorm.php';
ORM::configure('mysql:host=127.0.0.1;dbname=my_db_name');
ORM::configure('username', 'my_username');
ORM::configure('password', 'my_password');
ORM::configure('logging', true);
ORM::configure('return_result_sets', true);
$records = ORM::for_table('my_table_name')->findMany();
var_dump($records); // 185 records as a result set object - there are only 185 records in the table

Cloned DB caveat

It only happens on a cloned DB? Are you using multiple connections in Idiorm when you encounter this? I cannot see any reason - presuming it is a true facsimile of the original DB - that Idiorm would behave any differently. Idiorm is completely unaware that it is even operating on a clone.

So either the DB is not an exact clone (think encodings, engine, data loss, etc as well) or there is something different in the way you are using Idiorm between the two DBs.

I'm going to do some more investigation in a cleaner environment before I report back further.

I apologize for not including environment info, here's the server it's currently running on:

Ubuntu Server 12.04 LTS
PHP: Version 5.3.10-1ubuntu3.8 (what comes with 12.04 LTS)
mysql: Ver 14.14 Distrib 5.5.32, for debian-linux-gnu (x86_64) using readline 6.2 (what comes with 12.04 LTS)

This is just the beginning of a project so we're just starting to wire stuff up now. It's pretty clean. I'm using a basic single connection. If there is any other information that would help, just let me know.

One thing that came to mind is I created the database schema via MySQL workbench. There are only 2 tables but there is one foreign key constraint between the two. One user to many events.

Thanks!

So you're still having this issue even when the Idiorm code is run
completely separately from the rest of your codebase? If so please could
you include your table schema?

Hi,

I got the same Issue here with an SQLite (3.7.7) DB on Win7/64 and PHP 5.4.16. The DB is not created with Idiorm.The DB structure is:

CREATE TABLE "nav_config" ("id" varchar(100) PRIMARY KEY  NOT NULL , "value" varchar(255))CREATE TABLE "nav_config" ("id" varchar(100) PRIMARY KEY  NOT NULL , "value" varchar(255));
CREATE TABLE "nav_structure" ("Id" INTEGER PRIMARY KEY NOT NULL, "ParentId" INTEGER, "ObjectType" INTEGER, "FileName" VARCHAR(255), "DisplayName" VARCHAR(255), "DisplayNameTranslated" VARCHAR(255), "PathName" VARCHAR(255), "Language" VARCHAR(5))

My Code:

echo ORM::for_table('nav_structure')->count();
$sets = ORM::for_table('nav_structure')->find_many();
foreach ($sets as $set){
    echo $set->DisplayName;
}
//returns 22Item22

You can downlad the file here: http://www.sederis.de/test.zip

I've got the same issue, PHP 5.5.3, mysql 5.5.33, imported database. reverting to version 1.3.0 resolves the issue.

So I have tried and I cannot replicate this issue. Please could someone who is experiencing this help by debugging and providing a pull request.

I have successfully replicated this issue, but I have not had time to analyse the circumstances under which it occurs.

I intend to soon.

Here are my initial findings.

I get this issue when the table I am using does not have an id column, the id column name has been configured incorrectly in idiorm, or the data in the id column is not suitable (not unique, for example).

The reason I am only seeing one result returned, is that line 628 attempts to assign each row of the results to an array with an index of $row->id - in my case, this is always null thus resulting in an array that has only 1 value.

As idiorm demands an id column on every table (configurable regarding name) then I would say that the failure I am seeing is expected.

It would be useful if you guys could add the following code to your idiorm.php file:

Before the foreach loop on line 626:

echo '$rows = ' . count($rows) . '<br>';

After the foreach loop (before return $instances):

echo '$instances = ' . count($instances);

If you have the id problem then you will see a bigger number for $rows than $instances. If things are working correctly (inside this function) then you should get the same result from both.

Let me know your results.

I propose we handle this by comparing count($rows) to count($instances) before we return from the function. If those numbers differ we should throw an exception pointing out that the nominated id column must contain a unique id for the row.

@michaelward82 Thank you for your investigation. I would say that the cause of your issue is definitely the addition of #133 in 1.4.0. Due to that change if you do not have unique IDs then they will all be assigned to the same array element.

Good find, it occurs for me because the id on my table is in caps: "ID". Maybe add some fallbacks if "id" is not found?

@peterpeerdeman I think you should be able to get around this by specifying the correct id column name. See http://idiorm.readthedocs.org/en/latest/configuration.html#id-column (note there is a global and separate per-table setting)

We have a some options here, I think we need to compare the in and out numbers and throw an exception if they differ.

As I see it we could:

  1. raise a generic exception,
  2. raise a new custom exception which will exist in addition to the one added for __call protection in pull request #152
  3. change pull request #152 to use a new idiorm exception type that will be shared

Option 1 has the drawback that it cannot be tested properly because of the version of phpunit used by Travis for php 5.2 testing.

Option 2 introduces another new custom exception type, but it carries no additional functionality over the exception type added in #152. Feels a bit unDRY to me.

Option 3 requires a little additional work, but seems like the most sensible option.

Let me know what you think and I'll code up the protection for this situation.

@peterpeerdeman I also forgot about the method use_id_column() - see PHPDoc at:

public function use_id_column($id_column) {

@michaelward82 I am not sure how much I like counting the records when they are returned and throwing an exception.

I think it would be better to check if there is an id column in the results using $this->_get_id_column_name() and if it doesn't then don't pass the results to _instances_with_id_as_key($rows) (

protected function _instances_with_id_as_key($rows) {
) and return array_map(array($this, '_create_instance_from_row'), $rows); instead.

So it would look something like:

protected function _find_many() {
    $rows = $this->_run();
    if(isset($rows[0][$this->_get_id_column_name()])) {
        return $this->_instances_with_id_as_key($rows);
    }
    return array_map(array($this, '_create_instance_from_row'), $rows);
}

Anyway that is off of the top of my head and I haven't tested it. Thoughts?

I suggested a comparison between count($rows) and count($instances) because it would throw an exception both in situations where no id column exists, but also where the id column exists, but contains duplicate values.

The second scenario is a subtle bug, because the query will seem to have executed correctly, but the result may only be a subset of what you should have received.

I like the idea of protecting against the lack of an id column by selectively avoiding returning an associative array, but I would also like to introduce the count() check to prevent the duplicate id's bug.

The ID column must be unique/a primary key (see: http://idiorm.readthedocs.org/en/latest/configuration.html#id-column). All of Idiorm's model management relies on that being the case (delete and update etc) so I don't see that as being a big issue to be honest. ID columns simply must be unique - if they are not then they are not an ID column and should not be specified as such.

A unique primary key is required by idiorm; I don't believe we cater to the situation, and we should not be returning an an alternative associative array.

My preference would be to see:

  • an exception thrown if the id column is missing - this being an egregious mistake with regard to idiorm, and
  • a warning logged if we detect a discrepancy in results that suggests the id column does not contain unique values.

We have a reliable method to detect a lack of an id column. We also know that the if $instances does not contain as many elements as $rows in the _instances_with_id_as_key() method then we have a duplicate id / array key issue.

I don't like the idea of leaving developers in a potentially difficult to debug situation when we could provide them with a simple and straightforward message, highlighting a mistake they can often remedy easily & quickly.

I agree in principle, but I am also aware that we have broken backwards compatibility for existing projects so I would rather provide a drop in fix for those situations hence my halfway house solution. The more explicit the better though like you say.

If I interpret you correctly, you would like to maintain backward compatibility for those who have used idiorm to perform technically unsupported finds on tables with no id column. Correct?

In such a case, I suggest that we issue log notices in both situations, while passing through the non associative array when no id column is detected.

This maintains backwards compatibility whilst alerting developers to the fact they are using idiorm in an unsupported and unintended manner, thus allowing them to ignore or address the situation as they see fit.

Yes, that could work out nicely given the annoying situation.

Thinking about error messages.

When duplicate id values are detected:

trigger_error("Idiorm - Unique values expected in table " .
    "'$this->_table_name' column '" . $this->_get_id_column_name() .
    "' but duplicate values were detected");

Resulting in something like:

Warning: Idiorm - Unique values expected in table 'users' column 'id' but duplicate
values were detected in /web/other/idiorm-test/vendor/j4mie/idiorm/idiorm.php on line 639

When no id column is found:

trigger_error("Idiorm - Table '$this->_table_name' does not have a valid '" .
        $this->_get_id_column_name() . "' column");

Resulting in something similar to:

Warning: Idiorm - Table 'users' does not have a valid 'id' column in /web/other/idiorm-test/vendor/j4mie/idiorm/idiorm.php on line 620

Let me know if you would like different wording. A patch should be on it's way this week some time.

I have to say that I am still not really keen on counting the items in the array each time, but there again it is recommended that resultsets are used instead. I was thinking about this problem in the intervening time and I had thought a default to true configuration option might be a better choice. People could just turn it off if they needed legacy support.

The whole thing could be clearly documented and mentioned in the changelog. It would prevent people who are using Idiorm correctly from being penalised by the required checks.

Do you feel the count() is too computationally costly, or is it that you feel there's an architectural issue?

It feels like a rudimentary check IMO, ensuring that we have the same number of elements in the resulting array as we fed in at the beginning. It only involves two count() operations, performed once for each find_many().

Given that the existing function iterates over an array, assigning the results to another array, I would suggest that the count() operations are of insignificant computational cost - unless I'm wrong and php count()'s are wasteful, inefficient beasts.

What configuration option are you thinking of ad what would it affect?

You're right I am probably just being too sensitive and premature with count(). It is actually quite a fast operation as internally it doesn't actually "count" the objects each time. It just reads a length property of the array structure that PHP keeps up to date when elements are removed or added to an array. Of course given the double loop you mention this is, but a very small consideration.

I look forward to your PR.

Thanks!

Oh and as for the configuration option I was thinking something like https://github.com/j4mie/idiorm/compare/regression.find_many_id_keys

I didn't bother to fix the tests by the way.

Surt commented

Hi guys,
I must apologize about the primary key as index. It does cause problems in has_many_through. If the values returned are duplicated they appear just 1 time.

Example as @tomvo has pointed out:
grocery list, a grocery list can have multiple products and a product can be on multiple grocery lists. However in this case a grocery list can have multiple products twice.
The problem is, since we have multiple apparitions of a db row, and they are assigned to the same associative key, it appears just 1 time.

I don't really know how to fix it on a elegant way. Actually I'm triying with a variable on the find_many interface, wich defines if fetch by associative array or not.
What do you guys think?

@Surt I thought about adding a parameter to find_many(), but it makes the API unclear when you're just calling find_many(true) etc. I mean what does the true relate to? You'd have to dig through the source of Idiorm to figure it out. I would rather have a universal compatibility switch as described in https://github.com/j4mie/idiorm/compare/regression.find_many_id_keys

It is probably worth pointing out there is another PR that will potentially change the handling of the primary key column: #161

Obviously these two pieces of code could be a merge conflict. Not a problem, but just wanted to point it out.

Surt commented

@treffynnon yes, I see your point there. But if we want to use the associative keys, the has_many_through will be broken. Maybe on the relationship method, disable it temporary, and enable it after the fetch if it was enabled before?

yah, I saw the #161 , it causes me real terror :) but it is a real achievement if it works ok.

Surt commented

By the way, the actual test in Paris/Idiorm checks the queries sended, but no the results, maybe there is a need for a sqlite test with models and results?

Surt commented

As for the no primary id setted on the database, there was a solution on the original commit
$key = (isset($row->{$this->_instance_id_column})) ? $row->id() : $i;
so, if no id() is defined on the database, it just fetch it as non associative.

Yes, I would say that you're correct here. The tests have traditiomally
been very simple and focussed on query builder output rather than the
resultant models. It would be beneficial to have some of the latter as you
say.

Unfortunately for me it comes down to a simple issue of time to write the
tests. It is also very rare to get a PR with just a whole load of tests in
it to further the libraries testability. It's just not as rewarding as
having your code in the core library at the end of the day.
On Oct 7, 2013 5:40 PM, "Erik Wiesenthal" notifications@github.com wrote:

By the way, the actual test in Paris/Idiorm checks the queries sended, but
no the results, maybe there is a need for a sqlite test with models and
results?


Reply to this email directly or view it on GitHubhttps://github.com//issues/156#issuecomment-25824344
.

I must have missed that line in there!
On Oct 7, 2013 5:45 PM, "Erik Wiesenthal" notifications@github.com wrote:

As for the no primary id setted on the database, there was a solution on
the original commit
$key = (isset($row->{$this->_instance_id_column})) ? $row->id() : $i;
so, if no id() is defined on the database, it just fetch it as non
associative.


Reply to this email directly or view it on GitHubhttps://github.com//issues/156#issuecomment-25824687
.

Argh! I didn't fully understand what you meant here in your first message.
That is a pain. Your switching idea before and after is probably the
easiest way around this particular problem without changing the external
API of find_many().
On Oct 7, 2013 5:38 PM, "Erik Wiesenthal" notifications@github.com wrote:

@treffynnon https://github.com/treffynnon yes, I see your point there.
But if we want to use the associative keys, the has_many_through will be
broken. Maybe on the relationship method, disable it temporary, and enable
it after the fetch if it was enabled before?

yah, I saw the #161 #161 , it
causes me real terror :) but it is a real achievement if it works ok.


Reply to this email directly or view it on GitHubhttps://github.com//issues/156#issuecomment-25824158
.

For my two cents:
I think in the end using the foreign key as the associative key will not work for records that appear multiple times in a linked table. I think even composite/compound keys will not be be the solution to this either.
Switching around seems the best and easiest way at this point as well.

On how to deal with this on the eager loading fork that @Surt and I are working is still open though but also not relevant for this thread.

Surt commented

about the unique id field error, I suppouse that if there is no id there is no unique primary, right?: Apart from the @treffynnon fix I would like to propose:

        /**
         * Create instances of each row in the result and map
         * them to an associative array with the primary IDs as
         * the array keys.
         * @param array $rows
         * @return array
         */
        protected function _instances_with_id_as_key($rows) {
            $instances = array();
            foreach($rows as $row) {
                $row = $this->_create_instance_from_row($row);
                $key = (isset($row->{$this->_instance_id_column})) ? $row->id() : $i;
                $instances[$key] = $row;
            }
            return $instances;
        }

there is a check if the instance id column is set, if not, the array will be filled as non associative
$key = (isset($row->{$this->_instance_id_column})) ? $row->id() : $i;

as for Paris, the same line could be dropped on the actual overloaded function.

Don't forget to initialize $i and autoincrement it ;)

Surt commented

yep, it will be this way:

        /**
         * Create instances of each row in the result and map
         * them to an associative array with the primary IDs as
         * the array keys.
         * @param array $rows
         * @return array
         */
        protected function _instances_with_id_as_key($rows) {
            $size = count($rows);
            $instances = array();
            for ($i = 0; $i < $size; $i++) {
                $row = $this->_create_instance_from_row($rows[$i]);
                $key = (isset($row->{$this->_instance_id_column})) ? $row->{$this->_instance_id_column} : $i;
                $instances[$key] = $row;
            }

            return $instances;
    }

and for Paris

        /**
         * Create instances, then models, of each row in the result and map
         * them to an associative array with the primary IDs as
         * the array keys.
         * @param array $rows
         * @return array
         */
        protected function _instances_with_id_as_key($rows) {
            $size = count($rows);
            $instances = array();
            for ($i = 0; $i < $size; $i++) {
                $row = $this->_create_instance_from_row($rows[$i]);
                $row = $this->_create_model_instance($row);
                $key = (isset($row->{$this->_instance_id_column})) ? $row->{$this->_instance_id_column} : $i;
                $instances[$key] = $row;
            }

            return $instances;
    }

the for loop is not so weird to see and it ads some micro-optimization to speed.
It could be possible to add on the $key line, a check for the new config value for backward compatibility.

OK, just hit this bug while querying a relationship table whose primary key is the two foreign keys. If I call find_array() I get the correct number of results, however find_many() returns last row only. I went back to the table and added an id column and the problem still persists.

Running Apache 2.2.25, PHP 5.5.4, MySQL 5.5.33 installed via MacPorts on Mac OS X Mavericks.

Will try 1.3 as suggested while this gets fixed up.

UPDATE: All good in 1.3, even without the added id in my relationship table.

Surt commented

@restlesspw could you try with my commit? Notice that i did not add the one for París but you can see the instructions above

@Surt Your commit solves my problem, thanks! Getting multiple rows back as expected.

OK, I've not had the time to push a fix for this. I should have time to take a look next weekend, but if anyone else wants to jump in before then, you should go for it.

P.S. Is PR#162 a direct fix for this, or a related fix?

Surt commented

@michaelward82 is a direct fix. It fixes the findMany() returns only the last record in a set due to no unique primary key defined, and an actual bug with has_many_through returning only 1 instance of an object, where could be multiple.

Actually, Paris is broken too, until this other patch is done:

We must change from https://github.com/j4mie/paris/blob/master/paris.php#L128

   /**
         * Create instances of each row in the result and map
         * them to an associative array with the primary IDs as
         * the array keys.
         * @param array $rows
         * @return array
         */
        protected function _instances_with_id_as_key($rows) {
            $size = count($rows);
            $instances = array();
             for ($i = 0; $i < $size; $i++) {
                $row = $this->_create_instance_from_row($rows[$i]);
                $row = $this->_create_model_instance(row);
                $key = (isset($row->{$this->_instance_id_column}) && $this->_associative_results) ? $row->{$this->_instance_id_column} : $i;
                $instances[$key] = $row;
            }
            return $instances;
        }

And the last change, in Model, https://github.com/j4mie/paris/blob/master/paris.php#L388

       ->where("{$join_table_name}.{$key_to_base_table}", $this->$base_table_id_column)
       ->non_associative();

Until this patchs, Paris is broken on has_many_through, returning just 1 instance of multiple objects with the same id (the objects are pushed on the same key)

Is everybody happy with the direct fix from @Surt I have not had a chance to review it or test it out yet. Please could you let me know your thoughts as I would really like to kill off this regression.

This fix is going to have to be applied as it is causing too many people too many issues. I really need help testing it though. Please could any interested parties report back on its effectiveness and any caveats etc.

As there does not appear to be a clear and completely functioning fix for this patch I am going to back this functionality out of Idiorm and release a patch release v1.4.1 to this end.

I would welcome a similar pull request in the future, but it would need to take into account all the issues raised here relating to this patch. For now though I am removing this problematic and short lived feature.

Surt commented

@treffynnon why not clear and completely? The fix I submit corrects the bug perfectly. Simply checks if an unique id is set and if no, it sets a numerical key index.
The original associative add-on was perfect on my pull request and was changed by idiorm when commited, for no reason as long as I understand. Because of this, the bug appears. I sended The #162 to fix the problems caused due to the modification of my original pull for associative keys, and actually, it is working.

Anyway, the functionality does not add nothing indispensable to the lib. It does instead on Granada, the fork of Idiorm/Paris with eager and lazy loading https://github.com/Surt/Granada

@Surt Unfortunately I have not been able to test all the patches here and it was not clear from the user feedback that the patch fixed all the related issues. I could not leave this open any longer as it is causing problems and the longer it is left the more people will rely on the functionality that I stripped out in 1.4.1.

Sorry.

Additionally I like Granada as a fork and I think this functionality fits nicely in there as an augmentation of Idiorm rather than in the core library itself where I feel we should keep this simplified. If there is willingness to have this included in the base of Idiorm then I would allow a clean patch in the future.

Surt commented

You are right, there is no need for the associative array on idiorm-paris. So, for the sake of "keep it simple" in idiorm/paris you are doing well.

Why just not adding a new findManyAssociative() method? This way findMany() will keep the old behavior unchanged and we could take advantage of associative arrays when needed. In fact I used to call a custom function toAssociative() on many result sets to convert the numbered array into an associative one.

In fact, the method wouldn't take too many LOC. Do you want to prepare a PR?

I intend to teach Slim+Twig+Idiorm+Paris to my students on January, and I found the associative array resultset useful.

Surt commented

I would like to insist that the bug patch is working all right. Along this, there was a config variable to reset to the old behaviour, introduced by @treffynnon and used by me. Anyway... the patch is there, with the config variable too...

I take you at your word @Surt and I definitely did not mean to offend, but
unfortunately I didn't have time to test it myself. In addition I was not
entirely happy with the work around we had put together. It seemed almost
too fragile to make it into production in the end.

I understand that this must be frustrating and disappointing for you, but I
had to make a decision on the issue. I am sorry that it didn't work out and
if I have caused any offence with my decision.

Surt commented

Oh! sorry @treffynnon , no offense. It's my bad english fault.
Keep the good work!