laravel/framework

Use WHERE ... IN for simple Wherehas() and Has() constraints

Closed this issue · 9 comments

I already posted this as part of #3543 and #5328, but I probably should have known better than adding it to a closed issue. Thus, I'm opening a new one.

The issue is that the has() and whereHas() constraints in the eloquent query builder creates code that can be very slow and results in nested table scans for seemingly harmless queries. I wrote some code that rectifies the issue for the most common use case and it would be nice to see it or something similar included in Laravel.

What is the problem?

Note: I'm only referring to Mysql not sure how other databases optimise things.

Look at the task of "Find all users that commented on any post belonging to a certain thread". (Assuming Post, Comment and User models with the obvious relationships).

User::whereHas('comments', function($q) use ($threadId) {
  $q->whereHas('post', function($q) use ($threadId) {
    $q->where('thread_id', $threadId);
  });
})->get();

The way laravel crafts the has(...) queries using correlated subqueries often results in nested full table scans. That is, a full table scan of users where every row triggers a full table scan of comments where again every row triggers a full table scan of posts. In short, once your database grows, you are dead.

However, the task itself would be much faster if we first get the set of posts, then get the set of comments and then get the set of users. Doing it this way the execution time only depends on the number of records actually involved but not the overall size of the database.

Currently there are only two ways to achieve this:

  1. rewrite the whole thing using join statements and a final distinct.
  2. do it manually in PHP by first pulling out the IDs of the sub-queries manually and passing them on to another query using the whereIn(..., array) method.

As far as Mysql 5.5 is concerned that is actually where the story ends.
Mysql 5.5 turns a where ... in (select ...) statement into a where exists ... correlated subquery which again results in nested table scans.

However, Mysql starting from 5.6.6 has the "Semi Join" and "Materialization" optimisation strategies for where ... in (select ...) type queries. The former tries to resolve the query as a join type query. If that's not possible, the latter evaluates the sub-queries into temporary tables and takes it from there.

Thus, using where ... in (select ...) and Mysql 5.6 is the solution.

The patch

I wrote simple code that allows the has(...) method (which gets ultimately called by whereHas, orHas and all the other flavours) to take advantage of relationships that can craft a where ... in (select ...) constraint if the count value is >= 1.
I also included the constraint generating code for the HasOneOrMany, BelongsTo and BelongsToMany relationshipts. I didn't add the morphing ones because I'm not using them and thus cannot test them easily. But adding them should be straight forward.
The code is opportunistic in that it falls back to the old behaviour whenever the new where in facility isn't applicable.

I wrote and tested the code for Laravel 4.2, but as far as I can see there haven't been changes to this section of code in 5.0, so it should apply to both.

I apologise for not presenting a finished pull request. My own implementation in my project is by deriving from the relationship classes so I can leave the vendor folder unchanged. Thus, I'm simply pasting the relevant methods here.

\Illuminate\Database\Eloquent\Builder::has()

I added the big if block, the rest is unchanged.

    public function has($relation, $operator = '>=', $count = 1, $boolean = 'and', Closure $callback = null)
    {
        if (strpos($relation, '.') !== false)
        {
            return $this->hasNested($relation, $operator, $count, $boolean, $callback);
        }

        $relation = $this->getHasRelationQuery($relation);

        if ( (($operator == '>=') && ($count == 1)) || (($operator == '>') && ($count == 0)) ) {
            if (method_exists($relation, 'getWhereHasOneConstraints')) {
                $whereHasOneConstraints = $relation->getWhereHasOneConstraints($callback, $this);
                if (is_array($whereHasOneConstraints) && !empty($whereHasOneConstraints)) {
                    return $this->whereRaw($whereHasOneConstraints['sql'], $whereHasOneConstraints['bindings'], $boolean);
                }
            }
        }

        $query = $relation->getRelationCountQuery($relation->getRelated()->newQuery(), $this);

        if ($callback) call_user_func($callback, $query);

        return $this->addHasWhere($query, $relation, $operator, $count, $boolean);
    }

Illuminate\Database\Eloquent\Relations\BelongsTo::getWhereHasOneConstraints()

    public function getWhereHasOneConstraints(Closure $callback, $parent) {
        $parentKey = $this->wrap($this->getQualifiedForeignKey());
        $selectKey = $this->wrap($this->query->getModel()->getTable().'.'.$this->otherKey);

        if ($callback) call_user_func($callback, $this->query);
        $this->query->select(new Expression($selectKey));

        return array(
            'sql' => new Expression($parentKey .' in (' . $this->query->toSql() . ')'),
            'bindings' => $this->query->getBindings(),
        );      
    }

Illuminate\Database\Eloquent\Relations\HasOneOrMany::getWhereHasOneConstraints()

    public function getWhereHasOneConstraints(Closure $callback, $parent) {


        $parentKey = $this->wrap($this->getQualifiedParentKeyName());
        $selectKey = $this->wrap($this->getHasCompareKey());

        if ($callback) call_user_func($callback, $this->query);
        $this->query->select(new Expression($selectKey));

        return array(
            'sql' => new Expression($parentKey .' in (' . $this->query->toSql() . ')'),
            'bindings' => $this->query->getBindings(),
        );  
    }   

Illuminate\Database\Eloquent\Relations\BelongsToMany::getWhereHasOneConstraints()

    public function getWhereHasOneConstraints(Closure $callback, $parent) {

        if ($parent->getQuery()->from == $this->getRelated()->newQuery()->getQuery()->from) {
            // Table aliasing isn't implemented here. Return null to tell the caller to fall back
            // to the count query method.
            return null;
        }

        $parentKey = $this->wrap($this->getQualifiedParentKeyName());
        $selectKey = $this->wrap($this->getHasCompareKey());

        if ($callback) call_user_func($callback, $this->query);
        $this->query->select(new Expression($selectKey));

        return array(
                'sql' => new Expression($parentKey .' in (' . $this->query->toSql() . ')'),
                'bindings' => $this->query->getBindings(),
        );
    }   

Wow, thanks, though could you please provide this as an actual PR please. Posing code in here is no use to us I'm afraid.

Ok, here you go. I pasted the code into a PR.

@kaymes, thank you, but you seem to have sent this to the wrong branch. Laravel 4.2 is closed for all non-critical fixes. Sending a PR to 5.0 would be preferred if you have the time. :)

@GrahamCampbell, here you go.

Taylor doesn't seem to like this change. I assume he thinks it's too hacky?

It's a real shame, in large projects you can't rely on whereHas method, you have to make a upfront query to get the IDs you want (with massive join if you have to filter through other tables) PLUS on your returned results request -> you have to put some eager load filters (to match the join filter on your first query)

That's very hacky, and a simple whereHas on a 350K rows table take about 1.5sec to run on Mysql :/

+1

Closing as this was rejected by Taylor.

This decision is very frustrating