rage/quizzes

Slow queries when a question answer had been reviewed a lot of times

mipyykko opened this issue · 1 comments

For example, one specific question answer has over 10k peer reviews.

This

const peerReviews = await this.query()
.where("quiz_answer_id", quizAnswerId)
.withGraphJoined("answers")

produces a query

SELECT
  "peer_review"."id" AS "id",
  "peer_review"."quiz_answer_id" AS "quiz_answer_id",
  "peer_review"."user_id" AS "user_id",
  "peer_review"."peer_review_collection_id" AS "peer_review_collection_id",
  "peer_review"."rejected_quiz_answer_ids" AS "rejected_quiz_answer_ids",
  "peer_review"."created_at" AS "created_at",
  "peer_review"."updated_at" AS "updated_at",
  "answers"."peer_review_id" AS "answers:peer_review_id",
  "answers"."peer_review_question_id" AS "answers:peer_review_question_id",
  "answers"."value" AS "answers:value",
  "answers"."text" AS "answers:text",
  "answers"."created_at" AS "answers:created_at",
  "answers"."updated_at" AS "answers:updated_at"
FROM
  "peer_review"
LEFT JOIN
  "peer_review_question_answer" AS "answers"
ON
  "answers"."peer_review_id" = "peer_review"."id"
WHERE
  "quiz_answer_id" = $1

that, when specifying the quiz answer with the most peer reviews, produces +40k rows (4 questions in a peer review collection).

The query analyzed (EXPLAIN ANALYZE ...) produces the following:

Hash Right Join  (cost=32901.28..407824.24 rows=45912 width=292)
  Hash Cond: (answers.peer_review_id = peer_review.id)
  ->  Seq Scan on peer_review_question_answer answers  (cost=0.00..321216.07 rows=14199407 width=164)
  ->  Hash  (cost=32758.45..32758.45 rows=11426 width=128)
        ->  Bitmap Heap Scan on peer_review  (cost=324.98..32758.45 rows=11426 width=128)
              Recheck Cond: (quiz_answer_id = '<<<quiz_answer_id>>>'::uuid)
              ->  Bitmap Index Scan on peer_review_quiz_answer_id_index  (cost=0.00..322.12 rows=11426 width=0)
                    Index Cond: (quiz_answer_id = <<<quiz_answer_id>>>::uuid)
Planning time: 0.437 ms
Execution time: 10810.809 ms

So, it has to go sequentially through the peer_review_question_answer table to find all the answers. The table does have an index, which is actually used in the query when the expected number of joins is low, but on these kinds of occasions the planner picks the sequential scan because the cost is lower - even when using the index would be a lot faster.

This query is called quite often (96k times in the last week) and the average rows it returns is 1726, which I think is quite a lot especially if it is only used in the widget. The query is currently the top query ranked by the total time used.

The function that runs the query is called in only one place, mapping to the API endpoint /widget/answers/:answerId/peer-reviews, which is in turn used by the widget to load received peer reviews for a user. Don't know if it's really useful for a single user to get more than 10 000 peer reviews (or even 1 000) loaded in a widget, at least in one go.

Maybe implement paging or something that would limit the rows returned, or just limit the peer reviews loaded here in any case to some magic number? The query could also be refactored in some way, maybe to flip the join direction or something just to see if it helps.

edit: This is not specific to just this one function; a simple query like select * from peer_review where quiz_answer_id = $1 is also slow when the number is large. Queries related to this are top 4 when ranked by total time used - these four queries take up 71% of all the time recorded so far! So, everywhere the peer reviews are queried by quiz_answer_id should be checked to see if we really need that many rows, or to find some other solution.

A lot of this was fixed by fiddling with the database settings; @nygrenh also did some change that fixed some answers getting humongous amounts of reviews and also limited the amount returned to a single user. Some peer review queries probably remain that will return all, but they are probably supposed to do that and are much faster now anyway.