scylladb/scylla-rust-driver

Driver does not automatically re-prepare queries (follow up?) when running against cassandra

mexus opened this issue · 9 comments

Hello there!

Sometimes, I get the following error when running too many (thousands per second) simple queries:

session.query("SELECT ... WHERE id = (?);", (id,))

Database returned an error: Tried to execute a prepared statement that is not prepared. Driver should prepare it again, Error message: Prepared query with ID 6b24483270d8dd7c3acb6246eb4a0ac3 not found (either the query was not prepared on this host (maybe the host has been restarted?) or you have prepared too many queries and it has been evicted from the internal cache)

Shouldn't the driver automatically re-prepare the statement in such a scenario?

I'm going to use prepared statements instead of simple queries so I won't be affected, but anyhow I believe a situation like that shouldn't happen.

Thanks!

P.S.
I'm running the scylla driver v0.12.0 against a Cassandra 4.1.3 cluster, if it matters.

I've found a similar issue: #342 -- but it doesn't look the same. Also many years have passed :)

UPD: this also (sometimes) happens with a prepared statement in the following scenario: prepare once (Session::prepare), execute many times (Session::execute)

I successfully reproduced the issue against a 3-node Cassandra 4.1.3 cluster. The issue appeared when I added a 4th node to the cluster. I'll try to investigate further.

Apart from the issue itself, I'd like to note that you should not run queries with values without preparing them yourself.
Doing so causes the driver to first prepare the query and then execute it, and it does that on every call to .query(). Our docs recommend against this: you make two network round trips per request instead of one, so you lose a lot of performance.
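To put rough numbers on the cost described above, here is a minimal sketch that only counts round trips. It assumes the behaviour stated in this comment (one PREPARE plus one EXECUTE per .query() call with values) and ignores retries and repreparation. The function names are hypothetical and not part of the driver.

```rust
/// Round trips for `calls` invocations of an unprepared query with values:
/// each call costs one PREPARE plus one EXECUTE (assumption taken from the
/// comment above, not measured).
fn round_trips_unprepared(calls: u64) -> u64 {
    2 * calls
}

/// Round trips for "prepare once, execute many": a single PREPARE up front,
/// then one EXECUTE per call.
fn round_trips_prepared(calls: u64) -> u64 {
    1 + calls
}
```

Under these assumptions, 1000 calls cost 2000 round trips unprepared versus 1001 when the statement is prepared once with Session::prepare and run with Session::execute.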

Thanks for the clarification! I've only been running simple queries for the purpose of building a small MVP :) But I promise not to do such things in production 😄

After further investigation, what seems to be happening is:

  1. New node joins the cluster (it does not have a prepared statement id in its cache)
  2. On the first try of statement execution, the driver receives UNPREPARED error frame (https://github.com/scylladb/scylla-rust-driver/blob/v0.12.0/scylla/src/transport/connection.rs#L710-L714)
  3. The driver reprepares the statement (https://github.com/scylladb/scylla-rust-driver/blob/v0.12.0/scylla/src/transport/connection.rs#L717)
  4. Driver retries the statement execution right after repreparation (https://github.com/scylladb/scylla-rust-driver/blob/v0.12.0/scylla/src/transport/connection.rs#L719).
  5. Cassandra responds with UNPREPARED error once again. The error is then returned to the user.

I'm not really sure why Cassandra responds with an UNPREPARED error during the retry. There might be a race on Cassandra's side where the database sends the response to the PREPARE request before actually updating its cache.
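The sequence above can be sketched with a small in-memory model. This is not the driver's code: MockNode, its drop_prepare flag, and execute_with_reprepare are hypothetical names, and the flag only imitates the suspected race, which has not been confirmed.

```rust
use std::collections::HashSet;

/// Error returned by the simulated EXECUTE request.
#[derive(Debug, PartialEq)]
enum ExecError {
    /// The node does not know the statement id (UNPREPARED error frame).
    Unprepared,
}

/// Hypothetical stand-in for a node with a prepared-statement cache.
struct MockNode {
    prepared: HashSet<String>,
    /// When true, PREPARE is acknowledged but the cache is not updated,
    /// imitating the suspected (unconfirmed) race on the server side.
    drop_prepare: bool,
}

impl MockNode {
    /// A freshly joined node: its cache starts empty (step 1).
    fn new(drop_prepare: bool) -> Self {
        MockNode { prepared: HashSet::new(), drop_prepare }
    }

    /// Simulated PREPARE: always "succeeds" from the caller's point of view.
    fn prepare(&mut self, id: &str) {
        if !self.drop_prepare {
            self.prepared.insert(id.to_string());
        }
    }

    /// Simulated EXECUTE: fails with Unprepared if the id is not cached.
    fn execute(&self, id: &str) -> Result<(), ExecError> {
        if self.prepared.contains(id) {
            Ok(())
        } else {
            Err(ExecError::Unprepared)
        }
    }
}

/// Steps 2 to 5: execute, and on UNPREPARED reprepare and retry exactly once.
/// A second UNPREPARED is returned to the caller.
fn execute_with_reprepare(node: &mut MockNode, id: &str) -> Result<(), ExecError> {
    match node.execute(id) {
        Err(ExecError::Unprepared) => {
            node.prepare(id);
            node.execute(id)
        }
        other => other,
    }
}
```

With drop_prepare set to false the single retry succeeds. With it set to true the retry fails again and the error surfaces to the user, which matches step 5.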

As a side note: I wasn't able to reproduce the issue against a Scylla cluster.

In that case, @mexus, could you open an issue against Cassandra? It looks like the problem is not with the Rust driver, so I'm going to close this issue.

There is one issue that sounds similar, but I'm not completely sure it's the same one: https://issues.apache.org/jira/browse/CASSANDRA-17401?jql=text%20~%20%22prepared%22

Thank you for such a thorough investigation, @muzarski!

@Lorak-mmk yeah, I guess we can close the issue then. Yet another Cassandra problem :( I hope one day my company migrates to Scylla after all.

Thanks, everybody, for participating!