neo4j/neo4j-java-driver

Message PULL .. cannot be handled by a session in the READY state

ikwattro opened this issue · 7 comments

We have the following piece of driver usage :

@Test
    public void test_transaction_timeout() {
        try(Driver driver = GraphDatabase.driver("neo4j://neo4j-replica-001.xx.xxx", AuthTokens.basic("user", "pass"))) {
            try (Session session = driver.session()) {
                TransactionConfig config = TransactionConfig.builder()
                        .withTimeout(Duration.of(15000, ChronoUnit.MILLIS))
                        .build();
                try (Transaction tx = session.beginTransaction(config)) {
                    tx.run("CALL apoc.util.sleep(30000)");
                    tx.commit();
                }
            }
        }
    }

One time on two, we receive different error messages

One message which makes sense :

org.neo4j.driver.exceptions.ClientException: The transaction has been terminated. Retry your operation in a new transaction, and you should see a successful result. The transaction has not completed within the specified timeout (dbms.transaction.timeout). You may want to retry with a longer timeout. 

The second is a bit more cryptic

org.neo4j.driver.exceptions.ClientException: Message 'PULL Map{qid -> Long(0), n -> Long(1000)}' cannot be handled by a session in the READY state.

The query is executed against a replica or a core ( result is same ) of a Neo4j 4.4.14 causal cluster ( 3 cores + 1 replica with SSR enabled ).

The following neo4j-java-driver versions produce the same result :

  • 4.4.9
  • 5.2.0

Is it something expected or is our piece of logic wrong ?

I can confirm the issue. The error returned, after the timeout, seems to be random. I created unit test, with several different ways how the query can be run, with implicit and explicit transaction. Very often the query doesn't fail with expected Neo.ClientError.Transaction.TransactionTimedOut, but with different error e.g. Neo.ClientError.Request.Invalid with message e.g. Message 'PULL Map{qid -> Long(1), n -> Long(700)}' cannot be handled by a session in the READY state.

@Slf4j
public class TimeoutInTxTest {

    private static final String URL = "neo4j://........";
    private static final String DATABASE = ".......";
    private static final AuthToken AUTH = AuthTokens.basic(System.getenv("NEO4J_USER"), System.getenv("NEO4J_PASSWORD"));
    private static final int TIMEOUT = 3_000;
    private static final String TIMEOUT_ERROR_CODE = "Neo.ClientError.Transaction.TransactionTimedOut";


    // Test fails almost every time with error code: Neo.ClientError.Request.Invalid
    @RepeatedTest(value = 7)
    public void timeout_for_explicit_tx_1() {
        try (Driver driver = GraphDatabase.driver(URL, AUTH)) {
            try (Session session = driver.session(SessionConfig.builder().withDatabase(DATABASE).build())) {
                TransactionConfig config = TransactionConfig.builder()
                        .withTimeout(Duration.of(TIMEOUT, ChronoUnit.MILLIS))
                        .build();
                //session.run("create (p:Test) return p");
                try (Transaction tx = session.beginTransaction(config)) {
                    tx.run(format("CALL apoc.util.sleep(%d)", 2 * TIMEOUT));
                    tx.commit();
                }
                Assertions.fail("Should fail with timeout");
            }
        } catch (ClientException ex) {
            log.error("{}, {}", ex.code(), ex.getMessage());
            assertThat(ex.code()).isEqualTo(TIMEOUT_ERROR_CODE);
        }
    }

    // This test fails every 2nd time (with Neo.ClientError.Request.Invalid)
    @RepeatedTest(value = 7)
    public void timeout_for_explicit_tx_2() {
        try (Driver driver = GraphDatabase.driver(URL, AUTH)) {
            try (Session session = driver.session(SessionConfig.builder().withDatabase(DATABASE).build())) {
                TransactionConfig config = TransactionConfig.builder()
                        .withTimeout(Duration.of(TIMEOUT, ChronoUnit.MILLIS))
                        .build();
                session.run("create (p:Test) return p");
                try (Transaction tx = session.beginTransaction(config)) {
                    tx.run(format("CALL apoc.util.sleep(%d)", 2 * TIMEOUT));
                    tx.commit();
                }
                Assertions.fail("Should fail with timeout");
            }
        } catch (ClientException ex) {
            log.error("{}, {}", ex.code(), ex.getMessage());
            assertThat(ex.code()).isEqualTo(TIMEOUT_ERROR_CODE);
        }
    }

    // This test fails every 2nd time (with Neo.ClientError.Request.Invalid)
    @RepeatedTest(value = 7)
    public void timeout_for_explicit_tx_3() {
        try (Driver driver = GraphDatabase.driver(URL, AUTH)) {
            try (Session session = driver.session(SessionConfig.builder().withDatabase(DATABASE).build())) {
                TransactionConfig config = TransactionConfig.builder()
                        .withTimeout(Duration.of(TIMEOUT, ChronoUnit.MILLIS))
                        .build();
                session.run("create (p:Test) return p");
                try (Transaction tx = session.beginTransaction(config)) {
                    tx.run("UNWIND range(0,1000000) AS i RETURN i").stream().forEach(r -> {});
                    tx.commit();
                }
                Assertions.fail("Should fail with timeout");
            }
        } catch (ClientException ex) {
            log.error("{}, {}", ex.code(), ex.getMessage());
            assertThat(ex.code()).isEqualTo(TIMEOUT_ERROR_CODE);
        }
    }

    // This test fails every 2nd time (with Neo.ClientError.Request.Invalid)
    @RepeatedTest(value = 7)
    public void timeout_for_explicit_tx_4() {
        try (Driver driver = GraphDatabase.driver(URL, AUTH)) {
            try (Session session = driver.session(SessionConfig.builder().withDatabase(DATABASE).build())) {
                TransactionConfig config = TransactionConfig.builder()
                        .withTimeout(Duration.of(TIMEOUT, ChronoUnit.MILLIS))
                        .build();
                try (Transaction tx = session.beginTransaction(config)) {
                    tx.run("create (p:Test) return p");
                    tx.run("UNWIND range(0,1000000) AS i RETURN i").stream().forEach(r -> {});
                    tx.commit();
                }
                Assertions.fail("Should fail with timeout");
            }
        } catch (ClientException ex) {
            log.error("{}, {}", ex.code(), ex.getMessage());
            assertThat(ex.code()).isEqualTo(TIMEOUT_ERROR_CODE);
        }
    }

    // This test always fail with Neo.ClientError.Request.Invalid
    @RepeatedTest(value = 7)
    public void timeout_for_implicit_tx_1() {
        try (Driver driver = GraphDatabase.driver(URL, AUTH)) {
            try (Session session = driver.session(SessionConfig.builder().withDatabase(DATABASE).build())) {
                TransactionConfig config = TransactionConfig.builder()
                        .withTimeout(Duration.of(TIMEOUT, ChronoUnit.MILLIS))
                        .build();
//                session.run("create (p:Test) return p");
                session.executeWriteWithoutResult(ctx -> {
                    ctx.run(format("CALL apoc.util.sleep(%d)", 2 * TIMEOUT));
                }, config);
                Assertions.fail("Should fail with timeout");
            }
        } catch (ClientException ex) {
            log.error("{}, {}", ex.code(), ex.getMessage());
            assertThat(ex.code()).isEqualTo(TIMEOUT_ERROR_CODE);
        }
    }

    // This test fails every 2nd time (with Neo.ClientError.Request.Invalid)
    @RepeatedTest(value = 7)
    public void timeout_for_implicit_tx_2() {
        try (Driver driver = GraphDatabase.driver(URL, AUTH)) {
            try (Session session = driver.session(SessionConfig.builder().withDatabase(DATABASE).build())) {
                TransactionConfig config = TransactionConfig.builder()
                        .withTimeout(Duration.of(TIMEOUT, ChronoUnit.MILLIS))
                        .build();
                session.run("create (p:Test) return p");
                session.executeWriteWithoutResult(ctx -> {
                    ctx.run(format("CALL apoc.util.sleep(%d)", 2 * TIMEOUT));
                }, config);
                Assertions.fail("Should fail with timeout");
            }
        } catch (ClientException ex) {
            log.error("{}, {}", ex.code(), ex.getMessage());
            assertThat(ex.code()).isEqualTo(TIMEOUT_ERROR_CODE);
        }
    }

}

Do you think it may be an issue with bolt protocol?

@ikwattro, thanks for reporting the issue. Unfortunately, I am struggling to reproduce the following error:

org.neo4j.driver.exceptions.ClientException: Message 'PULL Map{qid -> Long(0), n -> Long(1000)}' cannot be handled by a session in the READY state.

So far, I have setup a 3 node cluster using the neo4j:4.4.14-enterprise Docker image with SSR enabled. I tried hitting both the writer and the readers. It feels like I might be missing some crucial detail to reproduce.

@grzegorz-aniol, thanks for the tests. 👍

For some reason, I am getting different results:

  • timeout_for_explicit_tx_1 - all passed
  • timeout_for_explicit_tx_2 - all passed
  • timeout_for_explicit_tx_3 - all passed
  • timeout_for_explicit_tx_4 - all passed
  • timeout_for_implicit_tx_1 - all passed
  • timeout_for_implicit_tx_2 - all passed

It is probably worth noting I had console logging enabled as I was hoping to catch some network exchanges.

It would be good to establish how the Message 'PULL Map{qid -> Long(0), n -> Long(1000)}' cannot be handled error happens. If it is easy enough to reproduce, could you enable the driver debug logging (example: Config.builder().withLogging(Logging.console(Level.FINE)).build()) and share the log please?

I've been running these days exactly same unit tests, few times, and I cannot reproduce this issue anymore. What is strange, the cluster version is the same. I also tried with different driver versions (as described by @ikwattro), with same result.

Lets close this one. Please open a new issue or re-open this one if it becomes reproducible again.