bahmutov/cypress-network-idle

Intermittent failure due to hanging requests; resolved by using the Chrome DevTools Protocol rather than cy.intercept


Hi,

I was experiencing an intermittent failure where cypress-network-idle reported a hanging request, but looking at the HAR file and the server logs I could see that the request had been successfully processed and a response returned.

Unfortunately I can't provide reproduction steps, but I investigated re-implementing the same functionality by subscribing to the Chrome DevTools Protocol instead, and I no longer have that intermittent issue; the waiting consistently works with no hanging requests.

I appreciate that this bypasses the Cypress intercept functionality, but I wondered if this different approach could be of interest?

It would be, but then it would be a pretty big difference, no?

Yes, it's a very similar implementation to https://github.com/NeuraLegion/cypress-har-generator, but using the CDP network events to update a set of request ids.

It would require explicit installation of the plugin in cypress.config.js, in the same way as the HAR generator, but ultimately the experience in the tests would be very similar.

My use case is restricted to the "separate prepare" approach of defining the matching requests at the start.

However, I think a more complex scenario in which URLs/methods are dynamically defined could also be achieved.
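
To make the idea concrete, here is a minimal sketch of that approach, assuming Cypress 10+, a Chrome-family browser and the chrome-remote-interface package; the debugging port, task names and URL matching are illustrative, not actual plugin code (a real plugin, like the HAR generator, would reuse the debugging port Cypress already opens rather than pushing its own flag):

// cypress.config.js -- sketch only
const { defineConfig } = require('cypress');
const CDP = require('chrome-remote-interface');

const pending = new Set(); // ids of in-flight requests we care about

module.exports = defineConfig({
  e2e: {
    setupNodeEvents(on) {
      on('before:browser:launch', (browser, launchOptions) => {
        // expose a DevTools endpoint the Node side can connect to
        launchOptions.args.push('--remote-debugging-port=9222');
        return launchOptions;
      });

      on('task', {
        // "separate prepare" step: start tracking requests matching a URL fragment
        async startNetworkTracking({ urlPattern }) {
          const client = await CDP({ port: 9222 });
          await client.Network.enable();
          client.on('Network.requestWillBeSent', ({ requestId, request }) => {
            if (request.url.includes(urlPattern)) pending.add(requestId);
          });
          client.on('Network.loadingFinished', ({ requestId }) => pending.delete(requestId));
          client.on('Network.loadingFailed', ({ requestId }) => pending.delete(requestId));
          return null;
        },
        pendingRequestCount: () => pending.size,
      });
    },
  },
});

In a test, waiting for idle then becomes polling the task until the set of request ids is empty, for example with a small recursive helper:

// support code -- also a sketch
const waitForNetworkIdle = () =>
  cy.task('pendingRequestCount').then((count) => {
    if (count > 0) {
      cy.wait(100);
      waitForNetworkIdle();
    }
  });

cy.task('startNetworkTracking', { urlPattern: '/api/users' });
cy.get('#btn-user-delete').click();
waitForNetworkIdle();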

I'm guessing this issue might be related to these:

I am using cypress and mocks-server.
I always use the latest version of both.
The tested app is written with vue and uses axios for requests.

Back in October 2022, I had about 200 e2e tests.
It took 15 minutes to run them all in parallel.

Suddenly I ran into a problem.

At random times, a random test would fail due to a failed network response assertion.
Typically it was one of the simple CRUD tests that check that:

  • after submitting the filled form, the application makes a successful POST request and a new record appears in the table.
  • after editing a record in the table, the application makes a successful PUT request and the record in the table is marked as edited.
  • after deleting a record from the table, the application makes a successful DELETE request and the record in the table is marked as deleted.

Thus, the shape of each failed test was something like this:

// ...
cy.intercept('DELETE', '**/user/2').as('deleteUserRequest');

cy.get('#btn-user-delete').click();

cy.wait('@deleteUserRequest')
  .should('have.nested.property', 'response.statusCode', 200);

// ...

But sometimes such a test would fail, because instead of a 200 status, the interceptor would get an ECONNRESET socket hang up error.
The interesting part is that when I watched the video of the failed test, I could clearly see that the application worked as expected: despite the ECONNRESET socket hang up error, it marked the expected item as deleted.
Later, to find out the reason, I installed the cypress-har-generator plugin.
When I imported the recorded HAR into the Google DevTools Network tab, I saw that the request was completely fine and had a status of 200.

So for the application this ECONNRESET socket hang up was harmless.
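
For anyone who wants to do the same HAR check, a rough sketch of the recording setup (per that plugin's README; check its docs for the exact current API, the file paths here are just the Cypress defaults):

// cypress.config.js
const { install } = require('@neuralegion/cypress-har-generator');
// inside setupNodeEvents(on, config): install(on);

// cypress/support/e2e.js
require('@neuralegion/cypress-har-generator/commands');

// in the flaky spec
cy.recordHar();
// ...run the steps that intermittently fail...
cy.saveHar(); // then import the resulting .har file into the DevTools Network tab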

As the number of tests increased, the error began to appear more often.
When the number of tests reached 600 in February 2023, they started to randomly fail in about 5% of runs with the same error, which is unacceptable.

I spent a lot of time re-running tests at night while I slept, and after 200+ runs I was able to figure out the cause.

Basically, when you create an interceptor, your application works like this:

Your application sends all of its requests through the Cypress proxy.
application <-> cypress host proxy <-> application server

So when you call cy.intercept('DELETE', '**/user/2'), it informs the Cypress proxy that a request matching this pattern should be intercepted and passed to the test code.
The Cypress proxy creates a record that it should intercept this request and forwards the request to the application server.

Later, when the application server responds, the Cypress proxy sends the intercepted data to your Cypress tests (at cy.wait) and to your application.
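
You can see that handoff in test code on the object cy.wait yields:

cy.wait('@deleteUserRequest').then((interception) => {
  // the proxy hands the test the matched request/response pair it relayed
  expect(interception.request.method).to.eq('DELETE');
  expect(interception.response.statusCode).to.eq(200);
});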

But if you're unlucky, the connection to your application server might close right at that moment due to a keep-alive timeout.
When that happens, the proxied request fails with an ECONNRESET socket hang up error.

Cypress will pass it to your tests and your application as usual.
Your application will notice that this error is not fatal and will establish a new connection to the server immediately afterwards.
But your test has already received the failed response and will not retry once the connection is restored.

Since the app just creates a new connection and retries, I was able to verify from the video that it was working as expected.
But since only the second attempt yields a response, the Cypress assertion fails, because it tests the first one.

Back in February 2023, I was able to reliably reproduce the bug by tuning the keep-alive header and socket timeout so that right at the moment the test intercepts the request, the timeout expires and the request fails with an ECONNRESET socket hang up.

Increasing the keep-alive timeout for the mocks server from 5 seconds to 60 seconds completely solved my problem.
Increasing the keep-alive timeout ensures that while you are intercepting requests, your server doesn't drop the connection.
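
If your test backend is a plain Node http or Express server rather than mocks-server, the equivalent knob is the server's keepAliveTimeout (Node's default is 5 seconds, which matches the numbers above). A sketch with illustrative values:

// server.js -- sketch of both the repro and the fix
const http = require('http');

const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end('{"ok":true}');
});

// setting this to ~100 ms makes the race easy to hit: the idle keep-alive
// socket between the Cypress proxy and the server closes just as the next
// intercepted request goes out on it
// server.keepAliveTimeout = 100;

// raising it well past the longest cy.wait makes the race disappear
server.keepAliveTimeout = 60000;
// headersTimeout is commonly kept above keepAliveTimeout to avoid a similar race
server.headersTimeout = 65000;

server.listen(3100);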

I've run all my 600+ tests about 200 times since February, both locally and in CI, and have never seen this issue again.

Unfortunately, today I was not able to reliably reproduce the problem again using Cypress 10 and 12.
However, this still happens in Cypress 10.