Sporadic but Frequent Firefox Errors on Linux
lread opened this issue · 2 comments
Version
master, f8c4944 as of this writing.
Platform
Operating System: PopOS Linux
Clojure version: 1.11.1.1273
Babashka version: 1.3.179
JDK vendor and version: 17.0.7 temurin
Browser vendor: firefox
Browser version: 112.0.1
WebDriver version: geckodriver 0.33.0
Symptom
When running tests under Firefox I'll sometimes get a java.net.SocketTimeoutException: Read timed out
. The exception shows an attempt to disconnect is occurring:
...
at slingshot.support$stack_trace.invoke(support.clj:201)
at etaoin.impl.client$call.invokeStatic(client.cljc:131)
at etaoin.impl.client$call.invoke(client.cljc:95)
at etaoin.api$execute.invokeStatic(api.clj:235)
at etaoin.api$execute.invoke(api.clj:209)
at etaoin.api$delete_session.invokeStatic(api.clj:275)
at etaoin.api$delete_session.invoke(api.clj:269)
at etaoin.api$disconnect_driver$fn__8617.invoke(api.clj:3631)
at etaoin.api$disconnect_driver.invokeStatic(api.clj:3631)
at etaoin.api$disconnect_driver.invoke(api.clj:3625)
at etaoin.api$quit.invokeStatic(api.clj:3650)
at etaoin.api$quit.invoke(api.clj:3645)
at etaoin.api_test$fixture_browsers$fn__8724.invoke(api_test.clj:57)
at etaoin.api_test$fixture_browsers.invokeStatic(api_test.clj:57)
at etaoin.api_test$fixture_browsers.invoke(api_test.clj:53)
...
Reproduction
Happens sporadically but frequently.
`bb test-doc`
Or
`bb test:jvm --browsers firefox --suite api
Actual behavior
Read timeout exception is thrown.
Expected behavior
Tests should pass.
Diagnosis
I don't think this is an Etaoin problem.
It seems to me like a Firefox and/or GeckoDriver problem.
If I turn on detailed logging for firefox via options:
{:driver-log-level "trace" :log-stdout :inherit :log-stderr :inherit}
When things go awry I see geckodriver logs like this:
WARNING: A blocker encountered an error while we were waiting.
Blocker: TelemetryController: shutting down
Phase: profile-before-change-telemetry
State: Error getting state: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange" at addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:727:15
addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:523:26
addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:458:15
setupShutdown@resource://gre/modules/osfile/osfile_async_front.jsm:1548:28
@resource://gre/modules/osfile/osfile_async_front.jsm:1568:16
@resource://gre/modules/osfile.jsm:12:30
@resource://gre/modules/TelemetryStorage.sys.mjs:10:28
_checkPendingPings@resource://gre/modules/TelemetrySend.sys.mjs:863:17
setup@resource://gre/modules/TelemetrySend.sys.mjs:803:18
setup@resource://gre/modules/TelemetrySend.sys.mjs:241:30
setupTelemetry/this._delayedInitTask<@resource://gre/modules/TelemetryControllerParent.sys.mjs:824:36
observe@resource://gre/modules/AsyncShutdown.sys.mjs:576:16
WARNING: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
WARNING: addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:727:15
addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:523:26
addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:458:15
setupShutdown@resource://gre/modules/osfile/osfile_async_front.jsm:1548:28
@resource://gre/modules/osfile/osfile_async_front.jsm:1568:16
@resource://gre/modules/osfile.jsm:12:30
@resource://gre/modules/TelemetryStorage.sys.mjs:10:28
_checkPendingPings@resource://gre/modules/TelemetrySend.sys.mjs:863:17
setup@resource://gre/modules/TelemetrySend.sys.mjs:803:18
setup@resource://gre/modules/TelemetrySend.sys.mjs:241:30
setupTelemetry/this._delayedInitTask<@resource://gre/modules/TelemetryControllerParent.sys.mjs:824:36
observe@resource://gre/modules/AsyncShutdown.sys.mjs:576:16
JavaScript error: resource://gre/modules/AsyncShutdown.sys.mjs, line 727: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
JavaScript error: resource://gre/modules/AsyncShutdown.sys.mjs, line 727: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
JavaScript error: resource://gre/modules/AsyncShutdown.sys.mjs, line 727: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
JavaScript error: resource://gre/modules/AsyncShutdown.sys.mjs, line 727: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
JavaScript error: resource://gre/modules/AsyncShutdown.sys.mjs, line 727: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
JavaScript error: resource://gre/modules/AsyncShutdown.sys.mjs, line 727: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
console.error: (new Error("SessionFile is closed", "resource:///modules/sessionstore/SessionFile.sys.mjs", 370))
console.error: (new AbortError("IOUtils: Shutting down and refusing additional I/O tasks", (void 0), 890))
JavaScript error: resource://gre/modules/CrashManager.jsm, line 927: AbortError: IOUtils: Shutting down and refusing additional I/O tasks
Best guess: It feels like a race condition, something that is starting up is maybe still starting up while we are shutting down, and then things go kaboom.
I do see a bug reported around this for firefox, but it is reported as fixed. But maybe not for Firefox v112.0.1?
If I'm right about the race condition, this scenario probably won't happen much in typical use.
Etaoin tests sometimes do very little and are therefore generate very short lived session.
If a session were allowed to live longer than these internal firefox services take to startup, my guess is that we'd be fine.
This might also be an issue on macOS and Windows.
My dev box is PopOS Linux.
This might not occur as much on CI because CI is pretty slow and this slowness might give things the time they need to avoid this condition.
Action
Not sure yet.
Thought I'd raise the issue to maybe see if others are suffering it too.
Maybe wait until the next release of Firefox before taking action.
If we need to compensate we could try to ensure that internal services are started before we shutdown.
I am also getting infrequent Read timed out
errors, also without pretty much any reproducibility. It's occuring in the context of a call to pmap
. I'm trying to perform some actions on each row of a table. To make this faster, I'm trying to:
(pmap #(e/with-firefox-headless driver (doStuff driver %)) rows)
This has been a real headache. I can run my script 3 times and 2/3 tries run wtihout issue. Unsure how to proceed.