More intelligently detect if a plugin should be killed, rather than waiting for X seconds
Closed this issue · 14 comments
Sometimes, a Gauge plugin takes a long time to run. Gauge tries not to freeze the user's machine, so it preemptively kills a plugin that appears to be frozen.
Expected behavior
Gauge checks whether a plugin is still doing work, for example by examining the timestamps of its log files. If the plugin has done work in the last few seconds, Gauge assumes that it is still active.
Actual behavior
Gauge forcibly kills a plugin after the interval set by the plugin_kill_timeout property (default: 4000 milliseconds, i.e. 4 seconds), regardless of whether the plugin is actively doing work. The relevant code is in plugin/plugin.go.
This has caused problems especially with the HTML Report plugin on large projects (see HTML Report #120, #121, #153, as well as this project's #636). In theory it could happen with any plugin, but so far no users have reported other plugins running for longer than the default 4 seconds.
Gauge version
Gauge version: 1.0.0
Commit Hash: 5a99965
Plugins
-------
csharp (0.10.3)
html-report (4.0.5)
java (0.6.8)
json-report (0.2.1)
screenshot (0.0.1)
xml-report (0.2.0)
Thanks for bringing this up, @Thunderforge
Here is a proposal:
- Plugins send a "keep-alive" ping back to Gauge every N seconds (N could be X-1, where X is the kill timeout in seconds).
- Gauge resets the timer whenever it receives a "keep-alive" ping.
- As a result, Gauge kills a plugin only if it hears nothing from it for longer than the plugin_kill_timeout interval.
Thoughts?
@sriv That sounds like an excellent way to do it. Obviously, it would mean updating all the plugins to send a "keep-alive" ping, but I think that in the long run, this is a better solution.
Tech notes
There are (at least?) two possible ways to implement a "keep-alive":
Gauge core orchestrated
1. Gauge sends a "should I keep you alive?" request.
2. The plugin responds with "yes, please!"
3. Gauge resets the timer.
4. Not done? Go to (1).
5. The plugin is killed when it fails to respond to "should I keep you alive?".
Plugin initiated
1. The plugin knows it will be killed after the plugin_kill_timeout interval.
2. Just before the timeout expires, the plugin sends a "wait, I am not done yet" request to Gauge.
3. Gauge resets the timeout timer.
4. Not done? Go to (2).
/cc @getgauge/core - please add your thoughts.
If we go with the Gauge core initiated approach, plugins will need additional metadata (e.g. capabilities) that Gauge must honour before sending these requests.
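One way Gauge could honour such metadata is sketched below, assuming a hypothetical "capabilities" array in the plugin's JSON manifest and a hypothetical "keep-alive" capability name; neither is an existing part of Gauge's plugin manifest.

```go
package main

import "encoding/json"

// manifest holds the subset of a plugin manifest this sketch needs.
// The "capabilities" field and its values are hypothetical.
type manifest struct {
	ID           string   `json:"id"`
	Capabilities []string `json:"capabilities"`
}

// supportsKeepAlive reports whether the manifest declares the hypothetical
// "keep-alive" capability, i.e. whether Gauge may send keep-alive requests.
func supportsKeepAlive(raw []byte) (bool, error) {
	var m manifest
	if err := json.Unmarshal(raw, &m); err != nil {
		return false, err
	}
	for _, c := range m.Capabilities {
		if c == "keep-alive" {
			return true, nil
		}
	}
	return false, nil
}
```

A plugin that does not declare the capability would presumably keep the existing fixed plugin_kill_timeout behaviour.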
I prefer the first approach, where Gauge core sends a "should I keep you alive?" request to the plugins. It fits better with our current communication model, in which Gauge initiates all communication and plugins only respond to Gauge's requests.
But this also means that plugins need to keep listening for requests from Gauge. A single-threaded plugin may not work here, because it may be too busy to respond to Gauge immediately.
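The concern about single-threaded plugins can be illustrated with a sketch in which the plugin runs its slow work in a separate goroutine, leaving its request loop free to answer keep-alive queries. runPlugin and the channel-based query protocol are hypothetical; a real plugin would receive requests over its actual connection to Gauge.

```go
package main

// runPlugin runs work concurrently so that the loop below can keep answering
// keep-alive queries (each query carries its own reply channel) while the
// work is still in progress. It returns once work has finished.
// Hypothetical helper, for illustration only.
func runPlugin(work func(), queries <-chan chan<- struct{}) {
	finished := make(chan struct{})
	go func() {
		defer close(finished)
		work()
	}()
	for {
		select {
		case <-finished:
			return
		case reply := <-queries:
			// Reply channels are assumed to be buffered, so answering never blocks.
			reply <- struct{}{}
		}
	}
}
```

If work ran directly in the loop instead, a query arriving mid-work would go unanswered until the work finished, which is exactly the failure mode described above.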
The fix should be available in nightly >= 8-2-2019
@Thunderforge Did you get a chance to try this out after the fix?
@Debashis9012 I have not had a chance to test it, and likely won't be able to any time soon.
This seems to be working as expected.
I first tried with these versions:
Gauge version: 1.0.4
Commit Hash: 3a9a647
Plugins
-------
html-report (4.0.8)
js (2.3.5.nightly-2019-02-25)
screenshot (0.0.1)
The suite had approximately 80 specs and 120 scenarios.
I changed plugin_kill_timeout to 500 (0.5 seconds).
When gauge run was executed, the html-report plugin failed with: Plugin [Html Report] with pid [27204] did not exit after 0.50 seconds. Forcefully killing it.
I then changed the Gauge version to:
Gauge version: 1.0.5.nightly-2019-04-04
Commit Hash: f41dccf
Plugins
-------
html-report (4.0.8)
js (2.3.5.nightly-2019-02-25)
screenshot (0.0.1)
When gauge run was executed, the html-report plugin successfully generated the reports.
@Thunderforge could you please give it a try with the fixed version and let us know whether it's working for you?
@Debashis9012 Unfortunately, I am no longer in a position where I can test this. So please proceed with QA without me.
The fix should be available in nightly >= 26-4-2019
I ran a large project on a fresh machine (both Windows and Mac) against the fixed version (nightly 26-4-2019).
Observations:
- On the first run, it works fine without any plugin kill timeout error.
- After the first run, I consistently see the plugin kill timeout error.
- If plugin_kill_timeout is set to 4000, which is the default value, the HTML report takes as long as it needs to generate after execution completes.
- If plugin_kill_timeout is set to any value other than the default, the error is thrown immediately.
Here is the console output:
Plugin [Html Report] with pid [9527] did not exit after 4.00 seconds. Forcefully killing it.
Specifications: 961 executed 961 passed 0 failed 699 skipped
Scenarios: 1919 executed 1919 passed 0 failed 1401 skipped
I don't see this issue with the latest (master) Gauge and html-report, so I'm closing this issue for now.