Mutation tests failing
qstokkink opened this issue · 10 comments
It appears our mutation testing machine, zulu-ipv8-mutation-tester (ipv8-mutation-tester IPv8), is now missing a dependency:
00:11:18 + run_all_mutation_tests.py ./py-ipv8 .
00:11:18 /tmp/jenkins8324097251270560127.sh: 3: run_all_mutation_tests.py: not found
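A pre-flight check could make a missing dependency fail fast with an explicit message, instead of surfacing as a shell "not found" halfway through the job. This is a minimal sketch. It assumes the job only needs run_all_mutation_tests.py to be resolvable on the PATH, which is what the bare invocation above implies. The helpers missing_commands and preflight and the REQUIRED_COMMANDS list are hypothetical and not part of the existing job.

```python
import shutil
import sys

# Hypothetical list of executables the nightly job invokes by bare name.
REQUIRED_COMMANDS = ["run_all_mutation_tests.py"]


def missing_commands(commands):
    """Return the commands that cannot be resolved to an executable on PATH."""
    # shutil.which resolves a bare name against PATH (as the shell does) and
    # returns None when no executable file is found.
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def preflight(commands):
    """Report missing commands on stderr and return a shell-style exit code."""
    missing = missing_commands(commands)
    if missing:
        print("Missing dependencies: " + ", ".join(missing), file=sys.stderr)
        return 1
    return 0
```

A small wrapper script could end with sys.exit(preflight(REQUIRED_COMMANDS)) so that the Jenkins shell step stops before the mutation run starts.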
"The operation was a success, but the patient died":
12:01:23 java.nio.channels.ClosedChannelException
12:01:23 at org.jenkinsci.remoting.protocol.NetworkLayer.onRecvClosed(NetworkLayer.java:155)
12:01:23 at org.jenkinsci.remoting.protocol.impl.NIONetworkLayer.ready(NIONetworkLayer.java:143)
12:01:23 at org.jenkinsci.remoting.protocol.IOHub$OnReady.run(IOHub.java:789)
12:01:23 at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
12:01:23 at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
12:01:23 at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
12:01:23 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
12:01:23 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
12:01:23 at java.base/java.lang.Thread.run(Thread.java:840)
12:01:23 Caused: java.io.IOException: Backing channel 'JNLP4-connect connection from <Server Name>/<Server IP>:<Server Port>' is disconnected.
12:01:23 at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:215)
12:01:23 at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
12:01:23 at jdk.proxy2/jdk.proxy2.$Proxy123.isAlive(Unknown Source)
12:01:23 at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1212)
12:01:23 at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1204)
12:01:23 at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:195)
12:01:23 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:145)
12:01:23 at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:92)
12:01:23 at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
12:01:23 at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:818)
12:01:23 at hudson.model.Build$BuildExecution.build(Build.java:199)
12:01:23 at hudson.model.Build$BuildExecution.doRun(Build.java:164)
12:01:23 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:526)
12:01:23 at hudson.model.Run.execute(Run.java:1895)
12:01:23 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:44)
12:01:23 at hudson.model.ResourceController.execute(ResourceController.java:101)
12:01:23 at hudson.model.Executor.run(Executor.java:442)
The build executed correctly for several hours, but the build executor lost connection due to another issue.
Agent restarted. Hopefully this was just a one time thing 🤞
Not a one time thing. The builder disconnected again. 😢
Perhaps we need to give the Jenkins agent jar priority over everything else.
Switched to:
nohup bash -c 'java -jar agent.jar etc etc' > test.txt 2>&1 </dev/null &
Hopefully it stays online now. We'll see in a few hours.
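For comparison, the same detaching pattern can be sketched with Python's subprocess module. The pattern is: ignore the terminal going away, read stdin from /dev/null, send stdout and stderr to one log file, and return immediately. This is only an illustration. start_new_session=True runs the child in a new session, so it no longer receives the terminal's SIGHUP. That has a similar effect to nohup but is not the same mechanism. The agent's real arguments stay elided, as in the command above, and launch_detached is a hypothetical name.

```python
import subprocess


def launch_detached(argv, log_path):
    """Start argv in the background, detached from the current terminal.

    Roughly mirrors: nohup <argv> > log_path 2>&1 </dev/null &
    """
    # The child inherits its own copy of the file descriptor, so closing
    # ours when the with-block exits does not affect the running process.
    with open(log_path, "wb") as log:
        return subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,   # </dev/null
            stdout=log,                 # > log_path
            stderr=subprocess.STDOUT,   # 2>&1
            start_new_session=True,     # setsid(): no SIGHUP from the old terminal (POSIX only)
        )


# Hypothetical usage; the real agent arguments are elided in the original command:
# launch_detached(["java", "-jar", "agent.jar", ...], "test.txt")
```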
🎉 The builder no longer disconnects. On to the next error:
12:40:54 Done! Minimizing output
12:40:54 Skipping /[...]/index.html, no index.html found!
12:40:54 Traceback (most recent call last):
12:40:54 File "/home/run_all_mutation_tests.py", line 116, in <module>
12:40:54 shutil.copy(os.path.join('/root', 'MutPy', 'mutpy', 'templates', 'include', 'jquery.js'), base_output_dir)
12:40:54 File "/usr/lib/python3.10/shutil.py", line 417, in copy
12:40:54 copyfile(src, dst, follow_symlinks=follow_symlinks)
12:40:54 File "/usr/lib/python3.10/shutil.py", line 254, in copyfile
12:40:54 with open(src, 'rb') as fsrc:
12:40:54 FileNotFoundError: [Errno 2] No such file or directory: '/root/MutPy/mutpy/templates/include/jquery.js'
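One option is to treat the missing asset the way the script already treats a missing index.html a few lines earlier: skip it with a message instead of raising. This is a hypothetical sketch. The thread does not show how the error was actually fixed, and copy_optional is an illustrative name, not a function in run_all_mutation_tests.py.

```python
import os
import shutil


def copy_optional(src, dst_dir):
    """Copy src into dst_dir if it exists; otherwise report it and skip.

    Returns the path of the copied file, or None when src was missing.
    """
    if not os.path.isfile(src):
        # Same style as the existing "Skipping ..., no index.html found!" message.
        print(f"Skipping {src}, file not found!")
        return None
    # shutil.copy returns the destination path (dst_dir/basename(src)).
    return shutil.copy(src, dst_dir)


# Hypothetical usage at the failing line of the script:
# copy_optional(os.path.join('/root', 'MutPy', 'mutpy', 'templates', 'include', 'jquery.js'), base_output_dir)
```

Whether a report without jquery.js is acceptable is a judgment call. Failing loudly, as the script does now, at least made the broken MutPy checkout visible.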
Third error fixed. Second error is back: the builder is disconnecting again.
It did stay online while I had an active connection open to the container. Perhaps some sort of hibernation mode kicks in otherwise.
Based on https://community.jenkins.io/t/how-to-affect-ssh-parameters-on-ssh-agent-like-keep-alive/5954, we should probably try adjusting the ~/.ssh/config file. The example posted in that thread is:
Host *
ServerAliveInterval 60
ServerAliveCountMax 3
Our disconnecting job takes just short of 2 hours. Going purely on gut feeling, setting ServerAliveInterval to 5 minutes and ServerAliveCountMax to 24 (24 × 5 minutes = 2 hours) should suffice. I'll try this out once I'm back on the (physical) premises and have access to the machine.
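The arithmetic behind that guess can be checked quickly. ssh_config(5) gives ServerAliveInterval in seconds, and ssh disconnects after ServerAliveCountMax consecutive keep-alive requests go unanswered. The tolerated silence is therefore roughly interval × count. This is a sketch that renders the proposed stanza and compares that window with the two-hour job length mentioned above. The function names are hypothetical, and the job duration is the only figure taken from this thread.

```python
def ssh_keepalive_stanza(interval_s, count_max, host="*"):
    """Render an ~/.ssh/config block with the given keep-alive settings."""
    return (
        f"Host {host}\n"
        f"    ServerAliveInterval {interval_s}\n"
        f"    ServerAliveCountMax {count_max}\n"
    )


def unresponsive_window_s(interval_s, count_max):
    """Approximate seconds of server silence ssh tolerates before disconnecting."""
    return interval_s * count_max


JOB_DURATION_S = 2 * 60 * 60  # the job takes just short of 2 hours

# Example from the linked thread: 60 s x 3 = 180 s, i.e. 3 minutes.
assert unresponsive_window_s(60, 3) == 180
# Proposed values: 300 s x 24 = 7200 s, i.e. the full 2 hours.
assert unresponsive_window_s(5 * 60, 24) >= JOB_DURATION_S

print(ssh_keepalive_stanza(5 * 60, 24), end="")
```

Running it prints a Host * block with ServerAliveInterval 300 and ServerAliveCountMax 24. Those lines can be pasted into ~/.ssh/config.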
To put Jenkins in perspective, I looked into GitHub Actions. At the time of writing, the maximum job execution time is 6 hours and a cron build trigger exists. This means it would be theoretically feasible to use GitHub Actions for our nightly build.
That said, we would still have to create the action (a MutPy fork based on my disgusting patches in the secret Tribler/py-ipv8-mutation-libraries repository).
Practically speaking, it's probably still best to stick with Jenkins.
I have updated the agent to connect via SSH. Hopefully, it will not disconnect anymore.
Here is a running job: https://jenkins.tribler.org/job/ipv8/job/mutation_test_daily/21/