jenkinsci/docker-plugin

Sometimes 100% CPU usage at the end of a job when the remote docker container has finished running

Z9n2JktHlZDmlhSvqc9X2MmL3BwQG7tk opened this issue · 6 comments

Jenkins and plugins versions report

Environment
Jenkins: 2.375.2
OS: Linux - 4.15.0-202-generic
---
ace-editor:1.1
ant:481.v7b_09e538fcca
antisamy-markup-formatter:155.v795fb_8702324
apache-httpcomponents-client-4-api:4.5.13-138.v4e7d9a_7b_a_e61
artifactory:3.18.0
authentication-tokens:1.4
authorize-project:1.4.0
basic-branch-build-strategies:71.vc1421f89888e
bootstrap4-api:4.6.0-5
bootstrap5-api:5.2.1-3
bouncycastle-api:2.27
branch-api:2.1071.v1a_188a_562481
build-discarder:139.v05696a_7fe240
build-timeout:1.28
caffeine-api:2.9.3-65.v6a_47d0f4d1fe
checks-api:1.8.1
cloudbees-folder:6.800.v71307ca_b_986b
collapsing-console-sections:1.8.0
command-launcher:90.v669d7ccb_7c31
commons-lang3-api:3.12.0-36.vd97de6465d5b_
commons-text-api:1.10.0-27.vb_fa_3896786a_7
config-file-provider:3.11.1
configuration-as-code:1569.vb_72405b_80249
configuration-as-code-groovy:1.1
credentials:1214.v1de940103927
credentials-binding:523.vd859a_4b_122e6
data-tables-api:1.12.1-4
disable-github-multibranch-status:1.2
display-url-api:2.3.7
docker-commons:1.21
docker-java-api:3.2.13-37.vf3411c9828b9
docker-plugin:1.3.0
docker-workflow:563.vd5d2e5c4007f
dtkit-api:3.0.2
durable-task:504.vb10d1ae5ba2f
echarts-api:5.4.0-1
email-ext:2.93
extensible-choice-parameter:1.8.0
extra-columns:1.25
font-awesome-api:6.2.1-1
git:5.0.0
git-client:4.1.0
git-parameter:0.9.18
git-server:99.va_0826a_b_cdfa_d
github:1.36.1
github-api:1.303-400.v35c2d8258028
github-branch-source:1701.v00cc8184df93
github-scm-trait-notification-context:1.1
gradle:2.2
h2-api:1.4.199
hidden-parameter:0.0.5
http_request:1.16
inline-pipeline:1.0.2
instance-identity:142.v04572ca_5b_265
ionicons-api:31.v4757b_6987003
ivy:2.4
jackson2-api:2.14.1-313.v504cdd45c18b
jakarta-activation-api:2.0.1-2
jakarta-mail-api:2.0.1-2
javadoc:226.v71211feb_e7e9
javax-activation-api:1.2.0-5
javax-mail-api:1.6.2-8
jaxb:2.3.7-1
jdk-tool:63.v62d2fd4b_4793
jjwt-api:0.11.5-77.v646c772fddb_0
jnr-posix-api:3.1.16-1
job-dsl:1.81
jobConfigHistory:1198.v4d5736c2308c
jquery:1.12.4-1
jquery-detached:1.2.1
jquery3-api:3.6.1-2
jsch:0.1.55.61.va_e9ee26616e7
junit:1166.va_436e268e972
junit-attachments:131.vef2117770555
junit-realtime-test-reporter:90.v7dc2c937f9a_8
ldap:659.v8ca_b_a_fe79fa_d
lockable-resources:1113.vf66761f17f1e
mailer:438.v02c7f0a_12fa_4
mapdb-api:1.0.9-28.vf251ce40855d
matrix-auth:3.1.6
matrix-project:785.v06b_7f47b_c631
maven-plugin:3.20
mina-sshd-api-common:2.9.2-50.va_0e1f42659a_a
mina-sshd-api-core:2.9.2-50.va_0e1f42659a_a
momentjs:1.1.1
next-executions:1.0.15
nodejs:1.5.1
okhttp-api:4.9.3-108.v0feda04578cf
pam-auth:1.10
pipeline-build-step:2.18
pipeline-config-history:1.6
pipeline-github-lib:38.v445716ea_edda_
pipeline-graph-analysis:202.va_d268e64deb_3
pipeline-groovy-lib:629.vb_5627b_ee2104
pipeline-input-step:466.v6d0a_5df34f81
pipeline-maven:1257.v89e586d3c58c
pipeline-milestone-step:111.v449306f708b_7
pipeline-model-api:2.2118.v31fd5b_9944b_5
pipeline-model-definition:2.2118.v31fd5b_9944b_5
pipeline-model-extensions:2.2118.v31fd5b_9944b_5
pipeline-rest-api:2.30
pipeline-stage-step:305.ve96d0205c1c6
pipeline-stage-tags-metadata:2.2118.v31fd5b_9944b_5
pipeline-stage-view:2.30
pipeline-utility-steps:2.15.0
plain-credentials:143.v1b_df8b_d3b_e48
plugin-util-api:2.20.0
popper-api:1.16.1-3
popper2-api:2.11.6-2
resource-disposer:0.20
scm-api:631.v9143df5b_e4a_a
script-security:1229.v4880b_b_e905a_6
simple-theme-plugin:136.v23a_15f86c53d
slack:631.v40deea_40323b
snakeyaml-api:1.33-90.v80dcb_3814d35
ssh-agent:295.v9ca_a_1c7cc3a_a_
ssh-credentials:305.v8f4381501156
ssh-slaves:2.854.v7fd446b_337c9
sshd:3.275.v9e17c10f2571
structs:324.va_f5d6774f3a_d
subversion:2.17.0
test-results-analyzer:0.4.0
timestamper:1.21
token-macro:321.vd7cc1f2a_52c8
trilead-api:2.84.v72119de229b_7
uno-choice:2.6.4
variant:59.vf075fe829ccb
view-job-filters:2.3
workflow-aggregator:590.v6a_d052e5a_a_b_5
workflow-api:1208.v0cc7c6e0da_9e
workflow-basic-steps:994.vd57e3ca_46d24
workflow-cps:3606.v0b_d8b_e512dcf
workflow-durable-task-step:1223.v7f1a_98a_8863e
workflow-job:1268.v6eb_e2ee1a_85a
workflow-multibranch:716.vc692a_e52371b_
workflow-scm-step:400.v6b_89a_1317c9a_
workflow-step-api:639.v6eca_cd8c04a_a_
workflow-support:839.v35e2736cfd5c
ws-cleanup:0.44
xunit:3.1.2

What Operating System are you using (both controller, and any agents involved in the problem)?

Controller

Linux Ubuntu 18.04.6 LTS
Docker version 20.10.12, build e91ed57
Jenkins running as docker container from image jenkins/jenkins:lts-alpine

Agent

Linux Ubuntu 20.04.4 LTS
Docker version 20.10.14, build a224086
Connection "Jenkins -> agent docker" is made via TCP with SSL cert auth

Reproduction steps

Just run (pipeline) jobs and the problem will happen after some time.

Expected Results

No high CPU load.

Actual Results

Constant 100% CPU load on the Jenkins node until Jenkins is restarted, although Jenkins continues to work and execute jobs.

Anything else?

Not sure, but it looks like the problem is here: https://github.com/jenkinsci/docker-plugin/blob/master/src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java#L50
An infinite loop?
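
For reference, the read path around that line, reconstructed from the diffs quoted further down (simplified, so it may differ slightly from the actual source), looks like this:

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        readInternal();
        int byteRead = multiplexed.read(b, off, Math.min(next, len));
        next -= byteRead;
        return byteRead;
    }

    private void readInternal() throws IOException {
        while (next == 0) {
            byte[] header = new byte[8];
            int todo = 8;
            while (todo > 0) {
                int i = multiplexed.read(header, 8-todo, todo);
                if (i < 0) return; // EOF: returns without updating 'next'
                todo -= i;
            }
            // ... parse the 8-byte frame header, set 'next' for stdout frames,
            // and log stderr frames (see the diffs below for the full body)
        }
    }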

The problem occurs several times a day.

Jenkins log:

Jan 27 13:42:00 jenkins jenkins[9360]: INFO     c.n.j.p.docker.DockerTemplate#doProvisionNode: Trying to run container for node java11-00000f5vevv56 from image: jenkins-slave:java11
Jan 27 13:42:00 jenkins jenkins[9360]: INFO     c.n.j.p.docker.DockerTemplate#doProvisionNode: Started container ID c95594a86d8f093f9548dd88a04cf2edde7380778bf209fd1132905447ceb16f for node java11-00000f5vevv56 from image: jenkins-slave:java11
Jan 27 13:42:01 jenkins jenkins[9360]: INFO     i.j.d.c.DockerMultiplexedInputStream#readInternal: stderr from java11-00000f5vevv56 (c95594a86d8f093f9548dd88a04cf2edde7380778bf209fd1132905447ceb16f): Jan 27, 2023 1:42:01 PM org.jenkinsci.remoting.engine.WorkDirManager setupLogging
Jan 27 13:42:01 jenkins jenkins[9360]: INFO     i.j.d.c.DockerMultiplexedInputStream#readInternal: stderr from java11-00000f5vevv56 (c95594a86d8f093f9548dd88a04cf2edde7380778bf209fd1132905447ceb16f): channel started
Jan 27 13:42:02 jenkins jenkins[9360]: INFO     i.j.d.c.DockerMultiplexedInputStream#readInternal: stderr from java11-00000f5vevv56 (c95594a86d8f093f9548dd88a04cf2edde7380778bf209fd1132905447ceb16f): WARNING: An illegal reflective access operation has occurred
Jan 27 13:42:08 jenkins jenkins[9360]: INFO     i.j.d.c.DockerMultiplexedInputStream#readInternal: stderr from java11-00000f5vevv56 (c95594a86d8f093f9548dd88a04cf2edde7380778bf209fd1132905447ceb16f): SLF4J: No SLF4J providers were found.
Jan 27 13:42:08 jenkins jenkins[9360]: INFO     i.j.d.c.DockerMultiplexedInputStream#readInternal: stderr from java11-00000f5vevv56 (c95594a86d8f093f9548dd88a04cf2edde7380778bf209fd1132905447ceb16f): SLF4J: Defaulting to no-operation (NOP) logger implementation
Jan 27 13:42:08 jenkins jenkins[9360]: INFO     i.j.d.c.DockerMultiplexedInputStream#readInternal: stderr from java11-00000f5vevv56 (c95594a86d8f093f9548dd88a04cf2edde7380778bf209fd1132905447ceb16f): SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.
Jan 27 13:48:55 jenkins jenkins[9360]: INFO     i.j.docker.DockerTransientNode$1#println: Disconnected computer for node 'java11-00000f5vevv56'.
Jan 27 13:48:55 jenkins jenkins[9360]: INFO     i.j.docker.DockerTransientNode$1#println: Removed Node for node 'java11-00000f5vevv56'.
Jan 27 13:49:06 jenkins jenkins[9360]: INFO     i.j.docker.DockerTransientNode$1#println: Stopped container 'c95594a86d8f093f9548dd88a04cf2edde7380778bf209fd1132905447ceb16f' for node 'java11-00000f5vevv56'.
Jan 27 13:49:14 jenkins jenkins[9360]: INFO     i.j.docker.DockerTransientNode$1#println: Removed container 'c95594a86d8f093f9548dd88a04cf2edde7380778bf209fd1132905447ceb16f' for node 'java11-00000f5vevv56'.

VisualVM thread dump:

"Channel reader thread: java11-00000f5vevv56" - Thread t@159
   java.lang.Thread.State: RUNNABLE
        at io.jenkins.docker.client.DockerMultiplexedInputStream.readInternal(DockerMultiplexedInputStream.java:51)
        at io.jenkins.docker.client.DockerMultiplexedInputStream.read(DockerMultiplexedInputStream.java:40)
        at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:102)
        at hudson.remoting.ChunkedInputStream.read(ChunkedInputStream.java:48)
        at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:99)
        at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)

   Locked ownable synchronizers:
        - None

I added more logging to the code:

--- src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java 2023-02-21 12:05:27.519660828 +0000
+++ src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java 2023-03-07 11:04:12.222995540 +0000
@@ -39,6 +39,7 @@
     public int read(byte[] b, int off, int len) throws IOException {
         readInternal();
         int byteRead = multiplexed.read(b, off, Math.min(next, len));
+        LOGGER.info("[1]: byteRead={} next={} off={} len={} | {}", byteRead, next, off, len, name);
         next -= byteRead;
         return byteRead;
     }
@@ -49,6 +50,7 @@
             int todo = 8;
             while (todo > 0) {
                 int i = multiplexed.read(header, 8-todo, todo);
+                LOGGER.info("[2]: i={} todo={} | {}", i, todo, name);
                 if (i < 0) return; // EOF
                 todo -= i;
             }
@@ -63,6 +65,7 @@
                     int received = 0;
                     while (received < size) {
                         int i = multiplexed.read(payload, received, size - received);
+                        LOGGER.info("[3]: i={} received={} | {}", i, received, name);
                         if (i < 0) break; // EOF
                         received += i;
                     }

and got the following messages when the "100% CPU" problem occurred again:

INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8       | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read:         [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8       | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read:         [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8       | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read:         [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8       | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read:         [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8       | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read:         [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8       | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read:         [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8       | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read:         [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8       | java-multi-000qefydty62o (2cf8a1...6ed718)
INFO i.j.d.c.DockerMultiplexedInputStream#read:         [1]: byteRead=0 next=0 | java-multi-000qefydty62o (2cf8a1...6ed718)

VisualVM thread dump:

"Channel reader thread: java-multi-000qefydty62o" - Thread t@28548
   java.lang.Thread.State: RUNNABLE
        at java.base@11.0.18/java.io.FileOutputStream.writeBytes(Native Method)
        at java.base@11.0.18/java.io.FileOutputStream.write(Unknown Source)
        at java.base@11.0.18/java.io.BufferedOutputStream.write(Unknown Source)
        - locked <3b140a2a> (a java.io.BufferedOutputStream)
        at java.base@11.0.18/java.io.PrintStream.write(Unknown Source)
        - locked <2d5aa34c> (a java.io.PrintStream)
        at java.base@11.0.18/sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)
        at java.base@11.0.18/sun.nio.cs.StreamEncoder.implFlushBuffer(Unknown Source)
        at java.base@11.0.18/sun.nio.cs.StreamEncoder.implFlush(Unknown Source)
        at java.base@11.0.18/sun.nio.cs.StreamEncoder.flush(Unknown Source)
        - locked <278b9bac> (a java.io.OutputStreamWriter)
        at java.base@11.0.18/java.io.OutputStreamWriter.flush(Unknown Source)
        at java.logging@11.0.18/java.util.logging.StreamHandler.flush(Unknown Source)
        - locked <158bbe21> (a java.util.logging.ConsoleHandler)
        at java.logging@11.0.18/java.util.logging.ConsoleHandler.publish(Unknown Source)
        at java.logging@11.0.18/java.util.logging.Logger.log(Unknown Source)
        at org.slf4j.jul.JDK14LoggerAdapter.innerNormalizedLoggingCallHandler(JDK14LoggerAdapter.java:156)
        at org.slf4j.jul.JDK14LoggerAdapter.handleNormalizedLoggingCall(JDK14LoggerAdapter.java:140)
        at org.slf4j.helpers.AbstractLogger.handleArgArrayCall(AbstractLogger.java:403)
        at org.slf4j.helpers.AbstractLogger.info(AbstractLogger.java:221)
        at io.jenkins.docker.client.DockerMultiplexedInputStream.read(DockerMultiplexedInputStream.java:42)
        at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:102)
        at hudson.remoting.ChunkedInputStream.read(ChunkedInputStream.java:48)
        at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:99)
        at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)

As we can see, the problem is an endless stream of byteRead=0. Looking at the call chain in the thread dump, we can break the infinite loop by returning -1 to readUntilBreak() (https://github.com/jenkinsci/remoting/blob/master/src/main/java/hudson/remoting/ChunkedInputStream.java#L97-L103): as long as read() keeps returning 0, remaining never decreases and the loop spins, whereas -1 makes it throw "Unexpected EOF".

            if (remaining>0) {
                // more bytes to read in the current chunk
                int read = read(buf, 0, Math.min(remaining,buf.length));
                if (read==-1)
                    throw new IOException("Unexpected EOF");
                sink.write(buf,0,read);
            } else {

So I added handling for the case where read() returns 0:

--- src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java    2023-02-21 12:05:27.519660828 +0000
+++ src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java    2023-02-27 18:53:16.237851633 +0000
@@ -39,6 +39,8 @@
     public int read(byte[] b, int off, int len) throws IOException {
         readInternal();
         int byteRead = multiplexed.read(b, off, Math.min(next, len));
+        LOGGER.info("[1]: byteRead={} next={} off={} len={} | {}", byteRead, next, off, len, name);
+        if (byteRead <= 0) return -1;
         next -= byteRead;
         return byteRead;
     }
@@ -49,7 +51,8 @@
             int todo = 8;
             while (todo > 0) {
                 int i = multiplexed.read(header, 8-todo, todo);
-                if (i < 0) return; // EOF
+                LOGGER.info("[2]: i={} todo={} | {}", i, todo, name);
+                if (i <= 0) return; // EOF
                 todo -= i;
             }
             int size = ((header[4] & 0xff) << 24) + ((header[5] & 0xff) << 16) + ((header[6] & 0xff) << 8) + (header[7] & 0xff);
@@ -63,6 +66,7 @@
                     int received = 0;
                     while (received < size) {
                         int i = multiplexed.read(payload, received, size - received);
+                        LOGGER.info("[3]: i={} received={} | {}", i, received, name);
                         if (i < 0) break; // EOF
                         received += i;

And everything worked! No more "100% CPU" problems (tested for a week).
Now the log looks like this:

INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=8 todo=8                          | java-multi-0002cbq0w9xea (5379ff...345375)
INFO i.j.d.c.DockerMultiplexedInputStream#read:         [1]: byteRead=574 next=574 off=0 len=574 | java-multi-0002cbq0w9xea (5379ff...345375)
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=8 todo=8                          | java-multi-0002cbq0w9xea (5379ff...345375)
INFO i.j.docker.DockerTransientNode$1#println: Disconnected computer for node 'java-multi-0002cbq0w9xea'.
INFO i.j.docker.DockerTransientNode$1#println: Removed Node for node 'java-multi-0002cbq0w9xea'.
INFO i.j.d.c.DockerMultiplexedInputStream#readInternal: [2]: i=-1 todo=8                         | java-multi-0002cbq0w9xea (5379ff...345375)
INFO i.j.d.c.DockerMultiplexedInputStream#read:         [1]: byteRead=0 next=0 off=0 len=1177    | java-multi-0002cbq0w9xea (5379ff...345375)
INFO h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel java-multi-0002cbq0w9xea
java.io.IOException: Unexpected EOF
at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:101)
at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:61)

I am not a Java developer, but I propose considering this patch (at least it works for me):

diff orig/src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java patched/src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java
--- src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java 2023-02-21 12:05:27.519660828 +0000
+++ src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java 2023-03-07 11:41:24.105315705 +0000
@@ -39,6 +39,7 @@
     public int read(byte[] b, int off, int len) throws IOException {
         readInternal();
         int byteRead = multiplexed.read(b, off, Math.min(next, len));
+        if (byteRead <= 0) return -1;
         next -= byteRead;
         return byteRead;
     }
@@ -49,7 +50,7 @@
             int todo = 8;
             while (todo > 0) {
                 int i = multiplexed.read(header, 8-todo, todo);
-                if (i < 0) return; // EOF
+                if (i <= 0) return; // EOF
                 todo -= i;
             }
             int size = ((header[4] & 0xff) << 24) + ((header[5] & 0xff) << 16) + ((header[6] & 0xff) << 8) + (header[7] & 0xff);
@@ -63,7 +64,7 @@
                     int received = 0;
                     while (received < size) {
                         int i = multiplexed.read(payload, received, size - received);
-                        if (i < 0) break; // EOF
+                        if (i <= 0) break; // EOF
                         received += i;
                     }
                     if (LOGGER.isInfoEnabled()) {

I'm no longer maintaining this plugin but, personally, I'd be somewhat wary of this change.
There's a big difference in meaning between -1 and 0. The former means EOF. The latter means we've either got nothing to read, or that we asked to read nothing and so nothing was read.
I suspect that changing from < 0 to <= 0 would simply swap one set of trouble (which you're experiencing, but few others seem to be) for another (which you aren't, but others might).

Looking at the code, I suspect that the problem stems from readInternal not returning a value indicating whether EOF has been reached. That allows a failure condition where next is zero but int i = multiplexed.read(header, 8-todo, todo) hits EOF and returns -1, so read(...)'s call to int byteRead = multiplexed.read(b, off, Math.min(next, len)); actually passes (b, off, 0) to multiplexed.read(...). Per the InputStream.read API, a zero-length read may immediately return 0 regardless of whether the stream has reached EOF.
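
Here's a tiny, self-contained illustration of that corner of the InputStream contract (not plugin code; whether the real multiplexed stream behaves this way depends on its implementation, but the contract allows it):

    import java.io.IOException;
    import java.io.InputStream;

    public class ZeroLengthReadDemo {
        public static void main(String[] args) throws IOException {
            // A stream that is permanently at EOF. It relies on InputStream's default
            // read(byte[], int, int), which is documented to return 0 whenever len == 0,
            // before it ever attempts to read (and thus before it can notice EOF).
            InputStream atEof = new InputStream() {
                @Override
                public int read() {
                    return -1; // always EOF
                }
            };
            byte[] buf = new byte[8];
            System.out.println(atEof.read(buf, 0, 0)); // prints 0, even though the stream is at EOF
            System.out.println(atEof.read(buf, 0, 8)); // prints -1
        }
    }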

IMO a better (more robust) fix would be for readInternal() to return boolean, and to return true if it's OK to continue or false if we hit EOF, and then have the two places which call it return -1 if readInternal() returns false.

i.e.
The two calls to readInternal(); both become if (!readInternal()) return -1; // EOF reading header
The method declaration private void readInternal() throws IOException { becomes private boolean readInternal() throws IOException {
Inside the readInternal method itself, both lines saying if (i < 0) return; // EOF become if (i < 0) return false; // EOF
...and we add a return true; to the end of the readInternal method.

...and, ideally, add a unit test to verify that both of the class's read methods will return -1 if the InputStream ends during the 8 bytes of header, even if InputStream.read(..., ..., 0) returns 0 rather than -1. A sketch of such a test follows.
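
Such a test might look roughly like this (a sketch only, using JUnit 4; it assumes a DockerMultiplexedInputStream(InputStream, String) constructor, so adjust to the real signature):

    import static org.junit.Assert.assertEquals;

    import java.io.IOException;
    import java.io.InputStream;
    import org.junit.Test;

    public class DockerMultiplexedInputStreamTest {

        /** A stream that is permanently at EOF yet returns 0 for zero-length reads, as the contract permits. */
        private static InputStream emptyStream() {
            return new InputStream() {
                @Override
                public int read() {
                    return -1; // always EOF
                }

                @Override
                public int read(byte[] b, int off, int len) {
                    return len == 0 ? 0 : -1; // zero-length read yields 0 even at EOF
                }
            };
        }

        @Test
        public void arrayReadReportsEofWhenHeaderEndsEarly() throws IOException {
            DockerMultiplexedInputStream demuxed =
                    new DockerMultiplexedInputStream(emptyStream(), "test-container"); // assumed constructor
            byte[] buf = new byte[16];
            assertEquals(-1, demuxed.read(buf, 0, buf.length));
        }

        @Test
        public void singleByteReadReportsEofWhenHeaderEndsEarly() throws IOException {
            DockerMultiplexedInputStream demuxed =
                    new DockerMultiplexedInputStream(emptyStream(), "test-container"); // assumed constructor
            assertEquals(-1, demuxed.read());
        }
    }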

Yes, you are right. I recreated the patch according to your recommendations. Currently it works well; I will wait for some time to test it in real work over a longer period.

--- src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java 2023-02-21 12:05:27.519660828 +0000
+++ src/main/java/io/jenkins/docker/client/DockerMultiplexedInputStream.java 2023-03-09 13:29:41.080690191 +0000
@@ -30,26 +30,26 @@
 
     @Override
     public int read() throws IOException {
-        readInternal();
+        if (!readInternal()) return -1; // EOF reading header
         next--;
         return multiplexed.read();
     }

     @Override
     public int read(byte[] b, int off, int len) throws IOException {
-        readInternal();
+        if (!readInternal()) return -1; // EOF reading header
         int byteRead = multiplexed.read(b, off, Math.min(next, len));
-        next -= byteRead;
+        if (byteRead > 0) next -= byteRead;
         return byteRead;
     }

-    private void readInternal() throws IOException {
+    private boolean readInternal() throws IOException {
         while (next == 0) {
             byte[] header = new byte[8];
             int todo = 8;
             while (todo > 0) {
                 int i = multiplexed.read(header, 8-todo, todo);
-                if (i < 0) return; // EOF
+                if (i < 0) return false; // EOF
                 todo -= i;
             }
             int size = ((header[4] & 0xff) << 24) + ((header[5] & 0xff) << 16) + ((header[6] & 0xff) << 8) + (header[7] & 0xff);
@@ -78,6 +78,7 @@
                     throw new IOException("Unexpected application/vnd.docker.raw-stream frame type " + Arrays.toString(header));
             }
         }
+        return true;
     }

 }
basil commented

It would be great to have a pull request for this, along with an explanation of how this change has been tested.