moby/datakit

CI goes into a select loop under load

avsm opened this issue · 2 comments

avsm commented

With a large number of jobs (around 2500 in the ci.ocaml.io case), the CI service goes into a loop with:

ci_1           | - Select: select, , Invalid argument
ci_1           | - Select: select, , Invalid argument
ci_1           | - Select: select, , Invalid argument
ci_1           | - Select: select, , Invalid argument
ci_1           | - Select: select, , Invalid argument
ci_1           | - Select: select, , Invalid argument
ci_1           | - Select: select, , Invalid argument
ci_1           | - Select: select, , Invalid argument
ci_1           | - Select: select, , Invalid argument

spammed on the console. Restarting the service lets the jobs continue until it happens again. The backend is libev, and this Dockerfile is used:

FROM ocaml/opam:alpine_ocaml-4.04.0
RUN cd /tmp && curl -OL https://test.docker.com/builds/Linux/x86_64/docker-1.13.0-rc2.tgz && tar -zxvf docker-1.13.0-rc2.tgz docker/docker && sudo mv docker/docker /usr/bin && rm -f docker-1.13.0-rc2.tgz
RUN opam remote add dev git://github.com/mirage/mirage-dev
RUN opam pin add -n datakit git://github.com/docker/datakit
RUN opam pin add -n datakit-github git://github.com/docker/datakit
RUN opam pin add -n datakit-client git://github.com/docker/datakit
RUN opam pin add -n datakit-server git://github.com/docker/datakit
RUN opam pin add -n datakit-ci git://github.com/docker/datakit
RUN opam depext -uivy -j 4 datakit-ci conf-libev
ADD . /home/opam/src
RUN sudo chown -R opam /home/opam/src
RUN opam pin add -n mirage-ci /home/opam/src
RUN opam install -vy -j 4 mirage-ci
ENV CONDUIT_TLS=native
ENV OCAMLRUNPARAM=b
RUN opam config exec -- ocaml /home/opam/src/check-libev.ml
USER root
ENTRYPOINT ["/home/opam/.opam/4.04.0/bin/opamCI"]
CMD []

Still trying to get a backtrace for this, or pin down what actually emits this particular line.

avsm commented

Ah, this might be a package build, not the CI.

Trying this as the possible culprit:

docker run -it ocaml/opam:alpine_ocaml-4.02.3 opam depext -iyv -j 2 biocaml

Closing, because the loop is from the package being tested, not from the CI. I've added #394 to track the issue of this being confusing.