armanbilge/epollcat

Prevent `SIGPIPE` signals

armanbilge opened this issue · 5 comments

Ha, can't say I wasn't warned! This is the bug that @lolgab stumbled on 😆

@LeeTibbert would you happen to know how to write a test for this? I've tried the obvious (wrong?) thing of writing to a socket whose peer is closed but that doesn't seem to trigger it.

  test("writing to closed socket".only) {
    IOServerSocketChannel.open.use { server =>
      IOSocketChannel.open.use { clientCh =>
        for {
          _ <- server.bind(new InetSocketAddress("localhost", 0))
          gate <- IO.deferred[Unit]
          _ <- (server.accept.use_ *> gate.complete(())).background.surround {
            for {
              addr <- server.localAddress
              _ <- clientCh.connect(addr)
              _ <- gate.get
              _ <- clientCh.write(ByteBuffer.wrap("Hello!".getBytes))
              _ <- IO.println("here")
            } yield ()
          }
        } yield ()
      }
    }
  }
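
My best guess as to why a single write is not enough (unverified, so take it as speculation): the peer's close only sends a FIN, so the first write still succeeds into the local send buffer and is answered with an RST; only a subsequent write should actually fail with EPIPE / raise SIGPIPE. A standalone sketch of that, using plain JDK NIO rather than the test wrappers above:

  // Speculative sketch (plain JDK NIO, not the epollcat test wrappers): the
  // second write, made after the RST has had time to come back, is the one
  // expected to break the pipe. Timing-dependent, so it may need tuning.
  import java.net.InetSocketAddress
  import java.nio.ByteBuffer
  import java.nio.channels.{ServerSocketChannel, SocketChannel}

  object BrokenPipeSketch {
    def main(args: Array[String]): Unit = {
      val server = ServerSocketChannel.open().bind(new InetSocketAddress("localhost", 0))
      val client = SocketChannel.open()
      client.connect(server.getLocalAddress)
      server.accept().close()                                 // peer accepts and immediately closes

      client.write(ByteBuffer.wrap("Hello!".getBytes))        // lands in the send buffer, elicits an RST
      Thread.sleep(100)                                       // give the RST time to arrive
      client.write(ByteBuffer.wrap("Hello again!".getBytes))  // this one should hit EPIPE / SIGPIPE
    }
  }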

Ahahaha this was enjoyable reading.

https://blog.erratasec.com/2018/10/tcpip-sockets-and-sigpipe.html

In this day and age of "continuous integration", programmers are interested not only in solving this in their code, but solving this in their unit/regression test suites. In the modern perspective, until you can create a test that exercises this bug, it's not truly fixed.

I'm not sure how to write code that adequately does this. It's not straightforward generating RSTs from the Sockets API, especially at the exact point you need them. There's also timing issues, where you may need to do something a million times repeatedly just to get the timing right.

re:

would you happen to know how to write a test for this?
I've tried the obvious (wrong?) thing of writing to a socket whose peer is closed but that doesn't seem to trigger it.

Interesting & informative article at the URL you posted. Good to have that for future reference.
As they mentioned, network testing is hard. I think you did the correct thing, but not enough of it.

I suspect the key is the paragraph below:

For example, I have a sample program that calls send() as fast as it can until it hits the limit on how much this side can buffer, and then closes the socket, causing a reset to be sent. For my simple "echo" server trying to echo back everything it receives, this will cause a SIGPIPE condition.

For future reference:
There are socket options to reduce the size of the send buffer on a socket. That, and a tight
loop of a thousand or so writes, might provoke the bug.
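
If someone does pick this up later, a rough, untested sketch of that idea (plain JDK NIO rather than the epollcat wrappers) might look like:

  // Untested sketch: shrink SO_SNDBUF, let the peer close without reading,
  // then write in a tight loop. The expectation is that one of the later
  // writes runs into the RST and fails with EPIPE (or, without SIGPIPE
  // suppression, delivers SIGPIPE and kills the process).
  import java.net.{InetSocketAddress, StandardSocketOptions}
  import java.nio.ByteBuffer
  import java.nio.channels.{ServerSocketChannel, SocketChannel}

  object SendBufferSketch {
    def main(args: Array[String]): Unit = {
      val server = ServerSocketChannel.open().bind(new InetSocketAddress("localhost", 0))
      val client = SocketChannel.open()
      client.setOption(StandardSocketOptions.SO_SNDBUF, Integer.valueOf(1024)) // as small as the OS allows
      client.connect(server.getLocalAddress)
      server.accept().close()                  // peer closes without reading anything

      val chunk = ByteBuffer.wrap(Array.fill(1024)('x'.toByte))
      var i = 0
      while (i < 1000) {                       // "a tight loop of a thousand or so writes"
        chunk.rewind()
        client.write(chunk)                    // eventually one of these should hit the reset
        i += 1
      }
    }
  }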

At a first approximation, I believe it is not worth investing the time at this stage of the project
to create such a test, especially when Lorenzo's code in the wild can already provoke the condition.

Preaching to the choir.

To my thinking, code testing overall is a Bayesian "building confidence" exercise:
how can each additional piece of evidence increase confidence that the code is correct
(and, in theory, how much is the increase)? Testing is a prime case of Parkinson's law:
it will always exceed the time allotted to it. And there is always an extra test which
could be done. Resource limitations require a project to find the "sweet spot".

  • Supportive code review by at least one other developer. In a rapid-prototyping
    environment, this might be trailing, i.e. after the fact of a merge. The reviewer
    might say "Beats me, but the formatting is good ;-)".

  • Having a good, reproducible CI test is the goal. The question should be asked before
    each merge.

  • There are situations where automated CI testing cannot be done in a reasonable
    amount of time. I will talk about networking, because that is my area of experience.
    Some situations need the environment set up in a specific way. Some need to run
    for hours before the condition shows (think memtest for PCs). Some need
    manual use of specialist tools (using netstat to ensure that IPv6 servers are
    listening on the IPv6 wildcard and not just IPv4).

    I think in these cases the devo should write a short piece describing
    manual testing beyond CI (if any). Then the lower-priority concern
    becomes how to make these tests reproducible when the next
    generation of devos arrives and the original devo is not available.

    I think this is a "bird in the hand, not perfection" area.

  • Testing cannot "test-in" quality. It can increase confidence
    that upstream processes, such as design, have worked well.

  • The best information comes from real use: a series of
    releases to wider groups, alpha, beta, supported.
    I like the principle I learned from User Interface design:
    every new class of user discovers a new class of bugs.

There are socket options to reduce the size of the send buffer on a socket. That, and a tight
loop of a thousand or so writes, might provoke the bug.

Yes, I suspect this too. But I decided not to go on a wild goose chase 😁

Maybe in the future we can have some form of integration tests that can help exercise these sorts of issues.