com-lihaoyi/os-lib

Timeout doesn't work with pathological child process--possible deadlock?

acruise opened this issue · 3 comments

Hi there, I'm using os-lib_2.13:0.3.0 in an automated student assignment evaluation system... We need to compile their C code and run the binaries to check the output, and sometimes they're... bad.

I have a particular student program that hangs early in its execution--I can fix the program, but that's not the point--students' programs will be doing all kinds of damage, and I need os.proc to kill them if they hang. :)

The last two lines of the strace of this student program are:

fstat(1, {st_dev=makedev(0, 0x18), st_ino=3, st_mode=S_IFCHR|0600, st_nlink=1, st_uid=1000, st_gid=5, st_blksize=1024, st_blocks=0, st_rdev=makedev(0x88, 0), st_atime=1571542425 /* 2019-10-19T20:33:45.542742043-0700 */, st_atime_nsec=542742043, st_mtime=1571542425 /* 2019-10-19T20:33:45.542742043-0700 */, st_mtime_nsec=542742043, st_ctime=1571530834 /* 2019-10-19T17:20:34.582741621-0700 */, st_ctime_nsec=582741621}) = 0
futex(0xa01490, FUTEX_WAIT_PRIVATE, 2, NULL

When I run this program as a subprocess in os.proc, there are three threads that are locked forever, here's the top of their stack traces:

main thread:

"main" #1 prio=5 os_prio=0 tid=0x00007ff7dc04e800 nid=0x678c in Object.wait() [0x00007ff7e2876000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	- waiting on <0x000000071f35bc08> (a java.lang.UNIXProcess)
	at java.lang.Object.wait(Object.java:502)
	at java.lang.UNIXProcess.waitFor(UNIXProcess.java:395)
	- locked <0x000000071f35bc08> (a java.lang.UNIXProcess)
	at os.SubProcess.waitFor(SubProcess.scala:56)
	at os.proc.run$1(ProcessOps.scala:182)
	at os.proc.stream(ProcessOps.scala:186)
	at os.proc.call(ProcessOps.scala:70)

one of the I/O threads:

"Thread-12" #24 prio=5 os_prio=0 tid=0x00007ff7dd417000 nid=0x67c8 runnable [0x00007ff7bdffb000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:255)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	- locked <0x000000071f35df28> (a java.lang.UNIXProcess$ProcessPipeInputStream)
	at os.SubProcess$OutputStream.read(SubProcess.scala:177)
	- locked <0x000000071f362260> (a os.SubProcess$OutputStream)
	at os.SubProcess$OutputStream.read(SubProcess.scala:165)
	- locked <0x000000071f362260> (a os.SubProcess$OutputStream)
	at os.Internals$.transfer0(Internals.scala:17)
	at os.proc$$anon$2.run(ProcessOps.scala:132)
	at java.lang.Thread.run(Thread.java:748)

the other I/O thread:

"Thread-11" #23 prio=5 os_prio=0 tid=0x00007ff7dd415000 nid=0x67c7 runnable [0x00007ff7bdefa000]
   java.lang.Thread.State: RUNNABLE
	at java.io.FileInputStream.readBytes(Native Method)
	at java.io.FileInputStream.read(FileInputStream.java:255)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
	- locked <0x000000071f35be68> (a java.lang.UNIXProcess$ProcessPipeInputStream)
	at os.SubProcess$OutputStream.read(SubProcess.scala:177)
	- locked <0x000000071f3600e0> (a os.SubProcess$OutputStream)
	at os.SubProcess$OutputStream.read(SubProcess.scala:165)
	- locked <0x000000071f3600e0> (a os.SubProcess$OutputStream)
	at os.Internals$.transfer0(Internals.scala:17)
	at os.proc$$anon$1.run(ProcessOps.scala:122)
	at java.lang.Thread.run(Thread.java:748)

this is the bit of my code that's running the subprocess:

  private def run_!(command: List[String], cwd: Option[File]): (Int, ByteString, ByteString) = {
    val proc = os.proc(command)

    val result = proc.call(
      cwd = cwd.map(os.Path(_)).orNull,
      timeout=5000L,
      check=false
    )

    (result.exitCode, ByteString(result.out.bytes), ByteString(result.err.bytes))
  }

Much as it pains me to have to say this: commons-exec works fine here; it kills even the most stubborn subprocesses after the timeout elapses.

  private def run_!(command: List[String], cwd: Option[File]): (Int, ByteString, ByteString) = {
    val out = new ByteArrayOutputStream()
    val err = new ByteArrayOutputStream()

    val exec = new DefaultExecutor()
    cwd.foreach(exec.setWorkingDirectory)
    exec.setWatchdog(new ExecuteWatchdog(5000L))
    exec.setStreamHandler(new PumpStreamHandler(out, err))
    exec.setExitValues(null)

    val cmd = new CommandLine(command.head)
    cmd.addArguments(command.tail.toArray)
    val exit = exec.execute(cmd)
    val stdout = out.toByteArray
    val stderr = err.toByteArray

    (exit, ByteString(stdout), ByteString(stderr))
  }

Wouldn't be surprised if there was some misbehavior in there. As a workaround, you could probably use os.proc.spawn, Thread.sleep, and .destroy

should be fixed in master I think