golang/go

os: race between os.(*File).Close() and os.(*File).Read()

tarm opened this issue · 7 comments

tarm commented

Here is a simple reproduction case operating on a regular file (in this case "race.go"). This is with Go 1.4.2.

package main

import (
    "log"
    "os"
)

func main() {
    f, err := os.Open("race.go")
    if err != nil {
        log.Fatal(err)
    }
    ch := make(chan struct{})
    go func() {
        f.Close()
        close(ch)
    }()
    b := make([]byte, 512)
    n, err := f.Read(b)
    log.Printf("Read %v, %s, %v", n, b[:n], err)
    <-ch
}

In this case, doing so would obviously be silly, because there is also a larger race over how much data the Read() call actually returns. The real case where this came up was with a serial port device (https://github.com/tarm/goserial), where the Read() and Close() may happen in different goroutines because the Read() can take a long time (serial ports can be configured to block forever until the next character arrives). A syscall.Close() will allow the blocked Read() to return, which is why it is useful to be able to run os.(*File).Close() and Read() in different goroutines.
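For concreteness, here is a sketch of that serial pattern. The device path and shutdown timing are hypothetical, and it assumes (as described above) that closing the descriptor makes the blocked read return:

package main

import (
    "log"
    "os"
    "time"
)

func main() {
    // Hypothetical serial device; any fd whose reads can block
    // indefinitely has the same shape.
    port, err := os.Open("/dev/ttyUSB0")
    if err != nil {
        log.Fatal(err)
    }

    done := make(chan struct{})
    go func() {
        defer close(done)
        buf := make([]byte, 128)
        for {
            // May block forever waiting for the next character.
            n, err := port.Read(buf)
            if err != nil {
                log.Printf("reader exiting: %v", err)
                return
            }
            log.Printf("read %d bytes", n)
        }
    }()

    // Close from another goroutine is the only way to make the
    // blocked Read return and shut the reader down.
    time.Sleep(5 * time.Second)
    port.Close()
    <-done
}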

The problematic line in file_unix.go:close() is

        file.fd = -1 // so it can't be closed again

Conceptually, does it make sense that Close() and Read() should be allowed to run at the same time? The cost of a mutex in Close() is not important, but putting a mutex in Read() might have a performance impact. Would a fix be appropriate for the std library? Should I try to work around this in the goserial library by not using os.File?

tarm@linux ~ $ go run -race race.go 
2015/02/25 10:55:52 Read 287, package main

import (
    "log"
    "os"
)

func main() {
    f, err := os.Open("race.go")
    if err != nil {
        log.Fatal(err)
    }
    ch := make(chan struct{})
    go func() {
        f.Close()
        close(ch)
    }()
    b := make([]byte, 512)
    n, err := f.Read(b)
    log.Printf("Read %v, %s, %v", n, b[:n], err)
    <-ch
}, <nil>
==================
WARNING: DATA RACE
Write by goroutine 5:
  os.(*file).close()
      /home/tarm/gosrc/src/os/file_unix.go:109 +0x1d7
  os.(*File).Close()
      /home/tarm/gosrc/src/os/file_unix.go:98 +0x91
  main.func·001()
      /home/tarm/race.go:15 +0x53

Previous read by main goroutine:
  os.(*File).read()
      /home/tarm/gosrc/src/os/file_unix.go:191 +0x5f
  os.(*File).Read()
      /home/tarm/gosrc/src/os/file.go:95 +0xc3
  main.main()
      /home/tarm/race.go:19 +0x32a

Goroutine 5 (running) created at:
  main.main()
      /home/tarm/race.go:17 +0x29a
==================
Found 1 data race(s)
exit status 66

davecheney commented

As I understand it, POSIX does not guarantee the result of calling read(2) in one thread and close(2) in another. This is effectively what is happening here.

tarm commented

Changing Close() so that it does not write to file.fd, while still closing only once per the comment, is pretty straightforward (see the diff below). However, setting file.fd to -1 has the positive side effect that file operations after Close() will not use a stale fd at the syscall level, so we should keep the -1 assignment unless we want to take file.closeMutex on every operation, which defeats the purpose.

diff --git a/src/os/file_unix.go b/src/os/file_unix.go
index ff4fc7d..1f31860 100644
--- a/src/os/file_unix.go
+++ b/src/os/file_unix.go
@@ -8,6 +8,7 @@ package os

 import (
        "runtime"
+       "sync"
        "sync/atomic"
        "syscall"
 )
@@ -26,6 +27,9 @@ type file struct {
        name    string
        dirinfo *dirInfo // nil unless directory being read
        nepipe  int32    // number of consecutive EPIPE in Write
+
+       closeMutex sync.Mutex
+       closed     bool
 }

 // Fd returns the integer Unix file descriptor referencing the open file.
@@ -99,14 +103,22 @@ func (f *File) Close() error {
 }

 func (file *file) close() error {
-       if file == nil || file.fd < 0 {
+       if file == nil {
+               return syscall.EINVAL
+       }
+
+       file.closeMutex.Lock()
+       alreadyClosed := file.closed
+       file.closed = true
+       file.closeMutex.Unlock()
+
+       if alreadyClosed {
                return syscall.EINVAL
        }
        var err error
        if e := syscall.Close(file.fd); e != nil {
                err = &PathError{"close", file.name, e}
        }
-       file.fd = -1 // so it can't be closed again

        // no need for a finalizer anymore
        runtime.SetFinalizer(file, nil)

Alvaro Lopez Ortega commented

Imagine the following three events happening one after the other:

1. f1.Close()
2. f2 = os.Open()  // Same file descriptor could well be reused
3. f1.Read()

In that case, the call to f1.Read() would actually be reading from an invalid file descriptor. Setting the internal .fd property to -1 prevents that situation.
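A sketch of how that sequence can bite in practice. The file names here are hypothetical, and whether the kernel actually hands back the same descriptor number is timing-dependent:

package main

import (
    "fmt"
    "os"
)

func main() {
    // Hypothetical files; the point is the descriptor numbers.
    f1, err := os.Open("a.txt")
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println("f1 fd:", f1.Fd()) // e.g. 3

    f1.Close() // descriptor released; the kernel may reuse it

    f2, err := os.Open("b.txt") // quite likely gets the same number
    if err != nil {
        fmt.Println(err)
        return
    }
    defer f2.Close()
    fmt.Println("f2 fd:", f2.Fd()) // e.g. 3 again

    // Because Close set f1's internal fd to -1, this Read fails
    // with an error instead of silently reading b.txt through the
    // reused descriptor.
    buf := make([]byte, 16)
    _, err = f1.Read(buf)
    fmt.Println("f1.Read after Close:", err)
}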

davecheney commented

I am not disputing that there is a race. What I am suggesting is that fixing this race gives the illusion that concurrent read/close works, and POSIX doesn't guarantee this.

tarm commented

@davecheney FYI my first reply was not meant as a reply to your first comment. Looking into it more I agree with you.

From the linux man page:

It is probably unwise to close file descriptors while they may be in use by system calls in other threads in the same process. Since a file descriptor may be reused, there are some obscure race conditions that may cause unintended side effects.

I will fix this at a higher level in the serial library. Closing.
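One possible shape for that higher-level fix, sketched as a hypothetical wrapper rather than the actual goserial code: hold the raw descriptor directly (the "not using os.File" option floated in the opening comment), keep the fd immutable so Read and Close share no written field, and guard a closed flag with a mutex. The kernel-level fd-reuse hazard from the man page still applies; this only removes the Go-level data race:

package serial

import (
    "errors"
    "sync"
    "syscall"
)

var ErrClosed = errors.New("serial: port closed")

// Port holds the raw descriptor. fd is never rewritten after Open,
// so a blocked syscall.Read and a concurrent Close touch no shared
// mutable field; only the closed flag is guarded by mu.
type Port struct {
    mu     sync.Mutex
    closed bool
    fd     int // immutable after Open
}

func Open(name string) (*Port, error) {
    fd, err := syscall.Open(name, syscall.O_RDWR, 0)
    if err != nil {
        return nil, err
    }
    return &Port{fd: fd}, nil
}

func (p *Port) Read(b []byte) (int, error) {
    p.mu.Lock()
    if p.closed {
        p.mu.Unlock()
        return 0, ErrClosed
    }
    p.mu.Unlock()

    // The blocking read happens outside the lock, so Close is never
    // stuck behind a Read that may block forever.
    return syscall.Read(p.fd, b)
}

func (p *Port) Close() error {
    p.mu.Lock()
    defer p.mu.Unlock()
    if p.closed {
        return ErrClosed
    }
    p.closed = true
    // Closing the descriptor is what makes a blocked Read return.
    return syscall.Close(p.fd)
}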

dvyukov commented

The problem here is not with file.fd, but with the actual file. Whenever you call Close concurrently with Read/Write, you risk corrupting an arbitrary file or leaking sensitive data to an arbitrary socket.

davecheney commented

@tarm on a closing note, this issue is related to #6222. If all fd I/O in Go went through a poller, not just the network stuff, then we could have real async close.
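Network descriptors already behave that way because they go through the runtime poller: a Close in one goroutine reliably unblocks a Read in another, and net.Conn documents concurrent method calls as supported. A minimal demonstration against a local listener:

package main

import (
    "log"
    "net"
    "time"
)

func main() {
    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        log.Fatal(err)
    }
    defer ln.Close()

    conn, err := net.Dial("tcp", ln.Addr().String())
    if err != nil {
        log.Fatal(err)
    }

    go func() {
        time.Sleep(100 * time.Millisecond)
        conn.Close() // the poller wakes the blocked Read below
    }()

    // Nothing is ever sent, so Read blocks until Close, then
    // returns a "use of closed network connection" error.
    buf := make([]byte, 64)
    _, err = conn.Read(buf)
    log.Printf("Read returned: %v", err)
}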