google/crfs

stargzify: Pushing blobs fail with DIGEST_INVALID occasionally because of race

ktock opened this issue · 1 comments

ktock commented

Sometimes, stargzifying an image fails with DIGEST_INVALID.

$ ./stargzify -upgrade 127.0.0.1:5000/ubuntu:18.04
2019/10/13 14:37:24 No matching credentials were found, falling back on anonymous
2019/10/13 14:37:49 DIGEST_INVALID: provided digest did not match uploaded content; map[Digest:sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 Reason:map[]]

Especially when Writeing to the digester is slow, we can see this failure more frequently.

func (d *digester) Write(b []byte) (int, error) {
	time.Sleep(time.Duration(1) * time.Millisecond)
	n, err := d.h.Write(b)
	d.n += int64(n)
	return n, err
}

There is a race condition on the digester

The method layer.Compressed() calculates the digest(with Sum() method) immediately after the PipeWriter is unblocked.

func (l *layer) Compressed() (io.ReadCloser, error) {
	pr, pw := io.Pipe()

	// Convert input blob to stargz while computing diffid, digest, and size.
	go func() {
		w := stargz.NewWriter(pw)
		if err := w.AppendTar(l.rc); err != nil {
...
		if err := w.Close(); err != nil {
...
		l.digest = &v1.Hash{
			Algorithm: "sha256",
			Hex:       hex.EncodeToString(l.d.h.Sum(nil)),
		}
...
	}()

	return ioutil.NopCloser(io.TeeReader(pr, l.d)), nil

But even if the PipeWriter is unblocked, it isn't guaranteed that the TeeReader completes to write the data to the digester.

Accroding to the implementation of the TeeReader:

func (t *teeReader) Read(p []byte) (n int, err error) {
	n, err = t.r.Read(p)

        // ***** Region A *****

	if n > 0 {
		if n, err := t.w.Write(p[:n]); err != nil {
			return n, err
		}
	}
	return
}

Here, this TeeReader has the PipeReader as a reader and the digester as a writer.
In the Region A of the above, the write half(PipeWriter) is unblocked because reading from this PipeReader completes.
So the goroutine in the layer.Compressed() can immediately calculate the digest during the region A(before following t.w.Write() completes).
This race results in calculating an invalid(uncompleted) degest.

ktock commented

Fixed in #25. Thank you very much for merging.