awslabs/soci-snapshotter

[FEATURE] Support concurrency limits when setting up layers

Kern-- opened this issue · 0 comments

Description

When pulling an image with Docker, Nerdctl, or Kubernetes, a per-image concurrency limiter caps the number of in-flight layer pulls.

When SOCI handles a layer, it unconditionally spawns one goroutine per adjacent layer to pre-resolve it:

soci-snapshotter/fs/fs.go

Lines 427 to 447 in 21ec544

for _, desc := range neighboringLayers(preResolve.Manifest, preResolve.Target) {
	desc := desc
	go func() {
		// Avoids to get canceled by client.
		ctx := log.WithLogger(context.Background(), log.G(ctx).WithField("mountpoint", mountpoint))
		sociDesc, ok := c.imageLayerToSociDesc[desc.Digest.String()]
		if !ok {
			log.G(ctx).WithError(snapshot.ErrNoZtoc).WithField("layerDigest", desc.Digest.String()).Debug("skipping layer pre-resolve")
			return
		}
		l, err := fs.resolver.Resolve(ctx, preResolve.Hosts, preResolve.Name, desc, sociDesc, c.fuseOperationCounter, fs.disableVerification)
		if err != nil {
			log.G(ctx).WithError(err).Debug("failed to pre-resolve")
			return
		}
		// Release this layer because this isn't target and we don't use it anymore here.
		// However, this will remain on the resolver cache until eviction.
		l.Done()
	}()
}

For images with a large number of layers, or when many images are pulled concurrently, the SOCI snapshotter places no limit on the number of layers it will try to resolve at once. This could put pressure on the network, but is more likely to put pressure on the bbolt metadata database, and could cause pull times to grow without bound.

Describe the solution you'd like

There should be an option to limit the number of in-flight resolutions. It is an open question whether this should be a global limit or a per-image limit.

Describe any alternative solutions/features you've considered

No response

Any additional context or information about the feature request

An additional question is whether there is a race here: if we are slow to set up an adjacent layer, can we end up resolving it again when handling the next layer?