buildbarn/bb-deployments

docker-compose: build fails if blob size exceeds 227 MiB

snizovtsev opened this issue · 11 comments

Hello,

I've started using buildbarn with the docker-compose setup to facilitate hermetic builds in my Bazel project. After a few days of hassle-free experience I ran into this error:

Failed to store previous blob ...: Shard 0: Blob is 296016884 bytes in size, while this backend is only capable of storing blobs of up to 238608384 bytes in size

I see 2 issues here:

  1. It's not obvious how to increase this limit. I tried bumping some constants in config/storage.jsonnet, but that didn't help. According to the bb-storage source, the limit should be equal to int64(sectorSizeBytes)*blockSectorCount, but neither of these constants is present in the example configuration. 238608384 is not even divisible by 2^20.

  2. I think 227 MiB is too small for a default blob size limit. It's common in C++ projects for binaries with debug symbols to weigh in at hundreds of megabytes.

Thank you!

Increasing this setting from 8 GiB to 16 GiB helps:

  contentAddressableStorage: {
    backend: {
      'local': {
        blocksOnBlockDevice: {
          source: {
            file: {
              path: '/storage-cas/blocks',
              sizeBytes: 16 * 1024 * 1024 * 1024,
            },
          },
        },
      },
    },
  },

Hi @snizovtsev, I'm seeing the same thing. Did you figure out a solution to this?

@SinOverCos Yes, I've made the following changes to my config:

diff --git a/docker-compose/config/storage.jsonnet b/docker-compose/config/storage.jsonnet
index 3036251..6ca0175 100644
--- a/docker-compose/config/storage.jsonnet
+++ b/docker-compose/config/storage.jsonnet
@@ -25,7 +25,7 @@ local common = import 'common.libsonnet';
           source: {
             file: {
               path: '/storage-cas/blocks',
-              sizeBytes: 8 * 1024 * 1024 * 1024,
+              sizeBytes: 32 * 1024 * 1024 * 1024,
             },
           },
           spareBlocks: 3,
@@ -52,7 +52,7 @@ local common = import 'common.libsonnet';
         keyLocationMapMaximumGetAttempts: 8,
         keyLocationMapMaximumPutAttempts: 32,
         oldBlocks: 8,
-        currentBlocks: 24,
+        currentBlocks: 48,
         newBlocks: 1,
         blocksOnBlockDevice: {
           source: {

I'm keeping the issue open since the default limits look too small to me.

Looking at storage.jsonnet, one can read:

        oldBlocks: 8,
        currentBlocks: 24,
        newBlocks: 1,
        blocksOnBlockDevice: {
          source: {
            file: {
              path: '/storage-cas/blocks',
              sizeBytes: 8 * 1024 * 1024 * 1024,
            },
          },
          spareBlocks: 3,
        },

The storage is split into 8 + 24 + 1 + 3 = 36 blocks. 8 GiB / 36 = 238609294 bytes, which, rounded down to the 4 KiB sector size, gives int(238609294 / 4096) * 4096 = 238608384.
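
To make the arithmetic concrete, here is a small Jsonnet sketch that reproduces the limit from the defaults above. The 4096-byte sector size is an assumption taken from this discussion, not a value read from the example configuration:

  // Derive the per-blob ceiling from the storage.jsonnet defaults.
  local sizeBytes = 8 * 1024 * 1024 * 1024;  // blocksOnBlockDevice file size
  local totalBlocks = 8 + 24 + 1 + 3;        // old + current + new + spare blocks
  local sectorSizeBytes = 4096;              // assumed 4 KiB sectors

  // Each block gets an equal share of the device, rounded down to whole sectors.
  {
    maximumBlobSizeBytes:
      std.floor(sizeBytes / totalBlocks / sectorSizeBytes) * sectorSizeBytes,
    // evaluates to 238608384, matching the error message
  }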

Basically, increase the sizeBytes field if you would like a larger cache. It is possible to adjust the number of blocks, but I don't see any reason to do so unless you really want to dig deep into the details.
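
If you know the largest blob you need to store, you can also work backwards to a minimum device size; this is just the inverse of the calculation above. The 512 MiB target here is an arbitrary example, not a recommendation:

  // Minimum sizeBytes needed for a given per-blob ceiling,
  // keeping the default block counts.
  local desiredBlobSizeBytes = 512 * 1024 * 1024;  // example target: 512 MiB
  local totalBlocks = 8 + 24 + 1 + 3;

  { sizeBytes: desiredBlobSizeBytes * totalBlocks }  // 18 GiB for this example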

Thanks @snizovtsev @moroten

After making these changes, I am still seeing the same error:

Caused by: com.google.devtools.build.lib.remote.ExecutionStatusException: INVALID_ARGUMENT: I/O error while running command: Failed to read from 1-6dc48734830f0cef535eebde2aa4dc047a6e00f21ef70bf9ea5df8227e5b2cb1-309757619-fuse at offset 309690368: Failed to replicate blob 1-6dc48734830f0cef535eebde2aa4dc047a6e00f21ef70bf9ea5df8227e5b2cb1-309757619-fuse: 1-6dc48734830f0cef535eebde2aa4dc047a6e00f21ef70bf9ea5df8227e5b2cb1-309757619-fuse: Blob is 309757619 bytes in size, while this backend is only capable of storing blobs of up to 238608384 bytes in size

I would Ctrl-C out of ./run.sh and then run it again, but it doesn't seem like my config changes are getting picked up.

@SinOverCos please run docker-compose down and docker-compose up --force-recreate to make sure your containers do restart.

Hey @moroten, I did that but still saw the same error.

I applied your changes in #89 and that got me past the 227 MiB problem, so I guess it's actually reading one of the other config files?

Thank you for that PR!

Hi @moroten, I'm unsure if this counts under the same issue, but I'm also seeing:

java.io.IOException: com.google.devtools.build.lib.remote.ExecutionStatusException: INVALID_ARGUMENT: I/O error while running command: Failed to truncate file to length 1468467806: File size quota reached
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:235)
        at com.google.devtools.build.lib.remote.RemoteExecutionService.executeRemotely(RemoteExecutionService.java:1456)
        at com.google.devtools.build.lib.remote.RemoteSpawnRunner.lambda$exec$2(RemoteSpawnRunner.java:269)
        at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:244)
        at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:125)
        at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:114)
        at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:244)
        at com.google.devtools.build.lib.exec.SpawnRunner.execAsync(SpawnRunner.java:301)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:152)
        at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:112)
        at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:47)
        at com.google.devtools.build.lib.exec.SpawnStrategyResolver.beginExecution(SpawnStrategyResolver.java:64)
        at com.google.devtools.build.lib.analysis.actions.SpawnAction.beginExecution(SpawnAction.java:352)
        at com.google.devtools.build.lib.actions.Action.execute(Action.java:133)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$5.execute(SkyframeActionExecutor.java:957)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1124)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1082)
        at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:160)
        at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:93)
        at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:516)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:827)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:323)
        at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:161)
        at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:571)
        at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.base/java.lang.Thread.run(Unknown Source)
Caused by: com.google.devtools.build.lib.remote.ExecutionStatusException: INVALID_ARGUMENT: I/O error while running command: Failed to truncate file to length 1468467806: File size quota reached
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.handleStatus(GrpcRemoteExecutor.java:71)
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.getOperationResponse(GrpcRemoteExecutor.java:83)
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$2(GrpcRemoteExecutor.java:194)
        at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:244)
        at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:125)
        at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:114)
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$3(GrpcRemoteExecutor.java:140)
        at com.google.devtools.build.lib.remote.util.Utils.refreshIfUnauthenticated(Utils.java:523)
        at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:138)
        ... 27 more

Action details (uncached result): http://localhost:7984/fuse/blobs/sha256/historical_execute_response/876adfbb6eaf329a759d29aa331ac9824c025320c89d9fb9a5219dcebdf5fdfa-1111/
ftruncate(): Input/output error

Is there a setting I can tweak to get past this? I've bumped my sizeBytes to 32 GiB and set oldBlocks, currentBlocks, newBlocks, and spareBlocks in all the files from #89 to 1, so I don't think this is a size limitation from there.

It is the worker configuration with maximumFilePoolSizeBytes: 1 * 1024 * 1024 * 1024. Your action is using more than 1 GiB of disk space. Is 4 or 8 GiB enough for your example build? Don't forget to also update where it says concurrency * maximumFilePoolSizeBytes.
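
For reference, a minimal sketch of the change, assuming the field sits in the worker configuration exactly as named above; the surrounding structure is elided:

  {
    // Allow each action up to 8 GiB of scratch disk space instead of 1 GiB.
    maximumFilePoolSizeBytes: 8 * 1024 * 1024 * 1024,
  }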

@SinOverCos please reopen this ticket or create another one if you think the default worker file pool size should be increased.

Thank you @moroten! I was able to get a successful build of an actual production target with these changes.