docker-compose: build fails if blob size exceeds 227 MiB
snizovtsev opened this issue · 11 comments
Hello,
I've started using buildbarn with the docker-compose setup to facilitate hermetic builds in my Bazel project. After a few days of hassle-free experience I encountered the following error:
Failed to store previous blob ...: Shard 0: Blob is 296016884 bytes in size, while this backend is only capable of storing blobs of up to 238608384 bytes in size
I see two issues here:

- It's not obvious how to increase this limit. I tried to bump some constants in config/storage.jsonnet, but that didn't help. According to the bb-storage source, the limit should be equal to int64(sectorSizeBytes) * blockSectorCount, but neither of these constants appears in the example configuration. 238608384 is not even divisible by 2^20.
- I think 227 MiB is too small as a default blob size limit. It is common for C++ projects that binaries with debug symbols weigh hundreds of megabytes.
Thank you!
Increasing this setting from 8 GiB to 16 GiB helps:
contentAddressableStorage: {
  backend: {
    'local': {
      blocksOnBlockDevice: {
        source: {
          file: {
            path: '/storage-cas/blocks',
            sizeBytes: 16 * 1024 * 1024 * 1024,
          },
        },
      },
    },
  },
},
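(A rough sanity check, assuming the default 8 + 24 + 1 + 3 = 36 block layout explained further down in this thread: 16 GiB / 36 ≈ 477218588 bytes per block, comfortably above the 296016884-byte blob from the error above.)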
Hi @snizovtsev, I'm seeing the same thing. Did you figure out a solution to this?
@SinOverCos Yes, I've made the following changes to my config:
diff --git a/docker-compose/config/storage.jsonnet b/docker-compose/config/storage.jsonnet
index 3036251..6ca0175 100644
--- a/docker-compose/config/storage.jsonnet
+++ b/docker-compose/config/storage.jsonnet
@@ -25,7 +25,7 @@ local common = import 'common.libsonnet';
source: {
file: {
path: '/storage-cas/blocks',
- sizeBytes: 8 * 1024 * 1024 * 1024,
+ sizeBytes: 32 * 1024 * 1024 * 1024,
},
},
spareBlocks: 3,
@@ -52,7 +52,7 @@ local common = import 'common.libsonnet';
keyLocationMapMaximumGetAttempts: 8,
keyLocationMapMaximumPutAttempts: 32,
oldBlocks: 8,
- currentBlocks: 24,
+ currentBlocks: 48,
newBlocks: 1,
blocksOnBlockDevice: {
source: {
I'm keeping the issue open since the default limits look too small to me.
Looking at storage.jsonnet, one can read:
oldBlocks: 8,
currentBlocks: 24,
newBlocks: 1,
blocksOnBlockDevice: {
  source: {
    file: {
      path: '/storage-cas/blocks',
      sizeBytes: 8 * 1024 * 1024 * 1024,
    },
  },
  spareBlocks: 3,
}
The storage is split into 8 + 24 + 1 + 3 = 36 blocks. 8 GiB / 36 = 238609294 bytes, which rounded down to the 4 KiB sector size gives int(238609294 / 4096) * 4096 = 238608384.
Basically, increase the sizeBytes field if you would like a larger cache. It is possible to adjust the number of blocks, but I don't see any reason to do so unless you really want to dig deep into the details.
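To make the relationship explicit, here is a minimal Jsonnet sketch (not part of the shipped configuration; the 512 MiB target and the block counts are assumptions for illustration) that derives sizeBytes from the largest blob you expect to store:

// Sketch only: derive sizeBytes from the largest blob you expect to store.
// The block counts match the example configuration; adjust them if you
// changed oldBlocks/currentBlocks/newBlocks/spareBlocks.
local maximumBlobSizeBytes = 512 * 1024 * 1024;  // assumed target: 512 MiB blobs
local totalBlocks = 8 + 24 + 1 + 3;              // oldBlocks + currentBlocks + newBlocks + spareBlocks

{
  blocksOnBlockDevice: {
    source: {
      file: {
        path: '/storage-cas/blocks',
        // Every block must be able to hold the largest blob, so the block
        // device needs at least totalBlocks * maximumBlobSizeBytes.
        sizeBytes: totalBlocks * maximumBlobSizeBytes,
      },
    },
    spareBlocks: 3,
  },
}

With these assumed numbers the device comes out at 36 * 512 MiB = 18 GiB, and the per-blob limit lands just under the 512 MiB target after the rounding described above.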
Thanks @snizovtsev @moroten
After making these changes, I am still seeing the same error:
Caused by: com.google.devtools.build.lib.remote.ExecutionStatusException: INVALID_ARGUMENT: I/O error while running command: Failed to read from 1-6dc48734830f0cef535eebde2aa4dc047a6e00f21ef70bf9ea5df8227e5b2cb1-309757619-fuse at offset 309690368: Failed to replicate blob 1-6dc48734830f0cef535eebde2aa4dc047a6e00f21ef70bf9ea5df8227e5b2cb1-309757619-fuse: 1-6dc48734830f0cef535eebde2aa4dc047a6e00f21ef70bf9ea5df8227e5b2cb1-309757619-fuse: Blob is 309757619 bytes in size, while this backend is only capable of storing blobs of up to 238608384 bytes in size
I would Ctrl-C out of ./run.sh and then run it again, but it doesn't seem like my config changes are getting picked up.
@SinOverCos please run docker-compose down and docker-compose up --force-recreate to make sure your containers do restart.
Hi @moroten, I'm not sure if this counts as the same issue, but I'm also seeing:
java.io.IOException: com.google.devtools.build.lib.remote.ExecutionStatusException: INVALID_ARGUMENT: I/O error while running command: Failed to truncate file to length 1468467806: File size quota reached
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:235)
at com.google.devtools.build.lib.remote.RemoteExecutionService.executeRemotely(RemoteExecutionService.java:1456)
at com.google.devtools.build.lib.remote.RemoteSpawnRunner.lambda$exec$2(RemoteSpawnRunner.java:269)
at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:244)
at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:125)
at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:114)
at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:244)
at com.google.devtools.build.lib.exec.SpawnRunner.execAsync(SpawnRunner.java:301)
at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:152)
at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:112)
at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:47)
at com.google.devtools.build.lib.exec.SpawnStrategyResolver.beginExecution(SpawnStrategyResolver.java:64)
at com.google.devtools.build.lib.analysis.actions.SpawnAction.beginExecution(SpawnAction.java:352)
at com.google.devtools.build.lib.actions.Action.execute(Action.java:133)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$5.execute(SkyframeActionExecutor.java:957)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1124)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1082)
at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:160)
at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:93)
at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:516)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:827)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:323)
at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:161)
at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:571)
at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: com.google.devtools.build.lib.remote.ExecutionStatusException: INVALID_ARGUMENT: I/O error while running command: Failed to truncate file to length 1468467806: File size quota reached
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.handleStatus(GrpcRemoteExecutor.java:71)
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.getOperationResponse(GrpcRemoteExecutor.java:83)
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$2(GrpcRemoteExecutor.java:194)
at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:244)
at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:125)
at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:114)
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$3(GrpcRemoteExecutor.java:140)
at com.google.devtools.build.lib.remote.util.Utils.refreshIfUnauthenticated(Utils.java:523)
at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:138)
... 27 more
Action details (uncached result): http://localhost:7984/fuse/blobs/sha256/historical_execute_response/876adfbb6eaf329a759d29aa331ac9824c025320c89d9fb9a5219dcebdf5fdfa-1111/
ftruncate(): Input/output error
Is there a setting I can tweak to get past this? I've bumped my sizeBytes to 32 GB and set oldBlocks, currentBlocks, newBlocks, and spareBlocks in all the files from #89 to 1, so I don't think this is a size limitation from there.
It is the worker configuration with maximumFilePoolSizeBytes: 1 * 1024 * 1024 * 1024. Your action is using more than 1 GiB of disk space. Is 4 or 8 GiB enough for your example build? Don't forget to also update where it says concurrency * maximumFilePoolSizeBytes.
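As a sketch of the idea only (the exact layout of worker.jsonnet varies between releases, and totalFilePoolSizeBytes below is a hypothetical placeholder name, not a real field): defining the quota once keeps the two places mentioned above consistent.

// Sketch only, not the real worker.jsonnet structure. The point is that the
// per-action quota and the space reserved for all concurrent actions agree.
local concurrency = 8;                                    // assumed number of concurrent actions
local maximumFilePoolSizeBytes = 8 * 1024 * 1024 * 1024;  // assumed 8 GiB quota per action

{
  // Per-action quota, the setting referenced in the error message above.
  maximumFilePoolSizeBytes: maximumFilePoolSizeBytes,
  // Hypothetical placeholder for the spot documented as
  // "concurrency * maximumFilePoolSizeBytes" in the example configuration.
  totalFilePoolSizeBytes: concurrency * maximumFilePoolSizeBytes,
}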
@SinOverCos please reopen this ticket or create another one if you think the default worker file pool size should be increased.
Thank you @moroten! I was able to get a successful build of an actual production target with these changes.