openvstorage/alba

ASD fails when disk is full


When an error occurs on a disk with ENOSPC (No space left on device), we've observed on our test environments that the ASD simply shuts down.

This is a problem, because you can then no longer read from the disk, nor delete from it to free up some space.

A possible solution: reserve a small amount of disk space that the ASD controls and where no data is stored, e.g. the disk may fill up to at most 95%.

By default, the ASD should no longer allow writes once it is Full, which is when it reaches capacity * (limit / 100.0). The default is limit = 99.
You can, however, still perform reads and deletes.
If it reaches ENOSPC, it's too late and it shuts down, as not even deletes are possible (you need a bit of disk space to update the database).
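The Full rule above can be sketched as a small admission check. This is a hypothetical model, not ALBA's code: the names `full_threshold`, `is_full` and `allowed` are made up for illustration, and treating "reaches" as `used >= threshold` is an assumption. It does not model the ENOSPC state, where the ASD stops serving everything.

```python
# Hypothetical sketch of the ASD "Full" policy; names are illustrative, not ALBA's API.
DEFAULT_LIMIT = 99  # percent, the default mentioned in this thread


def full_threshold(capacity: int, limit: int = DEFAULT_LIMIT) -> float:
    """Usage (in bytes) at which the ASD considers itself Full."""
    return capacity * (limit / 100.0)


def is_full(used: int, capacity: int, limit: int = DEFAULT_LIMIT) -> bool:
    # Assumption: "reaches" means used >= threshold.
    return used >= full_threshold(capacity, limit)


def allowed(op: str, used: int, capacity: int, limit: int = DEFAULT_LIMIT) -> bool:
    """Writes are refused once Full; reads and deletes are still served."""
    if op in ("read", "delete"):
        return True
    if op == "write":
        return not is_full(used, capacity, limit)
    raise ValueError(f"unknown operation: {op}")
```

With a capacity of 1000 bytes and the default limit, a write at 980 bytes used is accepted, while at 995 bytes only reads and deletes go through.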

@toolslive, what is the reason for not even allowing reads once it reaches ENOSPC?

ENOSPC is terminal. The ASD may no longer be able to delete a value, as that requires the transaction to be logged first. We risk RocksDB corruption and other bad things. By the way,
in this situation it's probably not a good idea to restart the service automatically.
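Why a delete still needs free space can be shown with a toy write-ahead log. This is a hypothetical model, not RocksDB's or ALBA's code: `delete`, `safe_delete`, `FatalDiskFull`, `ListLog` and `FullDiskLog` are illustrative names. The delete is only applied after its log record is appended, so when the append fails with ENOSPC the value stays in place, and the sketch escalates to a fatal error instead of continuing.

```python
import errno


class FatalDiskFull(Exception):
    """Hypothetical: raised when the log device hits ENOSPC; the process should stop."""


class ListLog:
    """In-memory stand-in for a write-ahead log with free space."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)


class FullDiskLog:
    """Stand-in for a log whose device is out of space."""

    def append(self, record):
        raise OSError(errno.ENOSPC, "No space left on device")


def delete(log, store: dict, key: str) -> None:
    # The transaction is logged first; only then is the store modified.
    log.append(("delete", key))
    store.pop(key, None)


def safe_delete(log, store: dict, key: str) -> None:
    try:
        delete(log, store, key)
    except OSError as e:
        if e.errno == errno.ENOSPC:
            # Not even a delete can proceed: treat this as terminal.
            raise FatalDiskFull(f"cannot log delete of {key!r}") from e
        raise
```

Calling `safe_delete(FullDiskLog(), {"k": b"v"}, "k")` raises `FatalDiskFull` and leaves `"k"` in the store, which is exactly why freeing space by deleting is impossible once ENOSPC is reached.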

The bug is not that it dies on ENOSPC. The bug is that we get there at all, even though we limit ourselves to using less than the given capacity.

The example in #547 shows that RocksDB keeps a large log of more than 70 MiB. In this issue, the volume was 2.5 GiB, and the reserved space is 25 MiB. Either set the capacity to a lower value
(this was a setup with 2 ASDs on 1 FS, so the capacity was given in the configuration), or set the limit to something like 95.
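The numbers above can be checked with a bit of arithmetic. This sketch assumes the whole 2.5 GiB volume is the capacity of a single ASD and uses the >70 MiB log size from #547 as the minimum headroom needed; it ignores the two-ASDs-on-one-FS detail. `headroom_mib` and `max_safe_limit` are illustrative helpers, not ALBA settings.

```python
import math


def headroom_mib(capacity_mib: float, limit: float) -> float:
    """Space left unused when an ASD stops accepting writes at `limit` percent."""
    return capacity_mib * (1 - limit / 100.0)


def max_safe_limit(capacity_mib: float, needed_mib: float) -> int:
    """Highest integer limit that still leaves at least `needed_mib` free."""
    return math.floor(100 * (1 - needed_mib / capacity_mib))


CAPACITY_MIB = 2.5 * 1024  # 2.5 GiB volume from this issue
LOG_MIB = 70               # RocksDB log size seen in #547 (a lower bound)

for limit in (99, 95):
    h = headroom_mib(CAPACITY_MIB, limit)
    print(f"limit={limit}: headroom {h:.1f} MiB, enough for log: {h >= LOG_MIB}")

print("max safe limit:", max_safe_limit(CAPACITY_MIB, LOG_MIB))
```

With limit 99 only about 25.6 MiB is held back, well below the log size, while limit 95 keeps 128 MiB free. The highest integer limit that still covers 70 MiB is 97 (76.8 MiB), so 95 leaves some margin for a log that grows beyond what #547 showed.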