qdrant/qdrant

Handle Out-Of-Disk gracefully

Opened this issue · 8 comments

Is your feature request related to a problem? Please describe.

Currently, there are situations in which the Qdrant service can crash if there is not enough disk space to perform an update operation.
This is suboptimal behavior, as it should still be possible to respond to search requests in this case.

Describe the solution you'd like

Improve the handling of situations where Qdrant runs out of disk space. Instead of crashing, it should respond with a 500 to the user and still be able to process incoming search requests.
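A minimal sketch of the desired behavior, written in Python for brevity and not taken from the Qdrant codebase: a failed write caused by a full disk (`ENOSPC`) is turned into a 500 response, while search requests never touch the write path and keep working. The names `OutOfDiskError`, `apply_update`, and `handle_request` are hypothetical and exist only for this illustration.

```python
import errno


class OutOfDiskError(Exception):
    """Hypothetical error type: an update could not be persisted because the disk is full."""


def apply_update(write_fn):
    """Run a write; translate ENOSPC into OutOfDiskError, let other errors propagate."""
    try:
        write_fn()
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            raise OutOfDiskError(str(exc)) from exc
        raise


def handle_request(kind, write_fn=None, search_fn=None):
    """Return a (status, body) pair instead of letting the process die."""
    if kind == "search":
        # Searches only read, so they are served even when the disk is full.
        return 200, search_fn()
    try:
        apply_update(write_fn)
    except OutOfDiskError as exc:
        # The update is rejected with a 500, but the service stays up.
        return 500, f"out of disk: {exc}"
    return 200, "ok"
```

The point of the sketch is the separation: only the update path can hit the out-of-disk condition, and it reports the failure to the caller rather than aborting.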

Describe alternatives you've considered

Block requests if disk usage is above some threshold. This would require configuring an arbitrary threshold and is overall less desirable.
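For comparison, the threshold alternative could look roughly like the sketch below. It is an illustration only: the 0.95 default and the function names are assumptions, which is exactly the kind of arbitrary configuration this alternative would require.

```python
import shutil


def disk_usage_ratio(path):
    """Fraction of the filesystem holding `path` that is currently used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_block_updates(used_ratio, threshold=0.95):
    """Block updates once usage reaches the (arbitrary, assumed) threshold."""
    return used_ratio >= threshold
```

A caller would check `should_block_updates(disk_usage_ratio(storage_path))` before accepting an update, where `storage_path` stands in for wherever the data lives.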

Additional context

We prepared an automated test scenario: #4105
A solution to this issue should include a PR into the test/low-disk-tests branch that makes the OOD test pass.

/bounty $250

💎 $250 bounty • Qdrant

Steps to solve:

  1. Start working: Comment /attempt #4108 with your implementation plan
  2. Submit work: Create a pull request including /claim #4108 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to qdrant/qdrant!

| Attempt | Started (GMT+0) | Solution |
| --- | --- | --- |
| 🟢 @Rutik7066 | Apr 24, 2024, 3:27:47 PM | WIP |
| 🟢 @kemkemG0 | May 3, 2024, 9:09:45 AM | #4165 |

@generall I would like to solve this issue. Could you please assign me?

/attempt #4108

@generall

I think RocksDB employs an approach similar to the one you proposed, such as this check: if (free_space < reserved_disk_buffer_)...
https://github.com/facebook/rocksdb/blob/ed01babd07ab23788f563e78c234c01d247c09b9/file/sst_file_manager_impl.cc#L272-L291

https://github.com/facebook/rocksdb/blob/ed01babd07ab23788f563e78c234c01d247c09b9/db/db_impl/db_impl_open.cc#L2241-L2248

Additionally, it appears that RocksDB allows users to set the disk buffer size as cf.options.write_buffer_size.
https://github.com/facebook/rocksdb/blob/ed01babd07ab23788f563e78c234c01d247c09b9/db/db_impl/db_impl_open.cc#L2029-L2034

https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#flushing-options

I wonder if we can use wal_capacity_mb for this, but I'm not sure if write_buffer_size is equivalent to WAL size.

Either way, since RocksDB employs a strategy to maintain a maximum disk usage threshold, I think we should adopt a similar approach. I would love to proceed with this strategy. What do you think?
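To make the idea concrete, here is a rough sketch of a reserved-buffer check in the spirit of the `free_space < reserved_disk_buffer_` comparison linked above. It is written in Python for brevity, it is not RocksDB or Qdrant code, and using `wal_capacity_mb` as the size of the reserved buffer is only my assumption from the question above. The function names are hypothetical.

```python
import shutil

MIB = 1024 * 1024


def has_room_for_write(free_bytes, reserved_buffer_bytes):
    """Allow a write only while free space is at least the reserved buffer."""
    return free_bytes >= reserved_buffer_bytes


def check_before_update(path, wal_capacity_mb):
    """Compare the free space on the filesystem holding `path` with the reserved buffer.

    Assumption: the buffer is sized from wal_capacity_mb (given in MiB).
    """
    free_bytes = shutil.disk_usage(path).free
    return has_room_for_write(free_bytes, wal_capacity_mb * MIB)
```

If `check_before_update` returns `False`, the update would be rejected with an error while searches continue to be served.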

💡 @kemkemG0 submitted a pull request that claims the bounty. You can visit your bounty board to reward.

🎉🎈 @kemkemG0 has been awarded $250! 🎈🎊