/ucset

If only std::set was a DBMS: collection of templated ACID in-memory exception-free thread-safe and concurrent containers in a header-only library

Primary LanguageC++

Unexceptionally Consistent Set

Imagine In-Memory Templated Containers
Being as Consistent as Databases


Discord     LinkedIn     Twitter     Blog     GitHub


UCSet library provides std::set-like class templates for C++, where every operation is noexcept, and no update can leave the container in a partial state.

There are 3 containers to choose from:

All of them:

  • are noexcept top to bottom!
  • are templated, to be used with any noexcept-movable and default-constructible types.
  • can be wrapped into locked_gt, to make them thread-safe.
  • can be wrapped into partitioned_gt, to make them concurrent.

If you want your exceptions and classical interfaces back, you can also wrap any container into crazy_gt.

Installation

The entire library is header-only and requires C++17. You can copy-paste it, but it is not 2022 anymore. We suggest using CMake:

include(FetchContent)
FetchContent_Declare(
    ucset
    GIT_REPOSITORY https://github.com/unum-cloud/ucset
    GIT_TAG main
    CONFIGURE_COMMAND "" # Nothing to configure, its that simple :)
    BUILD_COMMAND "" # No build needed, UCSet is header-only
)
FetchContent_MakeAvailable(ucset)
include_directories(${consistent_set_SOURCE_DIR})

Why we created this?

Hate for std::bad_alloc. If you consider "Out of Memory" an exception, you are always underutilizing your system. It happened way too many times that a program crashed when I was only getting to an exciting place. Especially with:

  • Neo4J and every other JVM-based project.
  • With big batch sizes beyond VRAM sizes of GPU when doing ML.

At Unum, we live in conditions where machines can easily have 1 TB of RAM per CPU socket, but it is still at least 100x less than the datasets we are trying to swallow.

UKV Landscape

So when we started working on UKV to build high-speed hardware-friendly databases, we needed something better than Standard Templates Library, with features uncommon to other libraries as well:

  • Accessing the allocator state by reference.
  • Reserving memory for tree nodes before inserting.
  • Explicitly traversing trees for random sampling.
  • Speed!

Now UCSet powers the in-memory backend of UKV.

Performance Tuning

Concurrent containers in the library are blocking. Their performance greatly depends on the "mutexes" you are using. So we allow different implementations: