abseil/abseil-cpp

[Bug]: hash_maps and hash_sets invoke undefined behavior on allocator exceptions.

Opened this issue · 0 comments

Describe the issue

When templating absl::flat_hash_map and absl::flat_hash_set on an allocator that throws exceptions, then the following typical workflow appears:

  • An insert/emplace operation triggers a resize.
  • the map increases the internal capacity() member and then calls initialize_slots()
  • The allocator throws, the exception flows through to the caller, but the actual slot_array() still has the old value (with the old capacity as the size), but the capacity() member is not reset to the old value.
  • Undefined behavior is inevitable, e.g. if after the exception the map/set is destroyed, then the destructor will call the destructor of out-of-bounds objects that are past the end of the slot_array(). In the best case this goes unnoticed, because the involved types are all trivially destructible and no actual code is invoked, in the worst case the program segfaults with hard to track errors, because complex objects are assumed at memory locations where no such objects have ever been created.

Steps to reproduce the problem

The following godbolt link interactively demonstrates the problem: click me

The full code is also listed in the follwoing:

#include <absl/container/flat_hash_map.h>
#include <absl/container/node_hash_map.h>
#include <unordered_map>
#include <iostream>

// Type aliases for hash maps with given key, value, and allocator types.
template <class K, class V, template<typename> typename Alloc,
          class HashFct = absl::container_internal::hash_default_hash<K>,
          class EqualElem = absl::container_internal::hash_default_eq<K>>

// With `flat_hash_map`, the assertion in the `S` destructor below fails
using HashMap = absl::flat_hash_map<K, V, HashFct, EqualElem, Alloc<std::pair<const K, V>>>;

// With `node_hash_map`, the destructor of the hash map itself segfaults.
//using HashMap = absl::node_hash_map<K, V, HashFct, EqualElem, Alloc<std::pair<const K, V>>>;

// With `std::unordered_map`, everything works as expected.
//using HashMap = std::unordered_map<K, V, HashFct, EqualElem, Alloc<std::pair<const K, V>>>;

int numAllocs = 0;
// A simple allocator, that increments `numAllocs` on each allocation and throws an exception once `numAllocs`
// has reached the value 2.
template <typename T>
struct Alloc {
    using Base = std::allocator<T>;
     using value_type = T;
  Alloc() = default;
  template <typename U> requires (!std::same_as<U, T>)
  Alloc(const Alloc<U>&) {}
    Base allocator;
    T* allocate( std::size_t n ) {
        if (++numAllocs >= 2) {
            throw 12;
        }
        std::cerr << "allocating, num Allocs is " << numAllocs << std::endl;

        return allocator.allocate(n);
    }
    void deallocate(T* p, size_t n) {
        allocator.deallocate(p, n);
    }
};

// A simple struct that visualizes its construction and destruction. Let's us visualize that the destructor is
// called for objects that never have been created.
struct S {
    static inline int i = 0;
    int v = ++i;
    bool initialized = false;
    S() {
        std::cerr << "initialize object " << v << '\n';
       initialized = true;
    }
    ~S() {
         std::cerr << "Destroying object " << v << "\n";
         // This check might accidentally pass (we are reading garbage values, anything is possible), 
         // but it fails deterministically on godbolt. A better way to track this is ASAN, which also would
         // complain here as soon as we touch one of the members of the `S` object (but this is not possible on Godbolt,
         // as you need to built absl with ASAN support as well).
        assert(initialized);}
};


int main() {
    try {
        HashMap<int, S, Alloc> m;
        int i = 0;
        while (true) {
            m[i++] = S{};
        }
    } catch (int err) {
        return err;
    }
}

What version of Abseil are you using?

Trunk via godbolt.org (December 7, 2023)
But shouldn't be relevant, this is a longstanding issue.

What operating system and version are you using?

I locally reproduced this on Ubuntu 20.04 LTS

What compiler and version are you using?

GCC 11.3, but the problem also happens on any other compiler I tried.

What build system are you using?

CMAKE, version doesn't matter.

Additional context

I would be willing to provide a fix, which IMHO should fully fix this problem:
Wrap the two calls in raw_hash_set.h that currently look like this

auto oldCapacity = common().capacity_;
common().capacity_ = <computeNewCapacity>;
initialize_slots();

in the following way:

auto oldCapacity = common().capacity_;
common().capacity_ = <computeNewCapacity>;
ABSL_INTERNAL_TRY {initialize_slots();}
ABSL_INTERNAL_CATCH_ANY { 
  common().capacity_ = oldCapacity;
  ABSL_INTERNAL_RETHROW;
}

I would just need some guidance on how best to test this (I read that there's some longstanding effort to test the exception safety of abseil, but I don't know how actively maintained that still is.

Best regards
Johannes