ashvardanian/less_slow.cpp

Doubt about Return Value Optimization section


Hi, I am not convinced by the code section about NRVO:

std::optional<std::string> make_heavy_object_mutable() {
    std::string x(1024, 'x');
    return x;
}

std::optional<std::string> make_heavy_object_immutable() {
    std::string const x(1024, 'x'); //! `const` is the only difference
    return x;
}

static void rvo_friendly(bm::State &state) {
    for (auto _ : state) bm::DoNotOptimize(make_heavy_object_mutable());
}

static void rvo_impossible(bm::State &state) {
    for (auto _ : state) bm::DoNotOptimize(make_heavy_object_immutable());
}

It states that the const prevents NRVO, but cv-qualification doesn't actually inhibit it (https://timsong-cpp.github.io/cppwp/n4659/class.copy#elision-1.1). I think this test isn't checking RVO at all, as the return type is std::optional<std::string>, not std::string. What the const inhibits is moving the string into the optional: it forces the copy constructor instead, which has to memcpy the contents.
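To make the distinction concrete, here is a minimal sketch that counts constructor calls instead of timing them. The probe_t type and the make_mutable / make_immutable names are hypothetical and not part of the repository; the sketch assumes C++17.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical probe type (not from the repository): counts how it is constructed.
struct probe_t {
    static inline int copies = 0;
    static inline int moves = 0;
    probe_t() {}
    probe_t(probe_t const &) { ++copies; }
    probe_t(probe_t &&) noexcept { ++moves; }
};

std::optional<probe_t> make_mutable() {
    probe_t x;
    return x; // `x` is treated as an rvalue, so the optional's payload is move-constructed
}

std::optional<probe_t> make_immutable() {
    probe_t const x;
    return x; // a const object cannot bind to `probe_t &&`, so the payload is copy-constructed
}
```

Calling each function once and reading probe_t::copies and probe_t::moves should show one move for the mutable variant and one copy for the const variant. Neither is elided, because the returned object is a std::optional<probe_t> rather than a probe_t.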

The behavior likely depends on the compiler version and on flags like -fno-elide-constructors. There are probably better examples that would behave more consistently. Let's think 🤔

I still don't think that it's testing what it says it's testing.
Godbolt example: https://gcc.godbolt.org/z/8Md39nnd5

Yes, I also don't like that part. I've now replaced the "optional string" with a heavy custom object with "sleep" calls in its constructors, but I'm still looking for a better set of examples for RVO.

I think that's a valuable example of its own. It's just not copy elision but move inhibition.

An NRVO inhibitor is, e.g., return std::move(x); (it forces a move instead, so any type with an expensive move will do).
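A minimal sketch of that suggestion, again counting constructor calls rather than sleeping. The tracked_t type and both function names are hypothetical, and C++17 is assumed. Compilers may flag the second function with a pessimizing-move warning, which is exactly the point here.

```cpp
#include <cassert>
#include <utility>

// Hypothetical tracking type (not from the repository).
struct tracked_t {
    static inline int copies = 0;
    static inline int moves = 0;
    tracked_t() {}
    tracked_t(tracked_t const &) { ++copies; }
    tracked_t(tracked_t &&) noexcept { ++moves; }
};

tracked_t make_named() {
    tracked_t x;
    return x; // plain name of a local object: NRVO is permitted, though not guaranteed
}

tracked_t make_pessimized() {
    tracked_t x;
    return std::move(x); // the operand is no longer a plain name: elision is not allowed, one move
}
```

With NRVO applied, make_named performs no moves at all; with -fno-elide-constructors it falls back to one move. make_pessimized always performs exactly one move, regardless of flags.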

How about this?

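#include <chrono>  // `std::chrono::milliseconds`
#include <cstddef> // `std::size_t`
#include <cstdint> // `std::uint64_t`
#include <numeric> // `std::iota`
#include <thread>  // `std::this_thread::sleep_for`

struct heavy_t {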
    std::uint64_t data[8];

    heavy_t() noexcept { std::iota(data, data + 8, 0); }

    heavy_t(heavy_t &&) { std::this_thread::sleep_for(std::chrono::milliseconds(1)); }
    heavy_t(heavy_t const &) { std::this_thread::sleep_for(std::chrono::milliseconds(2)); }
    heavy_t &operator=(heavy_t &&) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        return *this;
    }
    heavy_t &operator=(heavy_t const &) {
        std::this_thread::sleep_for(std::chrono::milliseconds(2));
        return *this;
    }
};

heavy_t make_heavy_object() { return heavy_t {}; }

heavy_t make_named_heavy_object() {
    heavy_t x;
    return x;
}

heavy_t make_conditional_heavy_object() {
    heavy_t x;
    heavy_t &x1 = x;
    heavy_t &x2 = x;
    static std::size_t counter = 0; //! Returning through a reference prevents NRVO and the implicit move, forcing a copy
    if (counter++ % 2 == 0) { return x1; }
    else { return x2; }
}

static void rvo_trivial(bm::State &state) {
    for (auto _ : state) bm::DoNotOptimize(make_heavy_object());
}

static void rvo_expected(bm::State &state) {
    for (auto _ : state) bm::DoNotOptimize(make_named_heavy_object());
}

static void rvo_banned(bm::State &state) {
    for (auto _ : state) bm::DoNotOptimize(make_conditional_heavy_object());
}

BENCHMARK(rvo_trivial);
BENCHMARK(rvo_expected);
BENCHMARK(rvo_banned);

-------------------------------------------------------
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
rvo_easy          0.634 ns        0.634 ns    851954378
rvo_expected      0.640 ns        0.640 ns   1115473320
rvo_banned      2060564 ns         6039 ns        10000
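
The roughly 2 ms wall time of rvo_banned is consistent with the 2 ms sleep in the copy constructor, and the much lower CPU time is expected because the thread is sleeping. As a cross-check that does not depend on timing, the following sketch counts constructor calls for the same three return shapes. The counted_t type and the function names are hypothetical; C++17 is assumed.

```cpp
#include <cassert>

// Hypothetical counting type standing in for `heavy_t`.
struct counted_t {
    static inline int copies = 0;
    static inline int moves = 0;
    counted_t() {}
    counted_t(counted_t const &) { ++copies; }
    counted_t(counted_t &&) noexcept { ++moves; }
};

// Returning a prvalue: elision is guaranteed since C++17.
counted_t make_prvalue() { return counted_t {}; }

// Returning a named local: NRVO is permitted, with a move as the fallback.
counted_t make_named() {
    counted_t x;
    return x;
}

// Returning through an lvalue reference: neither NRVO nor the implicit move applies.
counted_t make_via_reference() {
    counted_t x;
    counted_t &ref = x;
    return ref; // copy constructor runs
}
```

Only make_via_reference should register a copy. The branch on the counter in make_conditional_heavy_object is incidental, since both branches return a reference to the same object.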