ericniebler/range-v3

Working with iterators that store values and return them leads to dangling references in some ranges

benjamind2330 opened this issue · 3 comments

In this example i have made a simple iterator that represents a larger problem. Namely that the iterator is using the container to modify and output a member variable.

struct container {

    struct basic_iterator
    {
        using iterator_category = std::forward_iterator_tag;
        using difference_type   = std::ptrdiff_t;
        using value_type        = std::pair<std::string, int>;
        using pointer           = value_type*;  // or also value_type*
        using reference         = value_type&;  // or also value_type&

        basic_iterator() = default;

        basic_iterator(const std::vector<int>& data, size_t index)
            : m_data(&data)
            , m_index(index)
        { 

            if (m_index < m_data->size()) {
                v = std::pair{std::to_string(m_index), m_data->at(m_index)};
            } else {
                v = std::nullopt;
            }

        }

        const value_type& operator*() const { return *v; }
        const value_type* operator->() const { return v.operator->(); }

        basic_iterator& operator++()
        {
            ++m_index;
            if (m_index < m_data->size()) {
                v = std::pair{std::to_string(m_index), m_data->at(m_index)};
            } else {
                v = std::nullopt;
            }

            return *this;
        }

        basic_iterator operator++(int)
        {
            iterator tmp = *this;
            ++(*this);
            return tmp;
        }

        friend bool operator==(const basic_iterator& a, const basic_iterator& b) { return a.m_index == b.m_index; };
        friend bool operator!=(const basic_iterator& a, const basic_iterator& b) { return a.m_index != b.m_index; };

    private:
        const std::vector<int>*         m_data = nullptr;
        std::size_t               m_index = 0;
        std::optional<value_type> v;
    };
    
    using iterator = basic_iterator;
    iterator       begin() const { return {m_data, 0}; }
    iterator       end() const { return {m_data, m_data.size()}; };
    
    std::vector<int> m_data;
};

int main() {

    container c;
    c.m_data = std::vector<int>{1, 2, 3, 4, 5};  

   // THIS WORKS
    for (auto i : c |   std::ranges::views::keys) {
        std::cout << i << " ";
    }
    
   // THIS ACCESSES INVALID MEMORY
    for (auto i : c | ranges::views::keys) {
        std::cout << i << " ";
    }
}

Here is a live example
https://godbolt.org/z/8dEh7a8cK

The issue appears to be that this function:

        template<typename... Its>
        auto CPP_auto_fun(operator())(Its... its)
        (
            return invoke(fn_, *its...)
        )

Takes arguments by value leading to a copy of the iterator then a dangling reference to its internals.

This does not happen in std::ranges::views.

I think this might be a bug, should it be:

       template<typename... Its>
        auto CPP_auto_fun(operator())(Its&&... its)
        (
            return invoke(fn_, *std::forward<Its>(its)...)
        )

?

range-v3 doesn't support so-called "stashing" iterators. neither do the c++20 ranges concepts. if it works in your stdlib that's cool, but don't rely on it.

@ericniebler This is genuinly excellent to know. We have been relying on it, but I will make efforts to move away from this. I'm interested, do you happen to know if this is specified somewhere? Our iterator meets the requirements for a forward iterator I thought, so I am surprised that it is not supported.

Stashing iterators are perfectly good input iterators, but they can't meet the multipass requirements for forward-and-stronger iterator types.

The "old" iterator requirements specify that *i and *j for forward iterators i and j must refer to the same object if and only if i == j is true ([forward.iterators]/6). This obviously can't hold for a stashing iterator since *i and *auto{i} refer to different objects.

For the C++20 forward_iterator concept we complicated things by relaxing the requirement that decltype(*i) is always a reference type, so *i doesn't always refer to an object. We have the more general requirement that "Pointers and references obtained from a forward iterator into a range [i, s) shall remain valid while [i, s) continues to denote a range."([iterator.concept.forward]/3](https://eel.is/c++draft/iterator.concept.forward#3)). That means that things like:

if (std::distance(i, j) >= 2) {
  auto&& ref = *std::next(i);
  do_something(ref);
}

have to work, which won't if ref is bound to a subobject of the iterator returned from std::next that is destroyed at the semicolon before the do_something call.