bh1xuw/rust-rocks

Issue with iterator and lifetimes

winstonewert opened this issue · 6 comments

Consider the implementation of next() for Iterator

fn next(&mut self) -> Option<Self::Item> {
        if self.is_valid() {
            let k = self.key();
            let v = self.value();
            self.next();
            Some((k, v))
        } else {
            None
        }
    }

The self.next() line invalidates the returned references from self.key() and self.value(). Thus this iterator is always returning bad references.

But why didn't Rust catch this?

pub fn key(&self) -> &'a [u8] {

key (and value) claims to return a reference with lifetime 'a. But this isn't really accurate, as it should only be valid until the next &mut call on self (probably on .next()). Rust doesn't catch the problem because of the unsafe code used to implement the function (obviously unavoidable.)

Haven't figured out how to express the lifetime restrictions of key & value in Rust.

Is ReadOptions::pin_data solves?

  // Keep the blocks loaded by the iterator pinned in memory as long as the
  // iterator is not deleted, If used when reading from tables created with
  // BlockBasedTableOptions::use_delta_encoding = false,
  // Iterator's property "rocksdb.iterator.is-key-pinned" is guaranteed to
  // return 1.
  // Default: false
  bool pin_data;

It seems to me that key() and value() ought to be simply like:

pub fn key(&self) -> &[u8] {

The references remain valid until the iterator is modified by calling .next()

pin_data looks like it would solve the problem for keys, but at the cost of keeping all the keys in memory which wouldn't work in my case since I'm iterating over a large table and really don't want to keep all those keys in memory. And it wouldn't solve for value() in any case.

I think the problem here is that iterator doesn't allow you to return references that only valid until .next() is called.

The next() should be

fn next<'k, 'v>(&'a mut self) -> Option<(&'k [u8], &'v [u8])>

When pin_data enabled, 'k is the same as 'a.
Or else 'k and 'v live just before next next() call.

Haven't figure out how to express the lifetime restriction in Rust. 🤣

Workaround:

In for _ in it {} blocks: nothing changed.

While collecting for later use:

// `it` here, for later use, you must hold the original `mut it` somewhere.
it.map(|(key, val)| (key.to_vec(), val.to_vec())).collect::<Vec<_>>();

In my case, that would load the whole multi-gb datastore into memory, not a good plan.

A better workaround is to use a while loop with is_valid()

I will add a link to this issue in README.
Many thanks.