indeedeng/lsmtree

Questions

ccleve opened this issue · 5 comments

Have some questions, not sure where else to ask. We're looking at this project for use in one of our systems.

  1. Is the lsmtree project still maintained and used? It hasn't been updated in a year.
  2. Why did you choose to develop it instead of using an existing LSM implementation like RocksDB? What does lsmtree do better?
  3. Why use native mmap code instead of the built-in java.nio.MappedByteBuffer?

Thanks for open-sourcing this project, by the way. I've poked around a bit and the code seems to be clean and well-designed.

  1. It is pretty heavily used at Indeed. We use this LSM tree implementation in a lot of our critical backend systems. I gave a talk on how this LSM tree works and how we use it to serve job data in our search results; you can watch it here: http://engineering.indeedblog.com/blog/2013/03/indeedeng-from-1-to-1-billion-video/ There are a few changes in our internal repository that haven't been merged back here, but they are pretty minor. This project has been stable for years.

  2. I originally wrote this LSM tree implementation in 2011, before RocksDB, LevelDB, or any of the other open-source LSM tree implementations existed. The main advantage of this implementation at this point is that it interfaces nicely with Java, so you can use Java types as keys and values instead of having to use byte[] for everything. The interface is very similar to the Java Map interface. You can also provide your own Java Comparator for your keys, which you can't really do with any of the C or C++ LSM tree implementations; they all make you sort in lexicographic order on the bytes. The performance should be on par with LevelDB.

  3. There are three problems I have with the built-in java.nio.MappedByteBuffer:

  • ByteBuffers are limited to 2^31 bytes.
  • There is no method to put or get a sequence of bytes at an absolute position; you have to seek the ByteBuffer and then call get or put, which is not thread safe.
  • You can't unmap a java.nio.MappedByteBuffer, so files can stay around long after they've been deleted, until the GC finally decides to clean them up. If enough MappedByteBuffers are being created, you can hit the vm.max_map_count limit before the GC has gotten around to them, and there isn't much you can do about it.

  These things can be worked around, but in the interest of code clarity I found it better to just write my own memory-mapped file abstraction.
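To make the second bullet concrete, here is a minimal sketch of the usual workaround with plain NIO (not code from this project, and assuming a file under 2 GB at a hypothetical path passed as args[0]): since there is no bulk get at an absolute position, each read takes its own duplicate() view so the shared buffer's position is never touched.

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class AbsoluteRead {
    /** Thread-safe absolute bulk read: duplicate() shares the mapping but not the position. */
    static void readAt(MappedByteBuffer shared, int position, byte[] dst) {
        ByteBuffer view = shared.duplicate();  // cheap: no data is copied
        view.position(position);               // seek the private view, not the shared buffer
        view.get(dst);
    }

    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r");
             FileChannel ch = raf.getChannel()) {
            MappedByteBuffer shared = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] dst = new byte[16];
            readAt(shared, 0, dst);            // read the first 16 bytes
        }
    }
}
```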

Thank you. I watched the video, it's quite good.

I'll try to do an implementation in our system and benchmark it against RocksDB.

The one downside to using native libraries like util-mmap or the Snappy
library is that they're not really cross-platform. Our system wouldn't be
able to run on Windows or various kinds of IBM hardware. That's not fatal,
but pure Java really is better.

There is a pure-Java implementation of Snappy out there.

As for the ByteBuffer problems,

  1. Would it be possible to write a wrapper around a bunch of 2 GB MappedByteBuffers?
  2. When I ran into this problem, the answer I got was that you should create a slice() any time you wanted to do thread-safe random access, one slice per thread. It works, and it's really cheap to create a slice. (A sketch of this is at the end of this comment.)
  3. Unmapping is a problem. So far as I can tell, this is the current thinking on the problem:

http://bugs.java.com/view_bug.do?bug_id=4724038

In my opinion, this security issue is hardly "insurmountable". We're
talking about two different threads in the same application. They can
overwrite each other's files now.

But for the purposes of the lsmtree, is it ever really necessary to unmap?
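Here is a small sketch of the slice-per-thread idea from point 2 (a hypothetical helper, not part of lsmtree or util-mmap): each thread gets its own view of the shared mapping via a ThreadLocal, so seeks never race. duplicate() is used here; slice() behaves the same as long as the shared buffer's position is still at 0.

```java
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;

/** One duplicated view of a shared mapping per thread, so seeks never race. */
public final class PerThreadReader {
    private final ThreadLocal<ByteBuffer> localView;

    public PerThreadReader(MappedByteBuffer shared) {
        // duplicate() shares the underlying mapped memory (no copy) but gives each
        // thread its own independent position and limit.
        this.localView = ThreadLocal.withInitial(shared::duplicate);
    }

    public void read(int position, byte[] dst) {
        ByteBuffer view = localView.get();
        view.position(position);
        view.get(dst);
    }
}
```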


There's a solution here to the unmap problem:
http://stackoverflow.com/questions/2972986/how-to-unmap-a-file-from-memory-mapped-using-filechannel-in-java
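That answer boils down to reaching into JDK internals and running the mapped buffer's Cleaner. A minimal sketch of that hack (it works on Java 8 and earlier; on Java 9+ these internals are encapsulated and sun.misc.Unsafe.invokeCleaner is the usual route). The buffer must never be touched after this, or the JVM can crash.

```java
import java.lang.reflect.Method;
import java.nio.MappedByteBuffer;

/** Force-unmaps a MappedByteBuffer via JDK-internal APIs (HotSpot, Java 8 and earlier). */
public final class Unmapper {
    public static void unmap(MappedByteBuffer buffer) {
        try {
            // sun.nio.ch.DirectBuffer#cleaner() -> sun.misc.Cleaner#clean()
            Method cleanerMethod = buffer.getClass().getMethod("cleaner");
            cleanerMethod.setAccessible(true);
            Object cleaner = cleanerMethod.invoke(buffer);
            Method cleanMethod = cleaner.getClass().getMethod("clean");
            cleanMethod.setAccessible(true);
            cleanMethod.invoke(cleaner);
        } catch (Exception e) {
            throw new RuntimeException("unmap failed", e);
        }
    }
}
```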


It is necessary for the lsmtree to unmap files. As you add data, it creates and deletes files, and the deleted ones need to be unmapped.

If you want to use Java's mmap, you'd just have to implement the Memory interface using a bunch of MappedByteBuffers and then add a system property in MMapBuffer to make it choose between that and DirectMemory. I'd recommend making them 1 GB, because the maximum length of a ByteBuffer is actually 2^31-1 bytes (one byte short of 2 GB), and the math to figure out which buffer you should be accessing is going to be easier, cheaper, and clearer if they're aligned to a power of 2.
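A rough sketch of that layout (my own illustration, not the actual Memory/MMapBuffer code in util-mmap): with 1 GB chunks aligned to a power of two, finding the right buffer is a shift and the offset within it is a mask. A real implementation would also have to handle reads that straddle a chunk boundary and unmap the chunks on close.

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

/** A >2 GB mapping built from 1 GB MappedByteBuffer chunks, addressed by shift and mask. */
public final class ChunkedMMap {
    private static final int CHUNK_SHIFT = 30;                 // 1 GB chunks
    private static final long CHUNK_SIZE = 1L << CHUNK_SHIFT;
    private static final long CHUNK_MASK = CHUNK_SIZE - 1;

    private final MappedByteBuffer[] chunks;

    public ChunkedMMap(FileChannel channel) throws Exception {
        long size = channel.size();
        int n = (int) ((size + CHUNK_SIZE - 1) >>> CHUNK_SHIFT);
        chunks = new MappedByteBuffer[n];
        for (int i = 0; i < n; i++) {
            long start = (long) i << CHUNK_SHIFT;
            long len = Math.min(CHUNK_SIZE, size - start);
            chunks[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, len);
        }
    }

    /** Absolute read; assumes the range does not straddle a chunk boundary. */
    public void get(long address, byte[] dst) {
        int chunk = (int) (address >>> CHUNK_SHIFT);      // which buffer
        int offset = (int) (address & CHUNK_MASK);        // position within it
        ByteBuffer view = chunks[chunk].duplicate();      // per-call view for thread safety
        view.position(offset);
        view.get(dst);
    }

    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(args[0], "r")) {
            ChunkedMMap map = new ChunkedMMap(raf.getChannel());
            byte[] header = new byte[16];
            map.get(0, header);  // read the first 16 bytes
        }
    }
}
```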

You could use a segment tree to look up the correct memory-mapped block for an absolute address. I don't think it would be too difficult to make that thread safe.
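If the blocks are fixed, power-of-2 sized as suggested above, no tree is needed; the shift and mask in the previous sketch already do the lookup. For variable-sized blocks, a concurrent sorted map keyed by block start offset gives the same floor lookup as a segment tree with less code. A hypothetical sketch, not code from this project:

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Looks up which block contains an absolute address, for variable-sized blocks. */
public final class BlockIndex {
    // block start offset -> block; the skip list makes concurrent lookups safe.
    private final ConcurrentSkipListMap<Long, ByteBuffer> blocksByStart = new ConcurrentSkipListMap<>();

    public void addBlock(long startOffset, ByteBuffer block) {
        blocksByStart.put(startOffset, block);
    }

    /** Returns the (start offset, block) entry whose block contains the address. */
    public Map.Entry<Long, ByteBuffer> blockFor(long address) {
        Map.Entry<Long, ByteBuffer> e = blocksByStart.floorEntry(address);
        if (e == null) {
            throw new IllegalArgumentException("address before first block: " + address);
        }
        return e;  // offset within the block is address - e.getKey()
    }
}
```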