TTL feature
Closed this issue · 18 comments
Hi,
We use RocksDB in production with the TTL layer for auto-expiring time-series data,
but we now need more features, like Mongo has (clustering, indexes, replicas, etc.).
We are currently evaluating different DB solutions.
After studying the mongo-rocks code, I found that the TTL feature is implemented in the base layer (Mongo) and does not use native RocksDB TTL.
My questions are:
- Is it possible to map native RocksDB TTL to Mongo?
- Does a high-level TTL implementation (in Mongo) impact performance more than a native one (in RocksDB), or is it not a concern?
Aleksander
Hi @malexzx, thanks for your questions.
- Not right now, unfortunately. There are two issues that make this problem hard. The first is that it'd be hard to keep the index and the collection consistent: when we phase out an old key in the collection, we also want to somehow remove it from the index. We could just keep the key in the index and ignore it when scanning, but this breaks the storage engine contract. The second issue is that MongoDB currently handles TTL above the storage engine layer and doesn't expose TTL information to the storage engine. I hope one day we get a new storage engine API that will let us optimize the TTL process.
- The high-level TTL feature in Mongo theoretically does more work than a native RocksDB implementation. This is because Mongo runs a separate thread that keeps scanning the database and deletes any old documents it encounters. RocksDB filters out old keys using a compaction filter, which adds almost no overhead -- during a compaction (which would happen anyway) it invokes the callback and removes the keys from the output set (see the sketch below). However, this is all theoretical and it might be that the overhead is barely noticeable in your use case. I would be very interested in seeing the results of this.
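For illustration, here is a minimal sketch of what a TTL compaction filter can look like at the RocksDB level, assuming a made-up value layout where the first 8 bytes of each value hold an absolute expiration time in unix seconds (mongo-rocks does not actually store values this way):

```cpp
#include <cstdint>
#include <cstring>
#include <ctime>
#include <string>

#include <rocksdb/compaction_filter.h>
#include <rocksdb/options.h>

// Assumed value layout for this sketch: first 8 bytes = expiration time
// (unix seconds, host byte order), rest = payload.
class TtlCompactionFilter : public rocksdb::CompactionFilter {
 public:
  bool Filter(int /*level*/, const rocksdb::Slice& /*key*/,
              const rocksdb::Slice& value, std::string* /*new_value*/,
              bool* /*value_changed*/) const override {
    if (value.size() < sizeof(uint64_t)) {
      return false;  // not a TTL-tagged value: keep it
    }
    uint64_t expires_at;
    std::memcpy(&expires_at, value.data(), sizeof(expires_at));
    // Returning true drops the key from the compaction output, so expired
    // entries disappear as a side effect of compactions that run anyway.
    return expires_at <= static_cast<uint64_t>(std::time(nullptr));
  }

  const char* Name() const override { return "TtlCompactionFilter"; }
};

// Installing it (the filter object must outlive the DB):
//   static TtlCompactionFilter ttl_filter;
//   rocksdb::Options options;
//   options.compaction_filter = &ttl_filter;
```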
What do you think of this solution: if you want to keep the last N days of time-series data, create N MongoDB collections. As you create a new collection for the new day, drop the oldest collection. This will have the same overhead as RocksDB TTL, since we use a compaction filter for dropping old collections.
Hi @igorcanadi, great thanks for your answers!
In our implementation, time-series data expires at day, week, month, and year intervals.
We use different column families in RocksDB for that. The time-series data in the database ranges from about 250 GB to 4 TB.
Do you think the existing Mongo mechanism will be able to cope with such a volume?
Does Mongo sharding simplify this task?
I will try to check your solution; it's close to our old implementation (in the old days we used LevelDB databases for that, and the machinery serving that task was not as lightweight as it should have been and had many issues). I see that for this I should enumerate the collections by name in time-based order, select just the necessary collection, and then iterate within it.
But if the index contract doesn't matter (for that collection), is it possible (and how would you do it) to implement special storage for selected collections (for example, in a different column family) so that it auto-expires items automagically without Mongo TTL?
Do you think the existing Mongo mechanism will be able to cope with such a volume?
I honestly don't have experience with MongoDB's TTL. However, if you have some CPU and read IOPS to spare on the machine, I think it should be possible.
Does Mongo sharding simplify this task?
Also no experience with Mongo sharding unfortunately. How do you think it'd make it easier?
But if the index contract doesn't matter (for that collection), is it possible (and how would you do it) to implement special storage for selected collections (for example, in a different column family) so that it auto-expires items automagically without Mongo TTL?
Yes, if you don't care about index contract it would definitely be possible to implement it on the RocksDB side. It would be easy to do it hackily: just change our compaction filter a bit to understand the TTL setting. I would encourage you to try implementing it if you have time and resources.
I tried to implement decoding of values from RocksDB (in the compaction filter), but now I realize this approach is unclear to me.
My question is how to distinguish between an index entry and a record, get the underlying BSON, read the TTL field (my custom JSON field, for example), and then return true from the filter. Is there a simple way to do that?
Yeah, the compaction filter will be quite complex -- you need to refer back to the prefix -> ident mapping to understand whether a key is an index key or a record store key. Once you know which one it is, it should be fairly simple -- the value in a record store is pure BSON, so you can just give that binary string directly to the BSON constructor, I think.
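Along those lines, here is a rough sketch of the record-store branch of such a filter. It assumes a hypothetical `IsRecordStoreKey()` helper that consults the prefix -> ident mapping (not shown), and assumes the document carries a numeric `_rttl` field holding an absolute expiration time in seconds; the actual logic in the linked commits may differ.

```cpp
#include <ctime>

#include "mongo/bson/bsonelement.h"
#include "mongo/bson/bsonobj.h"

#include <rocksdb/slice.h>

// Hypothetical helper: true if the key's prefix maps to a record store ident
// rather than an index ident (the prefix -> ident lookup itself is not shown).
bool IsRecordStoreKey(const rocksdb::Slice& key);

// Returns true if a record-store entry should be dropped during compaction.
// Index entries are never dropped here, which is exactly why they can end up
// inconsistent with the collection.
bool ShouldExpire(const rocksdb::Slice& key, const rocksdb::Slice& value) {
    if (!IsRecordStoreKey(key)) {
        return false;  // leave index entries alone
    }
    // The record-store value is the raw BSON document, so it can be handed
    // directly to the BSONObj constructor.
    mongo::BSONObj doc(value.data());
    mongo::BSONElement rttl = doc["_rttl"];
    if (!rttl.isNumber()) {
        return false;  // no TTL field: keep the document
    }
    return rttl.numberLong() <= static_cast<long long>(std::time(nullptr));
}
```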
I tried investigating it, and finally got it working on the v3.2 and master branches. See:
v3.2: malexzx@bb823e7
master: malexzx@f31ccab
One question about collection size: if I run the JS command
print(db.collection.count()) - it always returns the same value (even after compaction, when the documents are gone).
print(db.collection.find({a: {$gt: 0} }).count()) - returns the correct number of items.
For example, the document format is {a: Number, _rttl: Number}.
Moreover, after all items have expired and new ones have been added, count() returns the sum of the previous (expired and gone) items and the new ones.
I think this will need some metadata correction?
db.collection.validate() has no effect.
2016-09-06T14:50:09.099+0700 I COMMAND [conn17] CMD: validate test.my
2016-09-06T14:50:09.106+0700 I INDEX [conn17] validating index test.my.$id
2016-09-06T14:50:09.109+0700 W STORAGE [conn17] collection-0--6564268557375541933: Existing record and data size counters (2000 records 96000 bytes) are inconsistent with full validation results (699 records 33552 bytes). Updating counters with new values.
2016-09-06T14:50:31.702+0700 I COMMAND [conn17] CMD: validate test.my
2016-09-06T14:50:31.709+0700 I INDEX [conn17] validating index test.my.$id
2016-09-06T14:50:31.715+0700 W STORAGE [conn17] collection-0--6564268557375541933: Existing record and data size counters (2000 records 96000 bytes) are inconsistent with full validation results (699 records 33552 bytes). Updating counters with new values.
I think that to resolve the inconsistency issue, I must correct numRecords and the other counters in those objects.
Is it safe to do this in the destructor of the CompactionFilter (under the corresponding mutex), or does this work need an additional background thread?
db.collection.validate() has no effect.
Yeah, that's a known bug. We do update the counts on the validate() pass, but mongo treats it as a read transaction and does not commit the writes :(
Is it safe to do this in the destructor of the CompactionFilter (under the corresponding mutex), or does this work need an additional background thread?
Yes, I think it should be safe, although it will likely complicate the solution quite a bit. :/ You'll need to pass the RecordStore objects into the compaction filter.
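Just to illustrate the shape this could take, here is a sketch under the assumption that the compaction filter is given a pointer to something owning the collection counters; the CollectionCounters interface and adjustCounters method are hypothetical, not existing mongo-rocks APIs.

```cpp
#include <cstdint>

// Hypothetical stand-in for whatever owns numRecords/dataSize for the
// collection (in mongo-rocks that would be the RocksRecordStore handed to the
// filter when it is constructed).
class CollectionCounters {
 public:
    virtual ~CollectionCounters() = default;
    virtual void adjustCounters(int64_t numRecordsDelta, int64_t dataSizeDelta) = 0;
};

class ExpiringCompactionFilter /* : public rocksdb::CompactionFilter */ {
 public:
    explicit ExpiringCompactionFilter(CollectionCounters* counters)
        : _counters(counters) {}

    ~ExpiringCompactionFilter() {
        // The filter is destroyed when its compaction job finishes, so the
        // accumulated deltas can be applied once here (under whatever mutex
        // the counters require inside adjustCounters()).
        if (_droppedRecords > 0) {
            _counters->adjustCounters(-_droppedRecords, -_droppedBytes);
        }
    }

    // Called from Filter() whenever a record-store entry is dropped. Note that
    // RocksDB's Filter() is const, so in a real filter these members would
    // need to be mutable.
    void recordDrop(int64_t bytes) {
        ++_droppedRecords;
        _droppedBytes += bytes;
    }

 private:
    CollectionCounters* _counters;
    int64_t _droppedRecords = 0;
    int64_t _droppedBytes = 0;
};
```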
I tried investigating it, and finally got it working on the v3.2 and master branches. See:
v3.2: malexzx/mongo-rocks@bb823e7
master: malexzx/mongo-rocks@f31ccab
Checked out the implementation, good work, this looks pretty reasonable and not too complicated. :)
This is where we update the numbers in the call to validate(): https://github.com/mongodb-partners/mongo-rocks/blob/master/src/rocks_record_store.cpp#L745
Thank you for the answer!
While trying to correct the numbers in the RecordStore, I found that they are not persistent across database restarts, so some state needs to be saved (perhaps via a local transaction). But the indexes still remain inconsistent. How will queries react to this approach? I will try to test it and think about how to delete the index entries.
Another idea: use the Mongo TTL feature, but intercept the high-level query and collect the index and RecordStore prefixes, then pass them to the CompactionFilter for compaction.
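One way such intercepted prefixes could be shared with the compaction filter is a small guarded set that the high-level code fills in and the filter consults; this is only a sketch, and all names here are hypothetical.

```cpp
#include <mutex>
#include <set>
#include <string>

#include <rocksdb/slice.h>

// Hypothetical shared state: prefixes of index and record-store keys that the
// intercepted high-level TTL pass has marked as expired.
class ExpiredPrefixSet {
 public:
    void add(std::string prefix) {
        std::lock_guard<std::mutex> lk(_mutex);
        _prefixes.insert(std::move(prefix));
    }

    // Called from the compaction filter: true if the key starts with any of
    // the marked prefixes (a linear scan is fine for a small set).
    bool covers(const rocksdb::Slice& key) const {
        std::lock_guard<std::mutex> lk(_mutex);
        for (const auto& prefix : _prefixes) {
            if (key.starts_with(rocksdb::Slice(prefix))) {
                return true;
            }
        }
        return false;
    }

 private:
    mutable std::mutex _mutex;
    std::set<std::string> _prefixes;
};
```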
Finally, writing the changes directly to the DB (malexzx@6c0bbb0) solves the count problem. The indexes are still inconsistent.
Nice job on solving the count issue :)
Hello @igorcanadi !
Your solution in the first answer has given me an idea: drop a whole collection based on a TTL specified within that collection (I will think about how it could be specified). The reason to do it this way is to avoid counting collections and dropping them in code; we would just create a new collection for the new day (for example), and the rest of the work would be handed off to RocksDB. An iterator over the collections could be implemented in JavaScript code.
Hi Aleksander,
I am considering using your enhancement to avoid using Mongo TTL for auto-deletion and instead use the native RocksDB implementation. Can you advise how you finally implemented the TTL at the RocksDB level? Do I need to define a TTL index in Mongo? Does the field name have to be _rttl? Can you give some guidelines?
Thanks
Hi @elirevach12 ,
My enhancement is still an experiment, and my project now uses collection-based TTL management.
To do this, we just drop the old collection(s) a few times a day.
If you find a way to make it work with my enhancement (i.e. implement index synchronization management), then yes, the only condition for this feature is that the field name must be _rttl.
Hope this helps.
Time-series data, only a few days of raw data, massive input, etc.