JackBister/logsuck

Compress the EventRaws table

JackBister opened this issue · 5 comments

SQLite has support for compressing FTS tables: https://sqlite.org/fts3.html#the_compress_and_uncompress_options

This should be supported by Logsuck to prevent the database file size from growing out of control. Depending on how much it affects insert performance, this might need to be configurable so that users with high-throughput needs can disable it.

Hi JackBister,
Can I work on this issue? I would like to contribute to this project and I found this issue to be a good start.
Thanks

@aminsalami Hey man, go ahead! Let me know if there is anything you need help with.

I haven't dug too deeply into this except for reading the link in the issue. My thinking is that the compress/uncompress functions may need to be implemented in C to avoid losing too much performance from crossing back and forth between C and Go, but it might be worth trying out Go implementations and measuring what the performance is like. I think the tricky bit is going to be making sure that this actually has a benefit in terms of storage without costing too much read/write performance.

There is some more useful info about SQLite extensions here in case you need it:
https://www.hwaci.com/sw/sqlite/loadext.html
https://godoc.org/github.com/mattn/go-sqlite3#hdr-SQLite3_Extension

@JackBister Hey. Sorry it took so long. I only have free time on weekends.
Anyway, I tried to add compression to the 'EventRaws' table using this piece of code. To measure the effect of the compress/uncompress functions, I wrote a test which uses two tables: one with compression enabled and one without.
The result is interesting. The 'EventRaws' table with compression takes more space!
My first guess was that compression works better on longer texts, so I tried using JSON as the raw data for events. This time the result was much better and I could see that the database size was reduced.

So, is this normal, or did I miss something here? Should I push my code?
Also, this is almost my first time contributing to a project on GitHub. Please correct me if I'm doing something wrong :)
Thanks.

@aminsalami Hi, sorry for the late response. It's a very interesting result! If you push the code to your fork I could check it out and test it too. I guess it is as you say that longer lines compress better, so whether compression is worth having may depend on how long you expect your log messages to be. It might also be interesting to try compression algorithms other than the one on the SQLite page in case they work better for short lines.

Check out my latest commit. Thanks in advance.
Note that with the compression flag enabled, the "EventRaws" table in "logsuck.db" can't be read using common tools!