New post: Detailing the v1.15 DB engine
joelhans opened this issue · 5 comments
Per Costa's message on Slack:
Check this: https://twitter.com/K900TweetsStuff/status/1148565217925828608
netdata kind of works, but it doesn't do long term storage. It can dump metrics into Grafana or whatever for that, but that means you end up with Grafana anyway...
People do not know we have a new db engine. @joel we should celebrate it somehow. Add a blog article?
Here is the brief I'm working with currently for this post. Please offer feedback and information wherever you think it makes sense. I'm having a bit of trouble wrapping my brain around the more technical aspects of the DB engine, so I'll need some help.
@mfundul — You seem like the main dev behind the DB engine, so your input will be invaluable here.
Topic: Detailing the v1.15 DB engine
Goal: Inform users about the primary benefits of the DB engine released in v1.15.0. This post should be beginner-friendly, read more easily than the relevant documentation, and tease a little of what's to come with further improvements to the DB engine (default in the future?)
Outline:
- A history of the development of the new DB engine
- Why was it necessary? "Historically, Netdata has required a lot of memory for long-term metrics storage."
- Permits longer-term storage of compressed data
- How the new DB engine works
- Use some of the information from the Database engine documentation.
- I could use some help in determining which information is most important.
- How will the new DB engine help you?
- Can store a dataset that's much larger than the available memory
- Reduce reliance on other backends like graphite or prometheus
- The DB engine's future
- The DB engine already received an update in v1.16 that allows it to use less memory while being more robust
- With refinement it will become the default memory mode, and that will make Netdata even more powerful without requiring any further configuration
- Point users toward the documentation to help them test and enable the new DB engine
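
For the "test and enable" section, a minimal configuration sketch might help readers. To the best of my knowledge, the relevant options live in the `[global]` section of `netdata.conf` (please double-check the option names against the dbengine docs before publishing):

```
# netdata.conf
[global]
    # switch from the default round-robin memory mode to the new database engine
    memory mode = dbengine

    # disk space (in MiB) the engine may use for compressed metric storage
    dbengine disk space = 256
```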
Specific questions that I have:
- What information about the DB engine's operation is most important?
- If one uses `dbengine disk space = 256` (as referenced in the docs), what duration of metrics will that store, given X metrics collected every second?
Other relevant issues/information:
netdata/netdata#5879
netdata/netdata#5303
About DB engine's future:
- Support for changing the data collection frequency without losing the metric data gathered up to that point. (Currently the DB engine may lose most of its accuracy, while other memory modes simply discard all history.)
- Support for querying ephemeral or obsolete metrics. (Currently Netdata can only query metrics that are actively being collected.)
- Progressive thinning of metrics by aggregating the oldest metrics and reducing their frequency (transformation), so that they occupy less space and cover larger time periods (tiering).
- Support for tags and labels (still a bit fuzzy until we discuss those).
- Support for storing data blobs as well (same as above).
About disk space: with a typical compression ratio of about 80%, we are at about 25 MiB per metric per year. Randomized metrics, which are incompressible, are about 135 MiB per metric per year, or 4500 bytes per metric per 1000 seconds (all at a collection frequency of 1 second).
A large server may collect 2000 metrics. On such a server, with a compression ratio of 80%, you will be able to hold about 2 days of metrics with the default 256 MiB of disk space.
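A quick back-of-the-envelope check of those numbers, which could be adapted for the blog post. This is an illustrative sketch, not official Netdata tooling; the 4500 bytes/metric/1000 s and 80% figures are taken from the paragraph above:

```python
# Back-of-the-envelope retention estimate for the dbengine,
# using the figures quoted in this thread (illustrative only).

UNCOMPRESSED_BYTES_PER_METRIC_PER_SEC = 4500 / 1000  # 4500 bytes per 1000 s
COMPRESSION_RATIO = 0.80                             # typical, per the thread

def retention_days(disk_space_mib, metrics, freq_sec=1):
    """Roughly how many days of history fit in the given disk space."""
    bytes_per_sec = (UNCOMPRESSED_BYTES_PER_METRIC_PER_SEC
                     * (1 - COMPRESSION_RATIO) * metrics / freq_sec)
    return disk_space_mib * 1024**2 / bytes_per_sec / 86400

# Default 256 MiB, large server with 2000 metrics at 1 s frequency:
print(round(retention_days(256, 2000), 1))  # roughly 1.7 days
```

That lands close to the "about 2 days" figure quoted above, so the numbers are internally consistent.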
The memory requirements are non-obvious and unintuitive, so they must be considered carefully, as they have a significant impact.
@mfundul Thanks for the quick answers! I'm going to start drafting and will ping you if there are any follow-up questions.
Closing because this post has been live for a while.