akumuli/Akumuli

Blob or document storage capacity

rebootcode opened this issue · 3 comments

I have a requirement to store images every second from multiple sources: roughly 10K to 100K images, each smaller than 1 MB.

Is Akumuli suitable for "blob or document" storage at this size and frequency, with the performance claimed in the README.md file?

Fast range scans and joins, read speed doesn't depend on database cardinality.
Fast data ingestion:
5.4M writes/sec on DigitalOcean droplet with 8-cores 32GB of RAM (using only 6 cores)
4.6M writes/sec on DigitalOcean droplet with 8-cores 32GB of RAM (6 cores with enabled WAL)
16.1M writes/sec on 32-core Intel Xeon E5-2680 v2 (c3.8xlarge EC2 instance).

That said, the documentation (https://docs.akumuli.org/writing-data#bulk-string) does say: "Bulk strings are used to represent arbitrary large (up to 1MB) binary objects."

Is it possible to get similar performance when storing blobs or documents of this size?

All images are removed after 3 to 6 hours, or may be overwritten.

Lazin commented

It's not possible. Bulk strings are part of RESP (the Redis serialization protocol). Akumuli uses RESP, and that documentation section describes RESP in general. Akumuli supports two data types: numeric and event (text).
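For context on the bulk strings mentioned above: in RESP, a bulk string is framed as `$`, the payload's length in bytes written in decimal, CRLF, the raw bytes, and a final CRLF. The Python sketch below encodes only that framing. The function name and the 1 MiB cap (reading the docs' "up to 1MB" as 1,048,576 bytes) are assumptions for illustration; being able to frame a payload this way does not mean Akumuli will store it, since it only accepts numeric and event values.

```python
MAX_BULK_BYTES = 1024 * 1024  # assumption: the docs' "up to 1MB" taken as 1 MiB


def resp_bulk_string(payload: bytes) -> bytes:
    # Layout: '$', decimal byte length, CRLF, raw payload, CRLF.
    if len(payload) > MAX_BULK_BYTES:
        raise ValueError(
            f"payload of {len(payload)} bytes exceeds the assumed 1 MiB cap"
        )
    # The length prefix counts bytes, not characters, so binary data is safe.
    return b"$" + str(len(payload)).encode("ascii") + b"\r\n" + payload + b"\r\n"


if __name__ == "__main__":
    print(resp_bulk_string(b"hello"))  # b'$5\r\nhello\r\n'
```

Because the length prefix tells the reader exactly how many bytes follow, the payload may contain CR, LF, or any other byte, which is why the protocol uses bulk strings for binary data.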

@Lazin I think there is some confusion. A base64-encoded image is a text representation of the image, so when I mentioned "blob or document storage", I meant "event (text)" storage.

We can store an image as text by encoding it with base64, as in the example below. Can we store this kind of text at the rate mentioned above and maintain similar performance?

If you open the text below as a URL, it renders the image; you can also use a website such as https://codebeautify.org/base64-to-image-converter to convert the base64 text to an image.


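One practical consequence of the base64 approach: standard padded base64 emits 4 characters for every 3 input bytes, so the text is about a third larger than the image itself. The sketch below uses Python's standard `base64` module; the helper names are hypothetical, and treating the image as PNG in the `data:` URI is an assumption.

```python
import base64


def base64_len(n_bytes: int) -> int:
    # Padded base64: 4 output characters per 3-byte group, last group padded.
    return 4 * ((n_bytes + 2) // 3)


def to_data_uri(image: bytes, mime: str = "image/png") -> str:
    # Encode the raw bytes and add a data: prefix so a browser can open it as a URL.
    return f"data:{mime};base64," + base64.b64encode(image).decode("ascii")


if __name__ == "__main__":
    one_mib = 1024 * 1024
    print(base64_len(one_mib))            # 1398104 characters for a 1 MiB image
    print(base64_len(3 * one_mib // 4))   # 1048576: largest image whose text fits in 1 MiB
    print(to_data_uri(b"abc"))            # data:image/png;base64,YWJj
```

Whichever way "1 MB" is read, an image close to 1 MB no longer fits in 1 MB once it is encoded as base64 text.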
Lazin commented

The size of an event is limited to 1KB. Another thing to consider is that the performance numbers were obtained on numeric time-series data, not on large text blobs; for text blobs, performance will be lower. Currently, you can expect around 1M writes/sec per CPU core for numeric time-series, but only 80K writes/sec for 100-byte events.
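To put these limits next to the original requirement, here is a back-of-the-envelope estimate. It assumes a hypothetical client that splits each base64-encoded image into 1KB events (read as 1,024 bytes), which is not an Akumuli feature, and it optimistically applies the 80K writes/sec figure quoted for 100-byte events to 1KB events, so real throughput would likely be lower.

```python
EVENT_LIMIT = 1024        # assumption: the 1KB event limit read as 1,024 bytes
EVENTS_PER_SEC = 80_000   # quoted for 100-byte events; optimistic for 1KB events


def base64_len(n_bytes: int) -> int:
    # Padded base64 produces 4 characters per 3 input bytes (rounded up).
    return 4 * ((n_bytes + 2) // 3)


def events_per_image(image_bytes: int, limit: int = EVENT_LIMIT) -> int:
    # Hypothetical chunking: ceiling-divide the encoded text into event-sized pieces.
    return -(-base64_len(image_bytes) // limit)


def images_per_sec(image_bytes: int, rate: int = EVENTS_PER_SEC) -> float:
    # Upper bound on images/sec if every event write succeeds at the quoted rate.
    return rate / events_per_image(image_bytes)


if __name__ == "__main__":
    one_mib = 1024 * 1024
    print(events_per_image(one_mib))           # 1366 events per 1 MiB image
    print(round(images_per_sec(one_mib), 1))   # 58.6 images/sec
    print(10_000 * events_per_image(one_mib))  # 13660000 events/sec for 10K images/sec
```

Even under these generous assumptions, the estimate falls short of 10K to 100K images per second by more than two orders of magnitude, which is consistent with the answer above.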