Netflix/rend

Histogram storage should be more dynamic

ScottMansfield opened this issue · 0 comments

With such a wide array of traffic expected for different caches, a single fixed size for histogram data is not appropriate for all cases. The histogram backing slices ought to size themselves dynamically up to a certain point, specifically 2^20 (1048576) entries. Each time the array needs to size up, it will double until it reaches the final size. The initial size will be 2^14 (16384) entries.
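
As a rough illustration only (not code from rend itself), the sizing bounds and the doubling rule could look something like this in Go; the names `initialHistSize`, `maxHistSize`, and `nextSize` are made up for the sketch:

```go
package main

import "fmt"

// Hypothetical sizing constants for a histogram's backing slice.
const (
	initialHistSize = 1 << 14 // 16384 entries to start with
	maxHistSize     = 1 << 20 // 1048576 entries at most
)

// nextSize doubles the current size until the maximum is reached.
func nextSize(cur int) int {
	if cur*2 > maxHistSize {
		return maxHistSize
	}
	return cur * 2
}

func main() {
	// Prints 16384, 32768, 65536, 131072, 262144, 524288, 1048576.
	for size := initialHistSize; ; size = nextSize(size) {
		fmt.Println(size)
		if size == maxHistSize {
			break
		}
	}
}
```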

To double, the metrics code will wait until the next metric collection period. It will not double in an amortized fashion, because that would force a request to wait for a large allocation. Instead, by the time metrics are collected, a flag will already have been set indicating the buffer should double. We also want to avoid allocating during the critical section of collecting histogram metrics, because requests can be waiting on the code to exit that critical section before returning to the clients. Thus, after flipping the buffers, we can check the flag and resize the slice once the metrics have been printed.
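
Continuing the hypothetical sketch above (reusing `maxHistSize` and `nextSize`; the type and field names are illustrative, not rend's actual ones), the flag-and-flip flow might look roughly like this: the observation path only sets a flag when the buffer fills, and the collection path flips buffers, reads the data, and only then grows the retired slice:

```go
import "sync"

// hist is a hypothetical double-buffered histogram.
type hist struct {
	mu        sync.Mutex
	active    []uint64 // buffer currently receiving observations
	inactive  []uint64 // buffer currently being read/printed
	count     int      // observations recorded so far this period
	needsGrow bool     // set when active fills; acted on at collection time
}

// observe records one value. It never allocates on the request path; if the
// buffer is full it sets the grow flag and wraps, overwriting the earliest
// data in the current period.
func (h *hist) observe(v uint64) {
	h.mu.Lock()
	if h.count >= len(h.active) {
		h.needsGrow = true
		h.count = 0
	}
	h.active[h.count] = v
	h.count++
	h.mu.Unlock()
}

// collect flips the buffers inside the critical section, then resizes the
// just-retired buffer only after the data has been read out, so no request
// ever waits on a large allocation.
func (h *hist) collect() []uint64 {
	h.mu.Lock()
	h.active, h.inactive = h.inactive, h.active
	n := h.count
	h.count = 0
	grow := h.needsGrow
	h.needsGrow = false
	h.mu.Unlock()

	data := h.inactive[:n]
	// ... print/export data here, outside the lock ...

	if grow && len(h.inactive) < maxHistSize {
		h.inactive = make([]uint64, nextSize(len(h.inactive)))
	}
	return data
}
```

Note that the buffer grown here is the one that just came out of service, which is exactly why the paragraph below says the effect of a resize is delayed by one collection period.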

Expanding in this manner will cause each buffer to increase in size independently, so after an increase in traffic it may take a couple of minutes to adapt to the new volume. There will be a single size setting per histogram, but because the second buffer will already have been put into service before it could be enlarged, the effect of a resize on it is delayed by one metric collection period. For Netflix this is 1 minute, which is an acceptable delay.

From there, if the rate of traffic continues to overflow the array, the observations can be sampled, first at 50% and then at 25%. That would be the last stop. This gives sampled histogram coverage for up to 2^22 (4194304) requests per metric collection period. If the metric collection period is a minute, that allows for roughly 69,905 requests per second per histogram. Beyond that, the histogram data will simply overflow and overwrite the earliest data in that metric collection period.
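
A hypothetical sketch of that sampling fallback, building on the `hist` type above; the deterministic keep-every-Nth scheme here is just a placeholder for whatever sampling rend would actually use:

```go
// sampledHist wraps the hypothetical hist above with a sampling fallback for
// when the backing slice is already at maxHistSize and still overflowing.
type sampledHist struct {
	hist
	sampleShift uint   // 0 = keep all, 1 = keep 1/2, 2 = keep 1/4 (the floor)
	seen        uint64 // observations seen this period, recorded or not
}

// observeSampled keeps every 2^sampleShift-th observation.
func (h *sampledHist) observeSampled(v uint64) {
	h.seen++
	mask := uint64(1)<<h.sampleShift - 1
	if h.seen&mask == 0 {
		h.observe(v)
	}
}

// onOverflowAtMaxSize is called at collection time when the buffer filled even
// at 2^20 entries; it halves the sampling rate, stopping at 25%. At 25%, a
// 2^20-entry buffer covers 2^22 observations per collection period.
func (h *sampledHist) onOverflowAtMaxSize() {
	if h.sampleShift < 2 {
		h.sampleShift++
	}
	h.seen = 0
}
```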

The changes in size will be unidirectional for this feature. Since resizing is reactive, shrinking as well as expanding would break down in an environment with bursty traffic. Instead of sizing down, the high water mark will be kept in order to lose as little data as possible.

The size expansion piece is justified because there are several kinds of requests that are used very rarely in some apps and frequently in others. The histograms could be made configurable, but it makes the most sense for them to size themselves to the application's requirements dynamically. This removes configuration and communication complexity from running a cluster. The less we do manually, the easier our lives are.

At 2^20 entries, the space used for a single histogram is 2^20 entries * 2^3 bytes per entry * 2 buffers per histogram = 2^24 bytes, or about 16 megabytes of RAM. This is a maximum size; the histograms will not all be this large.