azamikram/rcd

What is lod in `carts_lod`?


Thank you for your work, @azamikram! Could you tell me what lod means in the sock-shop data? I can't find anything about it in the paper or in your code.


Is it `container_network_receive_packets_total` as the workload, or `container_processes` as the CPU load?

lod is the number of requests received in each time interval. We used the following Prometheus query to extract it: `sum(rate(request_duration_seconds_count [{DURATION}])) by (name)`, where `DURATION` is the length of the interval.
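One detail worth keeping in mind when reading the query: PromQL's `rate()` returns a per-second average increase over the `[DURATION]` window, so a request count for the interval corresponds to rate × DURATION (which is what PromQL's `increase()` computes). Whether the stored lod values are per-second rates or counts depends on how the collection script post-processes the result, which this thread doesn't say. The sketch below is a simplified approximation of `rate()` on raw counter samples, written only to show the arithmetic; the function names are hypothetical, and unlike Prometheus it does not extrapolate to the window boundaries.

```python
from typing import List, Tuple

# A (timestamp_seconds, counter_value) sample of a Prometheus counter such as
# request_duration_seconds_count for one service.
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter across the samples of one window.

    Simplified: unlike Prometheus's rate(), this does not extrapolate to the
    window boundaries; it only handles counter resets (a drop in value).
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from 0, so the whole current
        # value counts as new requests.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


def requests_in_window(samples: List[Sample], duration_s: float) -> float:
    """Approximate request count in the window: per-second rate times window length."""
    return simple_rate(samples) * duration_s
```

For example, a counter sampled every 15 s that goes 100, 130, 160, 190, 220 over one minute gives a rate of 2 requests/s, i.e. about 120 requests in that 60 s interval.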

@azamikram Thank you so much for your reply! Could you share the full Prometheus queries you used to extract the sock-shop data, including the duration and other configuration?

You can find that script in sock-shop-data now.
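The actual script lives in sock-shop-data; the following is only a hypothetical sketch of how a query like the lod one above can be pulled through the Prometheus HTTP API (`/api/v1/query_range` with `query`, `start`, `end`, `step`) and turned into one time series per service. The Prometheus address, time range, and step are placeholders, not the values the authors used.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode

# Query quoted from the thread; {DURATION} is filled in with str.format.
LOD_QUERY = "sum(rate(request_duration_seconds_count [{DURATION}])) by (name)"


def query_range_url(base_url: str, query: str, start: float, end: float, step: str) -> str:
    """Build a Prometheus HTTP API /api/v1/query_range URL."""
    params = urlencode({"query": query, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def parse_matrix(body: str) -> Dict[str, List[Tuple[float, float]]]:
    """Turn a query_range JSON response into {service name: [(ts, value), ...]}."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    series: Dict[str, List[Tuple[float, float]]] = {}
    for item in payload["data"]["result"]:
        # "by (name)" in the query keeps only the `name` label on each series.
        name = item["metric"].get("name", "")
        # Prometheus returns sample values as strings, e.g. [1700000000, "2.5"].
        series[name] = [(float(ts), float(v)) for ts, v in item["values"]]
    return series
```

Usage would look like `url = query_range_url("http://localhost:9090", LOD_QUERY.format(DURATION="1m"), start, end, "60s")`, then `parse_matrix(urllib.request.urlopen(url).read().decode())`. Choosing `step` equal to `DURATION` makes consecutive windows non-overlapping, which matches the "one value per interval" reading of lod.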

Thank you so much for your support, @azamikram! I see that you didn't include the err metrics in the sock-shop data. May I ask why?

I cannot recall why I decided not to collect data for err. One thing that comes to mind is that err was only available for two services (front-end and catalogue), but I'm not sure if that was the reason.

Thank you so much for your answer, @azamikram! It really helps.

I plotted your data and have another question. In this figure, the fault is "payment-mem", so the memory usage of the payment container increases; that part I understand. But why does the memory usage of the other containers suddenly increase or decrease?

I can't figure out why. Please help me 😄

[Figure: memory usage of the sock-shop containers during the payment-mem fault]

Failure propagation chain! A change in one service affects how other parts of the system behave.
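To see which containers shift after the payment-mem fault rather than eyeballing the plot, one rough option is to compare each container's mean memory before and after the fault start. This is a hypothetical helper, not part of the RCD code; `fault_ts` (the injection timestamp) is an assumed input, and a shift that lines up with the fault is consistent with propagation but does not by itself establish it.

```python
from statistics import mean
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp_seconds, memory_value)


def shift_after_fault(series: Dict[str, List[Sample]], fault_ts: float) -> Dict[str, float]:
    """Relative change of each container's mean value after vs. before fault_ts."""
    shifts: Dict[str, float] = {}
    for name, samples in series.items():
        before = [v for ts, v in samples if ts < fault_ts]
        after = [v for ts, v in samples if ts >= fault_ts]
        if not before or not after or mean(before) == 0:
            continue  # not enough data on one side (or zero baseline) to compare
        shifts[name] = (mean(after) - mean(before)) / mean(before)
    return shifts
```

Sorting the result by absolute value, e.g. `sorted(shifts, key=lambda n: abs(shifts[n]), reverse=True)`, lists the containers whose memory moved most around the fault.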

Thank you so much for your answer, @azamikram! Could you do me another favour and publish the stress-ng command you used to inject the fault into sock-shop? I also use stress-ng, but I can't reproduce the failure propagation chain you described 😄
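The authors' exact command isn't given in this thread. As a point of comparison only, a memory-hog fault with stress-ng typically uses the virtual-memory stressor; the sketch below just assembles such a command line. The worker count, size, and timeout are placeholder values, and the command must run inside the target container (so the memory is charged to its cgroup) for that container's memory metric to rise; getting stress-ng into the sock-shop image is outside this sketch.

```python
import shlex
from typing import List


def stress_ng_memory_cmd(workers: int, bytes_per_worker: str, timeout: str) -> List[str]:
    """argv for a stress-ng memory-hog run (placeholder values, not the thread's)."""
    return [
        "stress-ng",
        "--vm", str(workers),            # number of virtual-memory stressor workers
        "--vm-bytes", bytes_per_worker,  # memory per worker, e.g. "256M"
        "--vm-keep",                     # keep re-writing the same mapping instead of remapping
        "--timeout", timeout,            # stop after this long, e.g. "300s"
    ]


def as_shell(cmd: List[str]) -> str:
    """Render the argv as a single shell-quoted string for logging or copy-paste."""
    return shlex.join(cmd)
```

`as_shell(stress_ng_memory_cmd(1, "256M", "300s"))` gives `stress-ng --vm 1 --vm-bytes 256M --vm-keep --timeout 300s`; the list form can be passed to `subprocess.run(cmd, check=True)` when running inside the container.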