What is lod in `carts_lod`?

Question

Opened this issue 2 years ago · 9 comments

Thank you for your work, @azamikram! Could you please tell me what lod is in the sock-shop data? I can't find anything about it in the paper or in your code.

Answer 1 · 2023-05-14T16:16:04.000Z

Is it `container_network_receive_packets_total` as workload, or `container_processes` as CPU load?

Answer 2 · 2023-05-26T14:44:26.000Z

The lod is the number of requests received between two time intervals. We used the following Prometheus query to extract it: `sum(rate(request_duration_seconds_count[{DURATION}])) by (name)`, where `DURATION` is the length of the interval.
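As a rough illustration of how the query above could be filled in and sent to Prometheus, here is a minimal Python sketch. It is not the script from sock-shop-data: the base URL, timestamps, step, and all helper names (`build_lod_query`, `build_query_range_url`, `approx_request_count`) are assumptions made for the example. It relies only on the standard `/api/v1/query_range` endpoint of the Prometheus HTTP API. Since `rate()` returns a per-second average, the last helper shows one way to turn that rate into an approximate request count for an interval.

```python
from urllib.parse import urlencode

# Query from the answer above, with DURATION left as a placeholder.
QUERY_TEMPLATE = "sum(rate(request_duration_seconds_count[{duration}])) by (name)"


def build_lod_query(duration: str) -> str:
    """Fill in the interval length, e.g. "1m" or "30s"."""
    return QUERY_TEMPLATE.format(duration=duration)


def build_query_range_url(base_url: str, duration: str,
                          start: int, end: int, step_seconds: int) -> str:
    """Build a URL for the Prometheus range-query endpoint.

    base_url, start, end and step_seconds are illustrative values chosen
    by the caller; they are not taken from the original collection script.
    """
    params = {
        "query": build_lod_query(duration),
        "start": start,        # Unix timestamp (seconds)
        "end": end,            # Unix timestamp (seconds)
        "step": step_seconds,  # resolution of the returned series
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


def approx_request_count(rate_per_second: float, interval_seconds: float) -> float:
    """rate() is a per-second average, so multiply by the interval length
    to approximate the number of requests in that interval."""
    return rate_per_second * interval_seconds


if __name__ == "__main__":
    print(build_lod_query("1m"))
    # Hypothetical local Prometheus instance and time window.
    print(build_query_range_url("http://localhost:9090", "1m",
                                1685100000, 1685103600, 60))
```

Fetching the printed URL with any HTTP client should return one time series per `name` label, assuming the `request_duration_seconds_count` metric is exposed by the services.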

Answer 3 · 2023-05-26T15:06:33.000Z

@azamikram Thank you so much for your reply! Could you share the full Prometheus queries that you used to extract the sock-shop data, including the duration and other configuration?

Answer 4 · 2023-05-26T17:40:41.000Z

You can find that script in sock-shop-data now.

Answer 5 · 2023-05-26T20:36:58.000Z

Thank you so much for your support, @azamikram! I see that you didn't include the err metrics in the sock-shop data. May I know why?

Answer 6 · 2023-05-26T21:42:28.000Z

I cannot recall why I decided not to collect data for err. One thing that comes to mind is that err was only available for two services (front-end and catalogue) but I'm not sure if that was the reason.

Answer 7 · 2023-05-27T16:45:38.000Z

Thank you so much for your answer, @azamikram. It really helps me!

I plotted your data and have another question. In this figure, the fault is "payment-mem": the memory usage of the payment container increases, which I can understand. But why did the memory usage of the other containers decrease or increase so suddenly?

I can't figure out why. Please help me 😄

Answer 8 · 2023-05-29T21:00:09.000Z

Failure propagation chain! A change in one service affects how other parts of the system behave.

Answer 9 · 2023-05-29T22:10:31.000Z

Thank you so much for your answer, @azamikram! Could you please do me another favour and publish the stress-ng command that you used to inject the fault into the sock-shop? I also use stress-ng, but I haven't been able to reproduce the failure propagation chain you described 😄