What is lod in `carts_lod`?
Thank you for your work, @azamikram! Please let me know what `lod` is in the sock-shop data. I can't find anything about it in the paper or in your code.
Is it `container_network_receive_packets_total` as workload, or `container_processes` as CPU load?
The `lod` is the number of requests received between two time intervals. We used the following Prometheus query to extract it: `sum(rate(request_duration_seconds_count[{DURATION}])) by (name)`, where `DURATION` is the length of the interval.
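For readers who want to try the query above, here is a minimal sketch of how the `{DURATION}` placeholder could be filled in and turned into a request against Prometheus's standard `query_range` HTTP endpoint. The helper names, the `http://localhost:9090` address, the timestamps, and the `1m`/`60s` values are assumptions for illustration only; they are not taken from the author's script.

```python
from urllib.parse import urlencode

# Query from the comment above, with the window left as a placeholder.
QUERY_TEMPLATE = "sum(rate(request_duration_seconds_count[{duration}])) by (name)"


def build_lod_query(duration: str) -> str:
    """Fill in the rate window (e.g. '1m'). Hypothetical helper name."""
    return QUERY_TEMPLATE.format(duration=duration)


def build_range_url(base_url: str, duration: str, start: int, end: int, step: str) -> str:
    """Build a URL for Prometheus's /api/v1/query_range endpoint.

    start and end are Unix timestamps; step is the sampling resolution.
    """
    params = {
        "query": build_lod_query(duration),
        "start": start,
        "end": end,
        "step": step,
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed values: a local Prometheus and a 10-minute range.
    print(build_lod_query("1m"))
    print(build_range_url("http://localhost:9090", "1m", 1700000000, 1700000600, "60s"))
```

Note that `rate()` returns a per-second average over the window, so an approximate request count per interval would come from multiplying by the window length or using `increase()` instead. Which of these the dataset reflects is not stated in this thread.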
@azamikram Thank you so much for your reply! I am wondering if you can give me the full Prometheus queries that you use to extract sock shop data, including the duration and other configurations?
You can find that script in sock-shop-data now.
Thank you so much for your support, @azamikram! I see that you don't include the `err` metrics in the sock-shop data. May I know why?
I cannot recall why I decided not to collect data for `err`. One thing that comes to mind is that `err` was only available for two services (front-end and catalogue), but I'm not sure if that was the reason.
Thank you so much for your answer, @azamikram, it really helps me!
I plotted your data and have another question. In this figure, the fault is "payment-mem": the memory usage of the payment container is increasing, which I can understand. But
Q. Why did the memory usage of other containers decrease or increase so suddenly?
I can't figure out why, please help me 😄
Failure propagation chain! Change in one service affects how other parts of the system behave.
Thank you so much for your answer, @azamikram! Could you please do me another favour by publishing the `stress-ng` command that you used to inject the fault into the sock-shop? I also use `stress-ng`, but I have not been able to reproduce the failure propagation chain you described 😄