twitter-archive/ostrich

Document in detail the meaning of standard metrics

xasima opened this issue

Some wording about percentiles may be copied from
https://groups.google.com/forum/?fromgroups=#!topic/finaglers/UCyuoco0dxM

request_latency_ms is a breakdown of how long requests are taking. The stats you asked about are the latency percentiles in milliseconds: 25% of requests took <= 1ms, 50% took <= 2ms, 75% <= 8ms, 99.99% <= 95ms. p25 == "25th percentile", p50 == "50th percentile", etc.
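To make the percentile wording concrete, here is a minimal Python sketch of the nearest-rank method applied to raw samples. The sample latencies are made up so that the results line up with the numbers quoted above. As I understand it, Ostrich itself summarizes latencies in a bucketed histogram, so its reported percentiles are approximations and may differ slightly from an exact computation like this one.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample v such that at least
    p percent of all samples are <= v."""
    if not samples:
        raise ValueError("percentile of an empty sample set")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to at least 1 so p=0 returns the minimum.
    rank = max(1, math.ceil(p / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds.
latencies_ms = [1, 1, 1, 2, 2, 3, 5, 8, 8, 95]

for p in (25, 50, 75, 99.99):
    print(f"p{p}: <= {percentile(latencies_ms, p)} ms")
```

With only ten samples, p99.99 is simply the maximum; very high percentiles only carry real information when there are many samples.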

It would be good to provide explanations for the Finagle client metrics (probably the most common example of metrics exposed via Ostrich), the Finagle server metrics, and the common Ostrich (JVM) metrics.

COMMON OSTRICH METRICS

counter:jvm_gc_Copy_cycles
counter:jvm_gc_Copy_msec
counter:jvm_gc_MarkSweepCompact_cycles
counter:jvm_gc_MarkSweepCompact_msec
counter:jvm_gc_cycles
counter:jvm_gc_msec
gauge:jvm_heap_committed
gauge:jvm_heap_max
gauge:jvm_heap_used
gauge:jvm_nonheap_committed
gauge:jvm_nonheap_max
gauge:jvm_nonheap_used
gauge:jvm_num_cpus
gauge:jvm_post_gc_Eden_Space_max
gauge:jvm_post_gc_Eden_Space_used
gauge:jvm_post_gc_Perm_Gen_max
gauge:jvm_post_gc_Perm_Gen_used
gauge:jvm_post_gc_Survivor_Space_max
gauge:jvm_post_gc_Survivor_Space_used
gauge:jvm_post_gc_Tenured_Gen_max
gauge:jvm_post_gc_Tenured_Gen_used
gauge:jvm_post_gc_used
gauge:jvm_start_time
gauge:jvm_thread_count
gauge:jvm_thread_daemon_count
gauge:jvm_thread_peak_count
gauge:jvm_uptime
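Some of the JVM counters and gauges above combine naturally into health ratios. The Python sketch below assumes each snapshot is a dict keyed by the metric names listed here, that jvm_gc_msec is a cumulative count of milliseconds spent in GC, that jvm_uptime is JVM uptime in milliseconds, and that the heap gauges are in bytes. These units are my assumption, based on the names and the underlying java.lang.management beans, so verify them against your Ostrich version. The snapshot values are made up.

```python
def gc_overhead(prev, curr):
    """Fraction of wall-clock time spent in GC between two snapshots.

    Assumes jvm_gc_msec is a cumulative GC-time counter in ms and
    jvm_uptime is the JVM uptime in ms (assumptions, not verified).
    """
    gc_ms = curr["jvm_gc_msec"] - prev["jvm_gc_msec"]
    wall_ms = curr["jvm_uptime"] - prev["jvm_uptime"]
    return gc_ms / wall_ms if wall_ms > 0 else 0.0


def heap_utilization(snapshot):
    """jvm_heap_used as a fraction of jvm_heap_max.

    Returns None when the maximum is undefined; the JVM reports an
    undefined max as -1.
    """
    heap_max = snapshot["jvm_heap_max"]
    if heap_max <= 0:
        return None
    return snapshot["jvm_heap_used"] / heap_max


# Hypothetical snapshots taken one minute apart.
prev = {"jvm_gc_msec": 1200, "jvm_uptime": 600_000,
        "jvm_heap_used": 400 * 2**20, "jvm_heap_max": 1024 * 2**20}
curr = {"jvm_gc_msec": 1800, "jvm_uptime": 660_000,
        "jvm_heap_used": 512 * 2**20, "jvm_heap_max": 1024 * 2**20}

print(f"GC overhead: {gc_overhead(prev, curr):.1%}")
print(f"Heap used:   {heap_utilization(curr):.0%}")
```

Working from deltas between snapshots, rather than from the raw cumulative counter, keeps the GC figure focused on recent behavior instead of averaging over the whole life of the JVM.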

FINAGLE CLIENT METRICS

counter:sampleFinagleThriftClient/closechans
counter:sampleFinagleThriftClient/closed
counter:sampleFinagleThriftClient/closes
counter:sampleFinagleThriftClient/connects
counter:sampleFinagleThriftClient/exn
counter:sampleFinagleThriftClient/exn/java.nio.channels.ClosedChannelException
counter:sampleFinagleThriftClient/failures
counter:sampleFinagleThriftClient/failures/com.twitter.finagle.CancelledRequestException
counter:sampleFinagleThriftClient/received_bytes
counter:sampleFinagleThriftClient/requests
counter:sampleFinagleThriftClient/sent_bytes
counter:sampleFinagleThriftClient/success
gauge:sampleFinagleThriftClient/connections
gauge:sampleFinagleThriftClient/failfast
gauge:sampleFinagleThriftClient/failfast/unhealthy_for_ms
gauge:sampleFinagleThriftClient/failfast/unhealthy_num_tries
gauge:sampleFinagleThriftClient/loadbalancer/size
gauge:sampleFinagleThriftClient/pending
gauge:sampleFinagleThriftClient/pool_size
gauge:sampleFinagleThriftClient/pool_waiters
metric:sampleFinagleThriftClient/codec_connection_preparation_latency_ms
metric:sampleFinagleThriftClient/connect_latency_ms
metric:sampleFinagleThriftClient/connection_duration
metric:sampleFinagleThriftClient/connection_received_bytes
metric:sampleFinagleThriftClient/connection_requests
metric:sampleFinagleThriftClient/connection_sent_bytes
metric:sampleFinagleThriftClient/request_latency_ms
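Counters such as requests, success, and failures/<exception class> become most informative as changes over an interval. The following Python sketch assumes you have two snapshots of counter values keyed by their full names (for example "sampleFinagleThriftClient/requests") and that the counters are cumulative between snapshots. The client_summary helper and the snapshot values are hypothetical.

```python
def client_summary(prev, curr, scope):
    """Summarize a Finagle client's counters between two snapshots.

    prev and curr map full counter names to cumulative values; a counter
    missing from a snapshot is treated as 0.
    """
    def delta(name):
        key = f"{scope}/{name}"
        return curr.get(key, 0) - prev.get(key, 0)

    requests = delta("requests")
    prefix = f"{scope}/failures/"
    # Per-exception breakdown; the bare "failures" total has no trailing
    # slash, so it is not matched by the prefix.
    failures_by_exception = {
        key[len(prefix):]: curr[key] - prev.get(key, 0)
        for key in curr
        if key.startswith(prefix)
    }
    return {
        "requests": requests,
        "success_rate": delta("success") / requests if requests else None,
        "failures_by_exception": failures_by_exception,
    }


scope = "sampleFinagleThriftClient"
cancelled = f"{scope}/failures/com.twitter.finagle.CancelledRequestException"
prev = {f"{scope}/requests": 1000, f"{scope}/success": 990, cancelled: 10}
curr = {f"{scope}/requests": 2000, f"{scope}/success": 1980, cancelled: 20}

print(client_summary(prev, curr, scope))
```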

FINAGLE SERVER METRICS

counter:sampleFinagleHttpServer/closechans
counter:sampleFinagleHttpServer/closed
counter:sampleFinagleHttpServer/closes
counter:sampleFinagleHttpServer/connects
counter:sampleFinagleHttpServer/exn/java.nio.channels.ClosedChannelException
counter:sampleFinagleHttpServer/failures/your.other.Exception
counter:sampleFinagleHttpServer/received_bytes
counter:sampleFinagleHttpServer/requests
counter:sampleFinagleHttpServer/sent_bytes
counter:sampleFinagleHttpServer/success
gauge:sampleFinagleHttpServer/connections
gauge:sampleFinagleHttpServer/pending
metric:sampleFinagleHttpServer/connection_duration
metric:sampleFinagleHttpServer/connection_received_bytes
metric:sampleFinagleHttpServer/connection_requests
metric:sampleFinagleHttpServer/connection_sent_bytes
metric:sampleFinagleHttpServer/handletime_us
metric:sampleFinagleHttpServer/request_latency_ms
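The client and server listings above share a naming pattern: a kind (counter, gauge, or metric), a scope naming the client or server, a stat name, and, for exn and failures, an optional exception class. A small Python sketch that splits entries along that pattern follows. It assumes the scope is always the first '/'-separated segment, as in these listings, which may not hold for nested scopes; parse_stat and group_by_scope are hypothetical helpers.

```python
from collections import defaultdict


def parse_stat(entry):
    """Split 'kind:scope/name[/detail...]' into (kind, scope, path parts).

    Assumes the scope is the first '/'-separated segment.
    """
    kind, _, name = entry.partition(":")
    scope, *path = name.split("/")
    return kind, scope, path


def group_by_scope(entries):
    """Map scope -> kind -> list of stat names (path parts re-joined)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for entry in entries:
        kind, scope, path = parse_stat(entry)
        grouped[scope][kind].append("/".join(path))
    return grouped


entries = [
    "counter:sampleFinagleHttpServer/requests",
    "counter:sampleFinagleHttpServer/exn/java.nio.channels.ClosedChannelException",
    "gauge:sampleFinagleHttpServer/pending",
    "metric:sampleFinagleHttpServer/handletime_us",
    "metric:sampleFinagleHttpServer/request_latency_ms",
]

for scope, kinds in group_by_scope(entries).items():
    for kind, names in kinds.items():
        print(scope, kind, names)
```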

Some explanations may be copied from
https://groups.google.com/forum/#!msg/finaglers/SCTpQWDvoB8/PMwtCuk1j54J

connections: the current number of connections between client and server.
pending: the current number of connections waiting to be processed by Finagle.
request_concurrency: the current number of connections being processed by Finagle.
request_queue_size: the number of requests waiting to be handled by the server.
connection_duration: presumably the duration of a connection, from establishment to close.
connection_received_bytes: bytes received per connection.
connection_requests: the number of connection requests your client made. For example, with a pool of 1 connection that is closed 3 times, connection_requests would be 4 (even though connections = 1).
connection_sent_bytes: bytes sent per connection.
handletime_us: the time, in microseconds, to process the response from the server (i.e. to execute all the chained map/flatMap calls).
request_latency_ms: the total time, in milliseconds, of everything between request and response.
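Because handletime_us is reported in microseconds and request_latency_ms in milliseconds, the two need a common unit before they can be compared. A minimal Python sketch that reads the unit from the name suffix follows; the to_millis helper and its suffix table are hypothetical and cover only the suffixes that appear in these listings.

```python
# Hypothetical mapping from name suffix to the divisor that converts to ms.
SUFFIX_DIVISOR_TO_MS = {"_ms": 1, "_us": 1000}


def to_millis(stat_name, value):
    """Convert a stat value to milliseconds based on its name suffix."""
    for suffix, divisor in SUFFIX_DIVISOR_TO_MS.items():
        if stat_name.endswith(suffix):
            return value / divisor
    raise ValueError(f"no known time unit suffix on {stat_name!r}")


# Hypothetical values for one server.
handle_ms = to_millis("handletime_us", 1500)
latency_ms = to_millis("request_latency_ms", 4)
print(f"handler: {handle_ms} ms of {latency_ms} ms total")
```

connection_duration carries no unit suffix, so the sketch deliberately rejects it rather than guessing its unit.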