SMI Metrics API
This repository is an implementation of the Traffic Metrics Spec which follows the format of the official Metrics API being built on top of kubernetes to get metrics about pods and nodes. Here, the metrics are about the inbound and outbound requests i.e their Golden Metrics like p99_latency, p55_latency, success_count, etc for a particular workload.
Installation
For Linkerd:
helm template chart --set adapter=linkerd | kubectl apply -f -
For Istio
helm template chart --set adapter=istio | kubectl apply -f -
Roadmap
The API supports linkerd and Istio right now. The support for Consul is being worked up on right now and the API and responses will have the same structure unless there are no changes to the spec.
Working
The SMI metrics api is a Kubernetes APIService as seen in the installation manifest, which is a way of extending the Kubernetes API.
We will perform installation of the SMI Metrics API w.r.t linkerd. Make sure linkerd is installed and is running as per the instructions here, This API can be installed by running the following command
helm template chart -f dev.yaml -f linkerd.yaml --name dev | kubectl apply -f -
The installation of the APIService can be verified by running
kubectl get apiservice | grep smi
v1alpha1.metrics.smi-spec.io default/dev-smi-metrics True 25h
v1alpha1.metrics.smi-spec.io default/dev-smi-metrics True 25h
The APIService first informs the kubernetes API about itself and the resource types that it exposes, which can be viewed by running
kubectl api-resources | grep smi
NAME SHORTNAMES APIGROUP NAMESPACED KIND
daemonsets metrics.smi-spec.io true TrafficMetrics
deployments metrics.smi-spec.io true TrafficMetrics
namespaces metrics.smi-spec.io false TrafficMetrics
pods metrics.smi-spec.io true TrafficMetrics
statefulsets metrics.smi-spec.io true TrafficMetrics
This means that, the APIService can be queried regarding the above mentioned resource types.
Now that the SMI APIService is installed, metric queries can be done through the kubernetes API.
Because metrics only return the last 30s by default, if you'd like to see the metrics, make sure there is something happening on your cluster. The easiest way to do this is to run linkerd dashboard
and see the traffic generated by viewing that page.
kubectl get --raw /apis/metrics.smi-spec.io/v1alpha1/namespaces/linkerd/deployments/linkerd-web | jq
Output:
{
"kind": "TrafficMetrics",
"apiVersion": "metrics.smi-spec.io/v1alpha1",
"metadata": {
"name": "linkerd-web",
"namespace": "linkerd",
"selfLink": "/apis/metrics.smi-spec.io/v1alpha1/namespaces/linkerd/deployments/linkerd-web",
"creationTimestamp": "2019-06-17T14:26:22Z"
},
"timestamp": "2019-06-17T14:26:22Z",
"window": "30s",
"resource": {
"kind": "Deployment",
"namespace": "linkerd",
"name": "linkerd-web"
},
"edge": {
"direction": "from",
"resource": null
},
"metrics": [
{
"name": "p99_response_latency",
"unit": "ms",
"value": "296875m"
},
{
"name": "p90_response_latency",
"unit": "ms",
"value": "268750m"
},
{
"name": "p50_response_latency",
"unit": "ms",
"value": "162500m"
},
{
"name": "success_count",
"value": "73492m"
},
{
"name": "failure_count",
"value": "0"
}
]
}
As we can see, the golden metrics of a particular resource can be retrieved by querying the API with a path format /apis/metrics.smi-spec.io/v1alpha1/namespaces/{Namespace}/{Kind}/{ResourceName}
Queries for golden metrics on edges i.e paths associated with a particular resource, for example linkerd-controller can also be done by adding /edges
to the path.
kubectl get --raw /apis/metrics.smi-spec.io/v1alpha1/namespaces/linkerd/deployments/linkerd-controller/edges | jq
Output:
{
"kind": "TrafficMetricsList",
"apiVersion": "metrics.smi-spec.io/v1alpha1",
"metadata": {
"selfLink": "/apis/metrics.smi-spec.io/v1alpha1/namespaces/linkerd/deployments/linkerd-controller/edges"
},
"resource": {
"kind": "Deployment",
"namespace": "linkerd",
"name": "linkerd-controller"
},
"items": [
{
"kind": "TrafficMetrics",
"apiVersion": "metrics.smi-spec.io/v1alpha1",
"metadata": {
"name": "linkerd-controller",
"namespace": "linkerd",
"selfLink": "/apis/metrics.smi-spec.io/v1alpha1/namespaces/linkerd/deployments/linkerd-controller/edges",
"creationTimestamp": "2019-06-17T14:51:57Z"
},
"timestamp": "2019-06-17T14:51:57Z",
"window": "30s",
"resource": {
"kind": "Deployment",
"namespace": "linkerd",
"name": "linkerd-controller"
},
"edge": {
"direction": "from",
"resource": {
"kind": "Deployment",
"namespace": "linkerd",
"name": "linkerd-web"
}
},
"metrics": [
{
"name": "p99_response_latency",
"unit": "ms",
"value": "294"
},
{
"name": "p90_response_latency",
"unit": "ms",
"value": "240"
},
{
"name": "p50_response_latency",
"unit": "ms",
"value": "150"
},
{
"name": "success_count",
"value": "28580m"
},
{
"name": "failure_count",
"value": "0"
}
]
},
{
"kind": "TrafficMetrics",
"apiVersion": "metrics.smi-spec.io/v1alpha1",
"metadata": {
"name": "linkerd-controller",
"namespace": "linkerd",
"selfLink": "/apis/metrics.smi-spec.io/v1alpha1/namespaces/linkerd/deployments/linkerd-controller/edges",
"creationTimestamp": "2019-06-17T14:51:58Z"
},
"timestamp": "2019-06-17T14:51:57Z",
"window": "30s",
"resource": {
"kind": "Deployment",
"namespace": "linkerd",
"name": "linkerd-controller"
},
"edge": {
"direction": "to",
"resource": {
"kind": "Deployment",
"namespace": "linkerd",
"name": "linkerd-prometheus"
}
},
"metrics": [
{
"name": "p99_response_latency",
"unit": "ms",
"value": "368"
},
{
"name": "p90_response_latency",
"unit": "ms",
"value": "247199m"
},
{
"name": "p50_response_latency",
"unit": "ms",
"value": "120731m"
},
{
"name": "success_count",
"value": "1008100m"
},
{
"name": "failure_count",
"value": "0"
}
]
}
]
}
Development
- Get tilt, run
tilt up
. - The prometheus API client has been mocked out with a tool
mockery
. When bumping the API client version, a new mock will need to be generated. This can be done by checking out the correct version of the API client repo, runningmockery -name API
and copying themocks
folder intopkg/metrics/mocks
.
Admin
/debug
/metrics
/status
TODO
API Questions
- ObjectMeta includes OwnerReferences, Labels and Annotations. Should any of these be included as part of TrafficMetrics?
- ObjectReference has ResourceVersion and APIVersion, pull these in?
Internal details
- export prometheus for client-go
- integrate swagger with apiservice (OpenAPI AggregationController)
Contributing
Please refer to CONTRIBUTING.md for more information on contributing.