Kubernetes resource graphing with Heapster, InfluxDB and Grafana

I know that the Cloud Native Computing Foundation chose Prometheus as the monitoring platform of choice for Kubernetes, but in this post I'll show you how to quickly get started with graphing CPU, memory, disk and network in a Kubernetes cluster using Heapster, InfluxDB and Grafana.

The documentation in the kubernetes/heapster GitHub repo is actually pretty good. Here's what I did:

$ git clone https://github.com/kubernetes/heapster.git
$ cd heapster/deploy/kube-config/influxdb

Look at the yaml manifests to see if you need to customize anything. I left everything 'as is' and ran:

$ kubectl create -f .
deployment "monitoring-grafana" created
service "monitoring-grafana" created
deployment "heapster" created
service "heapster" created
deployment "monitoring-influxdb" created
service "monitoring-influxdb" created

Then you can run 'kubectl cluster-info' and look for the monitoring-grafana endpoint. Since the monitoring-grafana service is of type LoadBalancer, if you run your Kubernetes cluster in AWS, the service creation will also involve the creation of an ELB. By default the ELB security group allows 80 from all, so I edited that to restrict it to some known IPs.

After a few minutes, you should see CPU and memory graphs shown in the Kubernetes dashboard. Here is an example showing pods running in the kube-system namespace:

You can also hit the Grafana endpoint and choose the Cluster or Pods dashboards. Note that if you have a namespace different from default and kube-system, you have to enter its name manually in the namespace field of the Grafana Pods dashboard. Only then you'll be able to see data corresponding to pods running in that namespace (or at least I had to jump through that hoop.)

Here is an example of graphs for the kubernetes-dashboard pod running in the kube-system namespace:

For info on how to customize the Grafana graphs, here's a good post from Deis.


Popular posts from this blog

Performance vs. load vs. stress testing

Dynamic DNS updates with nsupdate and BIND 9

Running Gatling load tests in Docker containers via Jenkins