Five To-Dos When Monitoring Your Kubernetes Environment

If you’re on the DevOps front line, Kubernetes is fast becoming an essential element of your production cloud environment. Since container orchestration is critical to deploying, scaling, and managing your containerized applications, monitoring Kubernetes needs to be a big part of your application performance monitoring strategy.

Container environments don’t operate like traditional ones, so if you are monitoring your applications and infrastructure, you need to be thoughtful about how you monitor the container environment in which they run. Here are five best practices to inform your strategy:

  1. Centralize your logs and metrics. Orchestrating your containerized services and workloads through Kubernetes brings order to the chaos, but remember that your environment is still decentralized. You will give yourself a fighting chance if you centralize your logs and metrics.
  2. Account for ephemeral containers. The beauty of container orchestration is that it’s easy to start, stop, kill, and clean up containers in short order. However, monitoring them may not be so easy. You still need to debug problems and monitor cluster activity, even when services are coming and going. The trick is to grab the logs and metrics before they’re gone; if you don’t, you’ll be left with gaps in your data and no way to reconstruct what happened.
  3. Simplify, simplify, simplify. With all of the moving pieces in your container environment (services, APIs, containers, orchestration tool), you need to monitor them without introducing unneeded complexity. Rather than bloating your containers with various monitoring agents, each requiring updates on its own schedule, abstract your monitoring and management tools from what you’re monitoring and managing. This will also help your engineers focus on building and delivering software, not operating the delivery platform.
  4. Monitor each layer explicitly. You will need to collect logs and monitor for errors, failures, and performance issues at each layer of your environment: the pod, the container, and the controller manager. For example, you’ll need to be able to troubleshoot pod issues, ensure the container is working, and collect runtime metrics from the controller manager.
  5. Ensure data consistency across layers. For fast, accurate debugging, you need to ensure data consistency across all the layers in your container environment. Accurate timestamps, consistent units of measurement (milliseconds vs. seconds, for example), and a common set of metrics and logs collected across applications and components will help you troubleshoot and debug quickly and accurately at every layer (one way to anchor that consistency is sketched just after this list).
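One concrete way to anchor that consistency is to stamp every object you deploy with the same set of identifying labels; most logging and metrics agents attach pod labels to whatever they collect, so shared labels become shared dimensions when you query across layers. Here is a minimal sketch using the Kubernetes recommended-labels convention (the application name, version, and image are hypothetical placeholders):

```yaml
# Apply identical identifying labels to the workload and to its pods so that
# logs and metrics collected at any layer carry matching dimensions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout                          # hypothetical service
  labels:
    app.kubernetes.io/name: checkout
    app.kubernetes.io/version: "1.4.2"
    app.kubernetes.io/component: backend
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: checkout
  template:
    metadata:
      labels:                             # same labels on the pods themselves
        app.kubernetes.io/name: checkout
        app.kubernetes.io/version: "1.4.2"
        app.kubernetes.io/component: backend
    spec:
      containers:
      - name: checkout
        image: example/checkout:1.4.2     # hypothetical image
```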

One best practice for accomplishing these to-dos in a simple, straightforward manner is to monitor the containers in your Kubernetes environment without touching your application containers. Do this by introducing a DaemonSet (or, alternatively, a sidecar) into your Kubernetes environment(s) that sits alongside your containerized services and carries your logging and metrics collection agent. Deploying this way ensures consistent data collection, minimizes the changes required to your application containers, and, most importantly, eliminates the possibility of selective blindness in your production environment.
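To make the pattern concrete, here is a minimal sketch of the DaemonSet approach: one agent pod per node that reads every container’s stdout/stderr from the node’s log directories, with no changes to the application containers themselves. The image name and mount paths are placeholders to adapt to whichever agent you choose:

```yaml
# One agent pod per node; the agent tails container logs from the host
# rather than being baked into each application container.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: monitoring-agent
  template:
    metadata:
      labels:
        name: monitoring-agent
    spec:
      containers:
      - name: agent
        image: example/logging-metrics-agent:latest   # hypothetical agent image
        resources:
          limits:
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: containerlogs
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: containerlogs
        hostPath:
          path: /var/lib/docker/containers
```

Because a DaemonSet schedules exactly one copy of the agent on every node, any new container that lands anywhere in the cluster is picked up automatically, which is what closes the selective-blindness gap.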

A few ways to implement this include:

  • Introduce a DaemonSet with the Fluentd logging agent (this will give you logging but not metrics). If you already have an ELK cluster configured, this is probably the option for you. Learn more here.
  • Introduce a DaemonSet or sidecar with the Prometheus metrics agent (CoreOS has done an excellent job of integrating Prometheus and Kubernetes). Running Prometheus on your Kubernetes cluster will give you metrics instrumentation, querying, and alerting (a sketch of the annotation-based scraping convention follows this list). Learn more here.
  • A variety of metrics and performance monitoring tools, including Heapster, DataDog, cAdvisor, New Relic, Weave/VMware, and several others, also offer DaemonSet or sidecar options for Kubernetes monitoring.
  • Scalyr, log management for the DevOps front line, has a preconfigured DaemonSet containing the open source Scalyr agent available for download and use. The Scalyr DaemonSet natively supports both Kubernetes logging and metrics. You can download the YAML file for deploying the containerized Scalyr agent from GitHub here. Note that you can also download the full open-source Scalyr agent from GitHub here.
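For the Prometheus route, a common pattern is annotation-based scraping: the widely used Kubernetes service-discovery scrape configuration looks for per-pod annotations to decide what to collect. A minimal sketch, assuming your Prometheus server is deployed with that standard kubernetes-pods scrape job (the pod name, image, and port are hypothetical):

```yaml
# A pod opts in to scraping declaratively; the scrape configuration itself
# lives with the monitoring stack, not inside the application container.
apiVersion: v1
kind: Pod
metadata:
  name: checkout-service            # hypothetical application pod
  annotations:
    prometheus.io/scrape: "true"    # opt this pod in to scraping
    prometheus.io/port: "9102"      # port where metrics are exposed
    prometheus.io/path: "/metrics"
spec:
  containers:
  - name: checkout
    image: example/checkout:1.4.2   # hypothetical image
    ports:
    - containerPort: 9102
      name: metrics
```

The nice property of this convention is that application teams opt their pods in declaratively, while the collection machinery stays abstracted from the containers being monitored, in keeping with the "simplify" and "don’t bloat your containers" guidance above.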