Surviving Day 2 - How to Troubleshoot Kubernetes Networking

2023-04-21

Authors:   Thomas Graf


Summary

The presentation covers how to monitor infrastructure with a golden signals dashboard, how Kubernetes Services are implemented, and how to troubleshoot network policy drops.
  • A golden signals dashboard is a standard way of monitoring the infrastructure behind publicly available services.
  • The four golden signals are latency, traffic (throughput), errors, and saturation.
  • A Kubernetes Service exposes multiple pod replicas via a single IP and DNS name.
  • Network policies can cause problems that are hard to detect without proper observability tools.
  • The Hubble UI and the hubble observe CLI are useful tools for troubleshooting network issues.
The presenter demonstrates how a simple network policy can cause problems that are hard to detect without proper observability tools. The Hubble UI and the hubble observe CLI are then used to identify network policy drops from the frontend to the kube-dns pod.
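
To make the four golden signals from the summary concrete, here is a minimal Python sketch that derives them from a window of request records. It is an illustration only, not the dashboard shown in the talk: the Request type, the golden_signals function, the nearest-rank p95, and the definition of saturation as traffic divided by an assumed capacity are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One observed request (hypothetical record type for this sketch)."""
    latency_ms: float
    status: int  # HTTP status code


def golden_signals(requests, window_s, capacity_rps):
    """Compute latency, traffic, errors, and saturation for one time window.

    Assumptions of this sketch: latency is reported as the nearest-rank p95,
    errors are HTTP 5xx responses, and saturation is traffic relative to a
    known capacity in requests per second.
    """
    if not requests or window_s <= 0 or capacity_rps <= 0:
        raise ValueError("need at least one request and positive window/capacity")

    latencies = sorted(r.latency_ms for r in requests)
    rank = max(1, math.ceil(0.95 * len(latencies)))  # nearest-rank percentile
    traffic_rps = len(requests) / window_s
    error_rate = sum(1 for r in requests if r.status >= 500) / len(requests)

    return {
        "latency_p95_ms": latencies[rank - 1],
        "traffic_rps": traffic_rps,
        "error_rate": error_rate,
        "saturation": traffic_rps / capacity_rps,
    }


if __name__ == "__main__":
    sample = [Request(latency_ms=10.0 * i, status=200) for i in range(1, 9)]
    sample += [Request(latency_ms=90.0, status=500), Request(latency_ms=100.0, status=503)]
    print(golden_signals(sample, window_s=5, capacity_rps=4))
```

In practice these values would come from a metrics pipeline rather than an in-memory list; the sketch only shows how the four signals relate to the same underlying request data.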

Abstract

Kubernetes is widely deployed. Kubernetes networking is at the core of every platform and then there is DNS. In this talk, we will dive into the inner workings of Kubernetes networking, learn how to troubleshoot it, and most importantly, describe how to monitor it properly to prevent incidents in the first place. In this session, we will walk through the essential toolbox for efficient networking troubleshooting and then set up preventive measures together:
  • Understanding the Kubernetes networking model
  • How to troubleshoot and resolve DNS errors
  • Debugging Kubernetes Services & Ingress and increasing resiliency
  • Locating the source of networking errors: is it an app, CNI, or underlying network problem?
  • Troubleshooting Kubernetes Network Policy drops
  • How to set up metrics dashboards and alerting to prevent network incidents
All troubleshooting steps will be demonstrated in a live Kubernetes cluster and all steps will be found in the presentation slides and on GitHub.
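
As a small illustration of the DNS troubleshooting topic in the abstract, the following Python sketch checks whether a name resolves and how long the lookup takes. It is not taken from the talk: the check_dns function and the example service name are hypothetical, and the sketch assumes it runs somewhere that uses the cluster's DNS resolver (for example inside a pod). A failed or slow lookup here only shows the symptom; whether a network policy drop is the cause would still need to be confirmed with a tool such as Hubble.

```python
import socket
import time


def check_dns(name, resolver=socket.getaddrinfo):
    """Resolve `name` once and report addresses, error text, and lookup time.

    `resolver` defaults to the system resolver; it is a parameter only so the
    function can be exercised without a live DNS server.
    """
    start = time.monotonic()
    try:
        infos = resolver(name, None)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address for both IPv4 and IPv6.
        addresses = sorted({info[4][0] for info in infos})
        error = None
    except socket.gaierror as exc:
        addresses = []
        error = str(exc)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return {"name": name, "addresses": addresses, "error": error, "elapsed_ms": elapsed_ms}


if __name__ == "__main__":
    # Hypothetical service name; replace with a Service in your own cluster.
    print(check_dns("backend.default.svc.cluster.local"))
```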

Materials: