Improving the Reliability of Kubernetes Load Balancers

Conference: KubeCon + CloudNativeCon Europe 2023

2023-04-21

Authors: Alexander Constantinescu

Summary


Kubernetes load balancers are critical for application ingress
The current load balancer configuration is simplistic and introduces serious failure modes
The proposed solution refactors Kubernetes' load balancer support to better uphold application SLAs
The talk covers the background, problem, solution, and future work

The talk discusses a scalability problem that arises when operating Services, specifically those of type `LoadBalancer`. The problem affects clusters running on public clouds, where load balancers are provisioned by the Kubernetes control plane. The proposed solution addresses the issues with the current load balancer configuration in order to improve the reliability of Kubernetes load balancers.

Abstract

Load balancers are a critical part of application ingress for Kubernetes clusters. One of the simplest ways to set one up is to create a Service of type `LoadBalancer`. When configuring these load balancers with the set of nodes to use as backends, Kubernetes applies a simplistic interpretation of the cluster's networking state. This model introduces serious failure modes for application ingress when it becomes decorrelated from the actual state, since it is completely orthogonal to the state of the application itself. Load balancers may not have the most up-to-date node set, may go through unnecessary reconfigurations, and may blindly route traffic without an application-specific health check. Moreover, the current mechanism has proven computationally suboptimal, and it misses opportunities for more production-grade approaches, such as allowing load balancers to route application traffic dynamically without needing to be reconfigured. This talk walks through the current implementation, the existing problems, and the proposed north star. Alexander and Swetha will cover how the refactored support will better uphold application SLAs.
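The "unnecessary reconfigurations" failure mode can be illustrated with a small simulation. This is a hypothetical sketch, not the Kubernetes service controller code: the `Node` model, the two selection functions, and the flapping scenario are assumptions made for illustration. It assumes the backend node set is derived from node-level predicates (node readiness plus the `node.kubernetes.io/exclude-from-external-load-balancers` label) and that any change to that set triggers a load balancer update.

```python
from dataclasses import dataclass, field

# Label that opts a node out of external load balancer backend sets.
EXCLUDE_LABEL = "node.kubernetes.io/exclude-from-external-load-balancers"


@dataclass
class Node:
    """Hypothetical, simplified node model for illustration only."""
    name: str
    ready: bool
    labels: dict = field(default_factory=dict)


def backends_with_readiness(nodes):
    """Select backends using a transient predicate (node readiness)."""
    return tuple(sorted(n.name for n in nodes
                        if n.ready and EXCLUDE_LABEL not in n.labels))


def backends_without_readiness(nodes):
    """Select backends using only the stable exclusion label."""
    return tuple(sorted(n.name for n in nodes
                        if EXCLUDE_LABEL not in n.labels))


def count_reconfigurations(snapshots, select):
    """Count how often the selected backend set changes over time.

    Assumption: every change to the set causes one load balancer update.
    """
    previous = None
    changes = 0
    for nodes in snapshots:
        current = select(nodes)
        if previous is not None and current != previous:
            changes += 1
        previous = current
    return changes


def snapshot(b_ready):
    """Cluster state at one instant; only node-b's readiness varies."""
    return [
        Node("node-a", True),
        Node("node-b", b_ready),
        Node("node-c", True, {EXCLUDE_LABEL: ""}),
    ]


# node-b flaps between Ready and NotReady four times.
history = [snapshot(r) for r in (True, False, True, False, True)]

print(count_reconfigurations(history, backends_with_readiness))
print(count_reconfigurations(history, backends_without_readiness))
```

In this model, every readiness flap on `node-b` changes the backend set and therefore triggers a reconfiguration, even though nothing about the application changed, while the label-only selection keeps the set stable. Whether traffic should then reach `node-b` is left to a health check, which is the kind of application-aware signal the abstract argues the current model lacks.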
