The presentation discusses how to map Kubernetes primitives to infrastructure and the role of platform reliability engineers in this process.
- Kubernetes provides primitives for defining applications, but infrastructure operators need to map these primitives to actual infrastructure solutions.
- Platform reliability engineers, or Kubernetes cluster operators, are responsible for mapping availability zones, security policies, load balancing, and metrics to infrastructure.
- VMware's software-defined data center can be used to map Kubernetes constructs to vSphere clusters, NSX distributed firewall, NSX load balancer, and Wavefront for monitoring.
- Pivotal Container Service (PKS) provides a consistent, repeatable method for deploying Kubernetes clusters.
- Infrastructure Offload can improve Kubernetes performance by moving network policy, routing, and load balancing rules off of the compute platform and into the infrastructure.
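The mapping idea in the bullets above can be sketched as follows: a developer declares a generic Kubernetes construct (here a `Service` of type `LoadBalancer`), and the platform operator's tooling decides which infrastructure solution fulfills it. This is a minimal illustration; the service name, labels, and the mapping table entries are assumptions, not from any specific product API.

```python
import json

# A developer-facing Kubernetes Service manifest, expressed as a dict.
# The developer only asks for "a load balancer" -- not a specific one.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "web"},  # illustrative name
    "spec": {
        "type": "LoadBalancer",
        "selector": {"app": "web"},
        "ports": [{"port": 80, "targetPort": 8080}],
    },
}

# Hypothetical platform-side mapping from Kubernetes construct to the
# infrastructure named in the presentation (mapping table is assumed).
CONSTRUCT_TO_INFRA = {
    "LoadBalancer": "NSX load balancer",
    "NetworkPolicy": "NSX distributed firewall",
    "metrics": "Wavefront",
}

infra = CONSTRUCT_TO_INFRA[service["spec"]["type"]]
print(f"Service {service['metadata']['name']!r} fulfilled by: {infra}")
print(json.dumps(service, indent=2))
```

The point of the indirection is that the same manifest deploys unchanged on any cluster; only the platform-side mapping differs between environments.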
The speaker uses the example of an application developer creating an ELK stack, who must define how the components communicate with one another as well as their persistent storage and security requirements. The platform reliability engineer then maps these definitions to actual infrastructure, such as specific servers in a data center or a particular storage backend.
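The ELK example above can be made concrete with two developer-side declarations: a storage request and a network security rule. This is a hedged sketch, assuming illustrative names, labels, port numbers, and storage class; the actual manifests in any deployment would differ.

```python
import json

# PersistentVolumeClaim: the developer asks for storage abstractly;
# the platform operator maps the storage class to real infrastructure.
es_storage = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "elasticsearch-data"},  # illustrative name
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "fast-ssd",  # assumed class name
        "resources": {"requests": {"storage": "100Gi"}},
    },
}

# NetworkPolicy: only Logstash pods may reach Elasticsearch on its
# HTTP port (9200). Labels and port are illustrative assumptions.
es_policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "allow-logstash-to-es"},
    "spec": {
        "podSelector": {"matchLabels": {"app": "elasticsearch"}},
        "ingress": [{
            "from": [{"podSelector": {"matchLabels": {"app": "logstash"}}}],
            "ports": [{"protocol": "TCP", "port": 9200}],
        }],
    },
}

print(json.dumps(es_policy, indent=2))
```

The developer never names a server or a firewall; the platform reliability engineer decides what enforces the `NetworkPolicy` and what backs the `PersistentVolumeClaim`.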
Networking is central to Kubernetes, as it enables secure and deterministic scale-out. As the number of services, pods, and interconnections grows, kernel overhead consumes more compute cycles, lowering throughput and increasing latency. Infrastructure Offload moves the Kubernetes cluster's network policy, routing, and load balancing rules off of the compute platform and into the infrastructure. The cloud provider can then optimize these operations in software or in programmable hardware, such as an IPU or DPU, without requiring any changes to the end user's applications. In this panel, we discuss various approaches that share a common methodology based on existing Kubernetes APIs to improve performance, free up compute cycles, and preserve compatibility with existing cloud native applications.
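The common methodology described above can be sketched in a few lines: the same Kubernetes `NetworkPolicy` object is translated into dataplane-neutral rules, which either an in-kernel backend or an IPU/DPU backend can program. The function and backend names here are hypothetical, not from any real product; the sketch only illustrates that offload changes the enforcement point, not the API the application uses.

```python
def flatten_policy(policy):
    """Reduce a NetworkPolicy dict to simple (src, dst, port) rules."""
    rules = []
    dst = policy["spec"]["podSelector"]["matchLabels"]
    for ingress in policy["spec"].get("ingress", []):
        for peer in ingress.get("from", []):
            src = peer["podSelector"]["matchLabels"]
            for p in ingress.get("ports", []):
                rules.append((src, dst, p["port"]))
    return rules

def program_kernel_dataplane(rules):
    # In-kernel enforcement consumes host compute cycles.
    return [f"kernel: allow {s} -> {d} port {port}" for s, d, port in rules]

def program_dpu_dataplane(rules):
    # Offloaded enforcement: the same rules are pushed to an IPU/DPU,
    # freeing host CPU without changing the user's policy objects.
    return [f"dpu: allow {s} -> {d} port {port}" for s, d, port in rules]

# The same policy object feeds both backends (labels/port assumed).
policy = {
    "spec": {
        "podSelector": {"matchLabels": {"app": "elasticsearch"}},
        "ingress": [{
            "from": [{"podSelector": {"matchLabels": {"app": "logstash"}}}],
            "ports": [{"port": 9200}],
        }],
    }
}

rules = flatten_policy(policy)
# Two enforcement points, one API -- the application is unchanged.
assert program_kernel_dataplane(rules)[0].startswith("kernel:")
assert program_dpu_dataplane(rules)[0].startswith("dpu:")
```

Because both backends consume the same flattened rules, switching from kernel enforcement to hardware offload is invisible to the workload, which is exactly the compatibility property the panel emphasizes.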