The presentation compares Kubernetes Container Network Interface (CNI) configurations for performance-intensive workloads, particularly in high-performance computing (HPC) and artificial intelligence (AI). The focus is on network throughput, latency, CPU offload capabilities, and GPU technologies such as GPUDirect and RDMA. It includes a genome-sequencing test case that measures sequencing performance on the host network and across the different CNIs.
- Kubernetes networking is crucial for performance-intensive workloads in HPC and AI
- Different CNIs offer various architectures and technologies that claim performance advantages
- Network throughput, latency, CPU offload capabilities, and GPU technologies are essential considerations for these workloads
- RDMA (remote direct memory access) is a transport service that supports memory read and write semantics, kernel bypass, and hardware offloads
- Hardware SDN acceleration offloads delivered the highest performance and enable the use of RDMA protocols
- Calico performed particularly well in the test case for genome sequencing
- With hardware-accelerated networking, the performance of Kubernetes on bare-metal hosts and in OpenStack VMs can be almost indistinguishable from native bare-metal performance
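As a concrete illustration of how a workload might request hardware-accelerated, RDMA-capable networking in Kubernetes, a pod can attach to a secondary SR-IOV network and request an RDMA device-plugin resource. This sketch is an assumption, not taken from the presentation: the network name `sriov-rdma-net`, the resource name `rdma/hca_shared_devices`, and the image are all hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rdma-workload
  annotations:
    # Hypothetical Multus secondary network backed by an SR-IOV virtual function
    k8s.v1.cni.cncf.io/networks: sriov-rdma-net
spec:
  containers:
  - name: app
    image: example/hpc-app:latest   # placeholder image
    resources:
      limits:
        # Hypothetical resource name exposed by an RDMA shared-device plugin
        rdma/hca_shared_devices: 1
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]           # allow pinning memory pages for RDMA
```

The exact annotation and resource names depend on which CNI, device plugin, and NIC are deployed; the pattern (secondary network attachment plus a device-plugin resource limit) is the common shape of such configurations.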
The presentation's test case is genome sequencing, a vital application in the world's response to the COVID-19 pandemic. The base calling step, which uses neural networks to extract base sequences from the noisy signal data, was exposed as a service in a Kubernetes deployment. The results showed no obvious performance penalty from using Kubernetes service networking for base calling; the base calling rate was even slightly higher when the workload was put behind a service IP in Calico.
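The "behind a service IP" setup described above can be sketched as a standard ClusterIP Service selecting the base-calling pods. The names, label, and port below are assumptions for illustration, not details from the presentation:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: basecaller          # hypothetical name for the base-calling service
spec:
  type: ClusterIP           # clients reach pods via the stable service IP
  selector:
    app: basecaller         # assumed pod label on the base-calling deployment
  ports:
  - port: 8000              # assumed application port
    targetPort: 8000
```

With Calico, traffic to this service IP is load-balanced to the backing pods by the dataplane's NAT rules, which is the path whose overhead the presentation found to be negligible for base calling.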