The presentation compares Kubernetes Container Network Interface (CNI) configurations for performance-intensive workloads, particularly in high-performance computing (HPC) and artificial intelligence (AI). The focus is on network throughput, latency, CPU offload capabilities, and GPU technologies such as GPUDirect and RDMA. It includes a genome-sequencing test case that measures sequencing performance on the host network and across the different CNIs.
- Kubernetes networking is crucial for performance-intensive workloads in HPC and AI
- Different CNIs offer various architectures and technologies that claim performance advantages
- Network throughput, latency, CPU offload capabilities, and GPU technologies are essential considerations for these workloads
- RDMA (remote direct memory access) is a transport service that supports memory read and write semantics, kernel bypass, and hardware offloads
- Hardware SDN acceleration offloads delivered the highest performance and enable the use of RDMA protocols
- Calico performed particularly well in the test case for genome sequencing
- With hardware-accelerated networking, the performance of Kubernetes on bare-metal hosts and in OpenStack VMs can be almost indistinguishable from native bare-metal performance
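As a concrete illustration of how a workload might request hardware-accelerated, RDMA-capable networking in Kubernetes, a pod can attach to a secondary SR-IOV network and request an RDMA device-plugin resource. This sketch is an assumption, not taken from the presentation: the network name `sriov-rdma-net`, the resource name `rdma/hca_shared_devices`, and the image are all hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rdma-workload
  annotations:
    # Hypothetical Multus secondary network backed by an SR-IOV virtual function
    k8s.v1.cni.cncf.io/networks: sriov-rdma-net
spec:
  containers:
  - name: app
    image: example/hpc-app:latest   # placeholder image
    resources:
      limits:
        # Hypothetical resource name exposed by an RDMA shared-device plugin
        rdma/hca_shared_devices: 1
    securityContext:
      capabilities:
        add: ["IPC_LOCK"]           # allow pinning memory pages for RDMA
```

The exact annotation and resource names depend on which CNI, device plugin, and NIC are deployed; the pattern (secondary network attachment plus a device-plugin resource limit) is the common shape of such configurations.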
The presentation's test case is genome sequencing, a vital application in the world's response to the COVID-19 pandemic. The base calling step, which uses neural networks to extract base sequences from the noisy signal data, was exposed as a service in a Kubernetes deployment. The results showed no obvious performance penalty from using Kubernetes service networking for base calling; the base calling rate was even slightly higher when the workload was put behind a service IP in Calico.
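The "behind a service IP" setup described above can be sketched as a standard ClusterIP Service selecting the base-calling pods. The names, label, and port below are assumptions for illustration, not details from the presentation:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: basecaller          # hypothetical name for the base-calling service
spec:
  type: ClusterIP           # clients reach pods via the stable service IP
  selector:
    app: basecaller         # assumed pod label on the base-calling deployment
  ports:
  - port: 8000              # assumed application port
    targetPort: 8000
```

With Calico, traffic to this service IP is load-balanced to the backing pods by the dataplane's NAT rules, which is the path whose overhead the presentation found to be negligible for base calling.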