Trimaran: Real Load Aware Scheduling in Kubernetes

Conference: KubeCon + CloudNativeCon North America 2021

2021-10-13

Authors: Chen Wang, Abdul Qadeer

Summary

Load balancing and resource allocation in Kubernetes clusters using Trimaran plugins

Trimaran is a set of plugins for Kubernetes clusters that optimize resource allocation and load balancing
The Target Load Packing plugin aims to achieve high utilization across all nodes while maintaining a safe margin for CPU usage spikes
The Load Variation Risk Balancing plugin computes a risk score for CPU and for memory utilization on each node and scores the node by its bottleneck resource
Trimaran uses multiple metric sources and caches data to avoid overwhelming metric providers
Future work includes integrating Trimaran with other schedulers and incorporating additional resources like IO and network latency
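
The two scoring ideas summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the plugins' actual code: the piecewise-linear shape of the packing score, the 40% default target, the risk definition (average of mean load and margin-weighted standard deviation), and all function names are hypothetical choices made for this example.

```python
from statistics import mean, pstdev
from typing import Sequence

MAX_SCORE = 100.0  # assumed score range for a node (0 to 100)


def target_load_packing_score(predicted_cpu_pct: float, target_pct: float = 40.0) -> float:
    """Score a node so that CPU utilization near the target ranks highest.

    Assumption: a piecewise-linear curve with a hypothetical 40% target,
    which leaves headroom above the target for usage spikes.
    """
    u, x = predicted_cpu_pct, target_pct
    if u <= x:
        # At or below target: score rises from x (idle node) to 100 (at target).
        return (MAX_SCORE - x) * u / x + x
    if u <= 100.0:
        # Above target: score falls linearly toward 0 as the node fills up.
        return x * (100.0 - u) / (100.0 - x)
    return 0.0  # predicted load exceeds capacity


def resource_risk_score(samples: Sequence[float], margin: float = 1.0) -> float:
    """Score one resource from utilization samples given as fractions in [0, 1]."""
    mu = mean(samples)
    sigma = pstdev(samples)
    # Assumed risk definition: average of mean load and margin-weighted spread.
    risk = min(1.0, (mu + margin * sigma) / 2.0)
    return (1.0 - risk) * MAX_SCORE


def load_variation_risk_score(
    cpu_samples: Sequence[float],
    mem_samples: Sequence[float],
    margin: float = 1.0,
) -> float:
    """The node score is the lower (bottleneck) of the CPU and memory scores."""
    return min(
        resource_risk_score(cpu_samples, margin),
        resource_risk_score(mem_samples, margin),
    )
```

With these assumptions, a node whose predicted CPU utilization sits exactly at the target receives the maximum packing score, while a node with steady CPU load but highly variable memory load is scored by its memory risk, since memory is the bottleneck.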

In an experiment with 100 nodes and 400 pods, the Target Load Packing plugin resulted in better capacity utilization, fewer hot nodes, and fewer fragmented cores compared to the default scheduler

Abstract

Kubernetes is a popular solution for container orchestration and cluster management. Cluster management creates an opportunity to improve resource utilization, which can save an organization money. To achieve this, we can make the native Kubernetes scheduler aware of the gap between its declarative resource allocation model and actual node resource utilization. By considering the real load on nodes, we can pack pods more efficiently onto fewer nodes. The native scheduler, on the other hand, considers only pod requests and allocatable resources on nodes with its default plugins. To address this problem, we introduced two plugins to the scheduler community, TargetLoadPacking and LoadVariationRiskBalancing, under the Trimaran framework, in a collaboration between PayPal and IBM. The plugins provide scheduling support for all pod QoS guarantees.
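
The gap the abstract refers to can be made concrete with a small calculation. The sketch below is illustrative only: the function name and the node figures (16 allocatable cores, 12 requested, 4 actually used) are made up for this example and do not come from the talk.

```python
def utilization_gap(allocatable_cores: float, requested_cores: float, used_cores: float) -> float:
    """Return the gap, in percentage points, between requested and real CPU load."""
    # What a request-based scheduler sees: the share of the node already claimed.
    requested_pct = 100.0 * requested_cores / allocatable_cores
    # What a load-aware scheduler sees: the share of the node actually in use.
    used_pct = 100.0 * used_cores / allocatable_cores
    return requested_pct - used_pct


# Hypothetical node: looks 75% full by requests but is only 25% busy.
gap = utilization_gap(allocatable_cores=16, requested_cores=12, used_cores=4)
```

A large positive gap means the node has spare real capacity that a scheduler looking only at requests would not use.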

Materials:

Tags:

Container Orchestration