Working your Cluster: Smarter Scheduling Decisions for Your Workloads

Conference: KubeCon + CloudNativeCon Europe 2022

2022-05-18

Authors: Madalina Lazar, Denisio Togashi

Summary

Telemetry Aware Scheduling is an open-source project that uses telemetry to make smarter scheduling decisions for workloads in Kubernetes clusters.

Telemetry Aware Scheduling (TAS) is an open-source project that extends Kubernetes' scheduling paradigm to use knowledge of resources to impact scheduling decisions.
TAS uses telemetry to help make scheduling decisions and is an extender of the Kubernetes scheduler.
TAS allows for filtering and scoring nodes and utilizes node affinity rules via fixed and custom labels.
TAS scheduling policies are built from rules, and each rule is based on metrics that come from the cluster.
TAS requires a metrics pipeline to expose, collect, store, and make metrics available to the Kubernetes custom metrics API.
TAS works together with the default scheduler and returns a suggested outcome of pod placement to the default scheduler.
TAS supports multi-metric rules, which combine several metrics in one rule and link them with logical operators such as anyOf or allOf.
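
To make the multi-metric rules above concrete, here is a minimal Python sketch of how a set of metric rules could be combined with anyOf or allOf. It is an illustration only, not the TAS implementation: the `Rule` class, the comparison operator names, the metric names, and the handling of missing metrics are all assumptions made for this example.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List

# Comparison operators a rule may use. These names are illustrative
# assumptions, not taken from the TAS source.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "GreaterThan": operator.gt,
    "LessThan": operator.lt,
    "Equals": operator.eq,
}


@dataclass
class Rule:
    """One metric rule: compare a node's metric value against a target."""
    metric: str
    op: str
    target: float


def rule_matches(rule: Rule, node_metrics: Dict[str, float]) -> bool:
    """Return True when the node's value for rule.metric satisfies the rule."""
    value = node_metrics.get(rule.metric)
    if value is None:
        # Assumption: a metric that is missing for the node never matches.
        return False
    return OPERATORS[rule.op](value, rule.target)


def rules_match(rules: List[Rule], node_metrics: Dict[str, float],
                logical_operator: str = "anyOf") -> bool:
    """Combine the individual rule results with anyOf or allOf."""
    results = [rule_matches(rule, node_metrics) for rule in rules]
    if logical_operator == "allOf":
        return all(results)
    if logical_operator == "anyOf":
        return any(results)
    raise ValueError(f"unknown logical operator: {logical_operator}")


# Hypothetical rules and node readings, for illustration only.
rules = [
    Rule("temperature", "GreaterThan", 80),
    Rule("cpu_utilization", "GreaterThan", 90),
]
hot_but_idle = {"temperature": 85, "cpu_utilization": 40}
print(rules_match(rules, hot_but_idle, "anyOf"))
print(rules_match(rules, hot_but_idle, "allOf"))
```

With anyOf, a single matching rule is enough for the combined rule to match; with allOf, every rule has to match.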

In large clusters, it becomes harder to pinpoint when a host or node is becoming unhealthy. TAS addresses this problem by using telemetry to make smarter scheduling decisions.
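
The filter-then-score flow described in this summary can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the TAS code: the function names, the metric name, the threshold, and the 10-downward scoring scale are all hypothetical. In a real deployment the suggested result would be returned to the default scheduler, which makes the final placement decision.

```python
from typing import Dict, List, Tuple

# Per-node metric readings, e.g. {"node-a": {"temperature": 45.0}}.
NodeMetrics = Dict[str, Dict[str, float]]


def filter_nodes(nodes: List[str], metrics: NodeMetrics,
                 metric: str, limit: float) -> List[str]:
    """Keep nodes whose metric is known and does not exceed the limit."""
    return [
        node for node in nodes
        if metric in metrics.get(node, {}) and metrics[node][metric] <= limit
    ]


def score_nodes(nodes: List[str], metrics: NodeMetrics,
                metric: str) -> List[Tuple[str, int]]:
    """Rank nodes by ascending metric value; the best node gets 10 points.

    Expects nodes that already passed filter_nodes, so every node has
    a value for the metric. The scoring scale is an assumption.
    """
    ranked = sorted(nodes, key=lambda node: metrics[node][metric])
    return [(node, max(10 - rank, 0)) for rank, node in enumerate(ranked)]


# Hypothetical telemetry for three nodes.
metrics = {
    "node-a": {"temperature": 45.0},
    "node-b": {"temperature": 92.0},
    "node-c": {"temperature": 60.0},
}
candidates = filter_nodes(["node-a", "node-b", "node-c"], metrics,
                          "temperature", limit=80.0)
print(candidates)  # ['node-a', 'node-c']
print(score_nodes(candidates, metrics, "temperature"))  # [('node-a', 10), ('node-c', 9)]
```

Filtering removes nodes that violate a rule outright, while scoring only expresses a preference among the nodes that remain.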

Abstract

When deciding where to schedule your workloads, you have to consider more than just CPU and memory. Whether you are in 5G, AI/ML, HPC, or NFV, you have many more considerations to optimize your workloads. You may care about how busy the node is, how many GPU cards are attached, whether a minimal throughput is available, or whether the node is cooler than the temperature required for basic cooking. Fortunately, Kubernetes allows for extensions to its scheduling paradigm, which allows for new creative solutions going forward. Using these capabilities, we have created a way to use knowledge of your resources to impact your scheduling decisions. Telemetry Aware Scheduling and GPU Aware Scheduling, both open-source projects, enable you to use a variety of metrics in intelligent scheduling. In this talk, we will explain how to deploy and configure your system to handle your varied use cases.
