KubeFlux: An HPC Scheduler Plugin for Kubernetes


Authors:   Claudia Misale, Daniel Milroy


The presentation discusses the potential benefits of converged computing, which combines cloud and high-performance computing (HPC) technologies, and the challenges in achieving fully featured HPC scheduling in Kubernetes.
  • Converged computing combines cloud and HPC technologies to enhance application performance, scalability, flexibility, and automation.
  • Fully featured HPC scheduling in Kubernetes has not yet been achieved; challenges remain in co-scheduling, throughput, job communication and coordination, portability, and resource heterogeneity.
  • The Flux framework is an open-source project that addresses these five key technical problems of converged computing.
  • Cloud computing is becoming a dominant market force, and HPC needs to integrate research and development in software and hardware to avoid becoming isolated.
  • Lawrence Livermore National Laboratory (LLNL) is seeing demand for cloud technologies within HPC workflows, and there is potential to unite the two communities in a converged computing environment.
The American Heart Association Molecular Screening Workflow, also known as AHA Moles, and the Rapid COVID-19 Small Molecule Drug Design Workflow are examples of composite workflows that use Kubernetes. The 2020 Laboratory Application Survey found that fewer than 10% of applications currently use the cloud, but 73% may adopt it in the future.


Adoption of cloud technologies by high-performance computing (HPC) is accelerating, and HPC users want their applications to perform well everywhere. While container orchestration frameworks provide advantages like resiliency, elasticity, and declarative management, they are not designed to enable application performance to the same degree as HPC workload managers and schedulers. In response to increased interest in scheduling flexibility, the Kubernetes community developed the Scheduling Framework to facilitate integration of new policies and schedulers. We present KubeFlux, a Scheduling Framework plugin based on Fluxion, the open-source HPC scheduler developed at Lawrence Livermore National Laboratory. We discuss uses for KubeFlux and compare the performance of an application scheduled by the Kubernetes default scheduler with its performance when scheduled by KubeFlux. KubeFlux is an example of the rich capability that can be added to Kubernetes, and it paves the way toward democratization of the cloud for HPC workloads.