Presentations | Hack Dojo

How to Blow up a Kubernetes Cluster

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Felix Hoffmann

2023-04-19

tldr - powered by Generative AI

The presentation discusses resource management in Kubernetes from the perspective of an application developer, highlighting the importance of setting resource requests and limits appropriately to avoid cluster crashes and scheduling issues.

Resource management in Kubernetes involves setting CPU and memory requests and limits for containers
Exceeding a memory limit gets the container terminated (OOM-killed), whereas exceeding a CPU limit results in throttling rather than termination
Setting appropriate requests and limits is crucial for efficient scheduling and avoiding noisy neighbors
Developers should be aware of namespace limits and available resources when setting requests and limits
In general, it is advisable to set memory requests equal to memory limits and avoid setting CPU limits
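Exceptions include workloads that need consistent performance, where CPU limits can help, and clusters where overcommitment of memory is preferred, where memory requests may be set below limits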
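
The general advice above (memory request equal to memory limit, no CPU limit) can be illustrated with a minimal Python sketch that builds the `resources` stanza of a container spec. The helper name `container_resources` and the example values are assumptions made for illustration; only the `requests`, `limits`, `cpu`, and `memory` field names come from Kubernetes itself.

```python
def container_resources(cpu_request: str, memory: str) -> dict:
    """Build a Kubernetes container `resources` stanza (hypothetical helper).

    The memory request and limit are set to the same value, and no CPU
    limit is set, so the container is not throttled at a fixed CPU cap.
    """
    return {
        "requests": {"cpu": cpu_request, "memory": memory},
        "limits": {"memory": memory},  # deliberately no "cpu" key here
    }


# Example values are illustrative, not taken from the presentation.
resources = container_resources(cpu_request="250m", memory="256Mi")
print(resources)
```

The resulting dictionary could be serialized to YAML or JSON and placed under a container's `resources` field. Namespace-level constraints may still reject it, so it is worth checking those before choosing values.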

Network-aware Scheduling in Kubernetes

Conference: KubeCon + CloudNativeCon Europe 2022

Authors: José Santos

2022-05-18

tldr - powered by Generative AI

The presentation discusses a network-aware framework for workload scheduling in Kubernetes clusters, which aims to reduce latency and improve performance.

The network-aware framework uses a combination of plugins and algorithms to optimize workload scheduling based on network topology and bandwidth resources.
The framework includes an application group and network topology controller, load watcher component, and a scheduler with filtering and scoring functions.
The framework was tested with the Redis cluster application and was able to improve throughput by 20% on average.
The framework is not yet production-ready but is expected to be included in the Kubernetes SIG Scheduling community's work in the next few months.
Future plans include adding a plugin for monitoring bandwidth and dynamically adjusting workload scheduling based on real-time network congestion.
An example was provided showing that the Online Boutique application performed better with the network-aware framework than with the default Kubernetes scheduler.
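
The filtering and scoring functions mentioned above can be sketched roughly as follows. This is not the framework's actual plugin code or API: the `Node` fields, the function names, and the scoring formula are all assumptions chosen to show the general filter-then-score pattern.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    latency_ms: float           # hypothetical latency to the workload's dependencies
    free_bandwidth_mbps: float  # hypothetical unreserved bandwidth on the node


def filter_nodes(nodes, required_bandwidth_mbps, max_latency_ms):
    """Filtering step: keep only nodes that meet the bandwidth and latency needs."""
    return [
        n for n in nodes
        if n.free_bandwidth_mbps >= required_bandwidth_mbps
        and n.latency_ms <= max_latency_ms
    ]


def score_node(node, max_latency_ms):
    """Scoring step: lower latency maps to a higher score between 0 and 100.

    Assumes max_latency_ms is greater than zero.
    """
    return round(100 * (1 - node.latency_ms / max_latency_ms))


def pick_node(nodes, required_bandwidth_mbps, max_latency_ms):
    """Return the highest-scoring feasible node, or None if none qualifies."""
    feasible = filter_nodes(nodes, required_bandwidth_mbps, max_latency_ms)
    if not feasible:
        return None
    return max(feasible, key=lambda n: score_node(n, max_latency_ms))


# Illustrative cluster; the numbers are made up.
cluster = [
    Node("node-a", latency_ms=5, free_bandwidth_mbps=100),
    Node("node-b", latency_ms=20, free_bandwidth_mbps=1000),
    Node("node-c", latency_ms=2, free_bandwidth_mbps=10),
]
best = pick_node(cluster, required_bandwidth_mbps=50, max_latency_ms=25)
print(best.name if best else "no feasible node")
```

In this sketch `node-c` has the lowest latency but is filtered out for lack of bandwidth, which shows why filtering runs before scoring.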

KubeFlux: An HPC Scheduler Plugin for Kubernetes

Conference: KubeCon + CloudNativeCon Europe 2022

Authors: Claudia Misale, Daniel Milroy

2022-05-18

tldr - powered by Generative AI

The presentation discusses the potential benefits of converged computing, which combines cloud and high-performance computing (HPC) technologies, and the challenges in achieving fully featured HPC scheduling in Kubernetes.

Converged computing combines cloud and HPC technologies to enhance application performance, scalability, flexibility, and automation.
Fully featured HPC scheduling in Kubernetes has not yet been achieved, and there are challenges in co-scheduling, throughput, job communication and coordination, portability, and resource heterogeneity.
The Flux framework is an open-source project that aims to address these five challenges of converged computing.
Cloud computing is becoming a dominant market force, and HPC needs to integrate research and development in software and hardware to avoid becoming isolated.
LLNL is seeing demand for cloud technologies within HPC workflows, and there is potential to unite the two communities in a converged computing environment.

Capacity Scheduling for Elastic Resource Sharing in Kubernetes

Conference: KubeCon + CloudNativeCon North America 2021

Authors: Yuan Chen, Alex Wang

2021-10-13

tldr - powered by Generative AI

The presentation discusses the elastic quota and job queue components of the Kubernetes scheduler and their compatibility with various workload management systems.

The elastic quota and job queue components are part of the Kubernetes scheduler and have been extensively tested.
The components are compatible with various workload management systems and can be configured to meet specific needs.
The goal is to make the components production-ready and widely adopted.
The presentation mentions Alibaba and Apple as early adopters of the components.
The components can be used for scheduling multiple jobs at the same time and ensuring that resources are not exceeded.
The presentation also discusses the possibility of using the components for nomad-style scheduling and SLA-driven scheduling.
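
The idea of queueing jobs while ensuring that resources are not exceeded can be illustrated with a deliberately simplified Python sketch. It is not the components' real implementation: the function `admit_jobs`, the single CPU quota, and the first-in-first-out policy are assumptions, and elastic borrowing between quotas is not modeled.

```python
from collections import deque


def admit_jobs(queue, quota_cpu):
    """Admit queued jobs in order until the next job would exceed the quota.

    `queue` is a deque of (job_name, cpu_needed) tuples (hypothetical format).
    Returns the admitted job names and the total CPU they use.
    """
    admitted = []
    used = 0
    while queue and used + queue[0][1] <= quota_cpu:
        name, cpu = queue.popleft()
        admitted.append(name)
        used += cpu
    return admitted, used


# Illustrative jobs and quota; the numbers are made up.
jobs = deque([("job-a", 4), ("job-b", 3), ("job-c", 2)])
admitted, used = admit_jobs(jobs, quota_cpu=8)
print(admitted, used, list(jobs))
```

Jobs that do not fit stay in the queue and can be retried once resources are released.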
