Presentations | Hack Dojo

Kubernetes, Resistance Is Futile

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Adnan Hodzic

2023-04-19

This talk covers the 2+ year journey of migrating ING's MLP (Machine Learning Platform) to Kubernetes. As the biggest bank in the Netherlands and one of the biggest banks in the world, ING works in a highly regulated environment and is subject to rigorous control policies across the IT process lifecycle. This complicates things for a data scientist in such an environment who wants to deploy pre-trained machine learning models to production without much (or any) underlying SRE or deployment knowledge. That is where MLP steps in: it serves as a model hosting platform that takes care of all of the problems mentioned above. As an SRE, Adnan will cover the problems and limitations of the existing platform setup in the VM (virtual machine) world, the inception of the idea to migrate to Kubernetes, and the steps it took to start realizing that idea, including the migration plan. What followed was resistance, an inability to choose the ideal target destination, and the platform's growth, which made supporting the existing setup at its growing capacity a challenge and ultimately led to scalability issues. Together, these factors created a perfect storm that led to the inevitable: the migration to Kubernetes, and how that process came to be.

Tags:

Machine Learning Platform

Kubernetes

SRE

scalability


Intro + Deep Dive: Kubernetes SIG Scalability

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Wojciech Tyczyński

2023-04-19

tldr - powered by Generative AI

Tips for dividing workloads among multiple clusters in Kubernetes

Networking puts the most stress on the control plane and is where the largest number of issues are seen
Understanding the amount of churn in watched or observed services is a significant factor in workload division
The current scalability limit of 5000 nodes is not a hard limit and there are no plans to push it further in open source
External factors, such as third-party controllers and broader ecosystem improvements, also need to be addressed
Using the watch protocol for getting large collections of data can help with memory consumption and system throughput
Graceful shutdowns can prevent the control plane from being overwhelmed by hundreds of thousands of watches
Optimizations should weigh their added complexity against their return on investment


Kubernetes On the Edge With K3s For a Smart Metering Use Case

Conference: KubeCon + CloudNativeCon North America 2022

Authors: Harry Lee

2022-10-27

tldr - powered by Generative AI

The conference presentation discusses the design and implementation of a central data aggregation platform for a smart energy management system in South Africa, targeting big energy consumers such as office blocks, industrial factories, and the mining industry. The platform uses IoT devices to measure energy usage, estimate costs, and optimize electricity usage with automation. The presentation highlights the challenges of building a solution for companies and industrial plants located in rural areas with infrastructure limitations, intermittent internet connectivity, and power outages due to load shedding. The solution needs to be resilient, work offline, and use open-source technologies. Kubernetes is chosen for its resilience, high availability, and ability to run pods from previous states.

South Africa is facing an energy crisis due to a limited supply of electricity, which drives up costs and impacts businesses heavily reliant on electricity
The smart energy management system targets big energy consumers and uses IoT devices to measure energy usage, estimate costs, and optimize electricity usage with automation
The central data aggregation platform is designed to work offline, be resilient, and use open-source technologies
Kubernetes is chosen for its resilience, high availability, and ability to run pods from previous states
The solution needs to be flexible, work with existing network infrastructure, and reduce setup costs
Multiple teams are involved in building the IoT devices, gateway, IoT platform, and advanced data analytics in the cloud
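The requirement that the platform keep working offline through intermittent connectivity and load-shedding outages can be illustrated with a store-and-forward buffer. The following is a minimal Python sketch of that general technique, not the design described in the talk; `StoreAndForwardBuffer`, the `send` callback, and the `max_items` bound are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Reading = Dict[str, object]


class StoreAndForwardBuffer:
    """Hypothetical gateway-side buffer: queue readings locally and
    forward them upstream only while the uplink is available."""

    def __init__(self, send: Callable[[Reading], bool], max_items: int = 10_000):
        # Bounded queue: when full, appending drops the oldest reading.
        self._queue: Deque[Reading] = deque(maxlen=max_items)
        self._send = send

    def record(self, reading: Reading) -> None:
        self._queue.append(reading)

    def flush(self) -> int:
        """Forward queued readings in order and return how many were delivered.
        Stops at the first failed send so nothing is lost or reordered."""
        delivered = 0
        while self._queue:
            if not self._send(self._queue[0]):
                break  # uplink down (e.g. during load shedding); retry on the next flush
            self._queue.popleft()
            delivered += 1
        return delivered

    def __len__(self) -> int:
        return len(self._queue)


# Usage: simulate an uplink that is down, then comes back.
uplink_up = False
delivered_upstream: List[Reading] = []


def send(reading: Reading) -> bool:
    if not uplink_up:
        return False
    delivered_upstream.append(reading)
    return True


buffer = StoreAndForwardBuffer(send)
buffer.record({"meter": "m1", "kwh": 1.2})
buffer.record({"meter": "m1", "kwh": 1.4})
print(buffer.flush(), len(buffer))  # 0 2: nothing sent while offline
uplink_up = True
print(buffer.flush(), len(buffer))  # 2 0: backlog delivered in order
```

Dropping the oldest readings when the buffer fills is only one possible policy; a real gateway might instead persist the queue to local disk so readings survive a reboot.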


Intro + Deep Dive: SIG Scalability

Conference: KubeCon + CloudNativeCon North America 2022

Authors: Marcel Zięba

2022-10-27

tldr - powered by Generative AI

The presentation discusses the importance of scalability and reliability in Kubernetes and how to improve it.

Using immutable secrets can make Kubernetes API more reliable
API Priority and Fairness can increase the reliability of Kubernetes
Efficiently designed controllers with CRDs are not a problem
Node-oriented controllers can cause scalability issues
Redesigning individual components should be a last resort
Deprecating features should be avoided to prevent breaking users
Introducing more efficient ways of doing things can steer people towards more scalable solutions
Load testing can be helpful for component maintainers
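Immutable Secrets are a standard Kubernetes feature: setting `immutable: true` on a Secret lets kubelets stop watching it for changes, which reduces load on kube-apiserver. A minimal Python sketch that builds such a manifest as JSON (which `kubectl apply -f` accepts) might look like this; the helper name, Secret name, namespace, and key are hypothetical.

```python
import json


def immutable_secret(name: str, namespace: str, string_data: dict) -> dict:
    """Build a core/v1 Secret manifest marked immutable (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        # Once immutable is set, it cannot be reverted and the data cannot be
        # updated; changing it means deleting and recreating the Secret.
        "immutable": True,
        # stringData takes plain strings; the API server stores them base64-encoded in data.
        "stringData": string_data,
    }


manifest = immutable_secret("model-api-token", "default", {"token": "example-value"})
print(json.dumps(manifest, indent=2))  # e.g. pipe into `kubectl apply -f -`
```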


🚨 ContribFest - KubeVirt: Work on Core Components (and Docs!) with the KubeVirt Maintainers (Limited Availability; First-Come, First

Conference: KubeCon + CloudNativeCon North America 2022

Authors: Alexander Wels, Michael Henriksen, Ryan Hallisey, Kat Morgan

2022-10-26

Download the code ahead of time. DCO required. The KubeVirt maintainers will organize into small groups to help improve the scalability of KubeVirt components. This ContribFest session is designed to provide projects with the space and resources to tackle outstanding technical debt, security issues, or impactful feature requests. ContribFest sessions are intended to provide a place for maintainers to meet contributors and potential contributors and work together on solving a problem.


Intro + Deep Dive: SIG Scalability

Conference: KubeCon + CloudNativeCon Europe 2022

Authors: Wojciech Tyczyński, Marcel Zięba

2022-05-20

tldr - powered by Generative AI

The presentation discusses the implementation of efficient watch resumption and immutable secrets in Kubernetes to increase reliability and scalability. The speakers also talk about the tools and infrastructure used for scalability testing in Kubernetes.

Using immutable secrets can make Kubernetes API more reliable and reduce pressure on API servers
API Priority and Fairness is being actively developed to increase Kubernetes reliability
ClusterLoader2 is a tool used for scalability testing in Kubernetes
Kubemark simulates a cluster for scalability testing instead of running 5000 real nodes
Kubemark uses hollow nodes to simulate regular nodes without actually running pods
The hollow kube-proxy in Kubemark simulates the load that kube-proxy puts on the API server


The CRDs that Broke the Camel's Back

Conference: KubeCon + CloudNativeCon Europe 2022

Authors: Alper Rifat Ulucinar

2022-05-18

tldr - powered by Generative AI

The talk discusses the performance issues related to the API server when installing thousands of CRDs and how to troubleshoot them using profiling tools. It also provides insights into the mechanics of CRDs and tips for getting changes into upstream.

Custom resources are used to extend the K8s API server with a declarative API
Initial attempts to install thousands of CRDs revealed severe performance issues related to the API server
Profiling tools can be used to troubleshoot API server performance issues
Real world data can help pinpoint the root causes of scaling issues
Insights into the mechanics of CRDs are provided
Tips for getting changes into upstream and moving the ecosystem forward are shared


Cortex: Intro and Production Tips

Conference: KubeCon + CloudNativeCon North America 2021

Authors: Bryan Boreham, Alvin Lin

2021-10-15

Cortex is a time-series data store based on Prometheus. Cortex adds:

Scalability: run across dozens of servers to handle millions of samples per second.
Availability: if one server fails, work will be redirected to others.
Multi-tenancy: store data from different groups or customers, segregated so a user from one tenant cannot see data from another.
Durability: use cloud stores (such as S3) to reduce the chance of data loss.

This session will provide an overview of Cortex, an update on recent news from the project, and a run-through of the top 5 tips for running Cortex in production.
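Cortex's multi-tenancy is driven by a tenant ID carried in the `X-Scope-OrgID` HTTP header, which is normally set by an authenticating proxy in front of Cortex. The Python sketch below only builds (it does not send) a tenant-scoped query; the base URL, the `/prometheus` path prefix (which depends on how a given Cortex deployment is configured), and the tenant name are assumptions.

```python
import urllib.parse
import urllib.request

# Assumptions: base URL and path prefix are placeholders; the Prometheus-compatible
# query path depends on how the Cortex deployment is configured.
CORTEX_BASE_URL = "http://cortex.example.internal"
QUERY_PATH = "/prometheus/api/v1/query"


def tenant_query(tenant_id: str, promql: str) -> urllib.request.Request:
    """Build, but do not send, an instant PromQL query scoped to one tenant."""
    query_string = urllib.parse.urlencode({"query": promql})
    request = urllib.request.Request(f"{CORTEX_BASE_URL}{QUERY_PATH}?{query_string}")
    # Cortex reads the tenant ID from this header. urllib normalizes the name to
    # "X-scope-orgid"; HTTP header names are case-insensitive, so that is fine.
    request.add_header("X-Scope-OrgID", tenant_id)
    return request


req = tenant_query("team-a", "up")
print(req.full_url)  # http://cortex.example.internal/prometheus/api/v1/query?query=up
print(req.headers)   # {'X-scope-orgid': 'team-a'}
# urllib.request.urlopen(req) would send it to a reachable Cortex instance.
```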


Intro + DeepDive: SIG Scalability

Conference: KubeCon + CloudNativeCon North America 2021

Authors: Wojciech Tyczyński, Marcel Zięba

2021-10-15

tldr - powered by Generative AI

The presentation discusses the efforts of SIG Scalability in defining and improving scalability in Kubernetes, as well as monitoring and guarding against performance regressions.

SIG Scalability is focused on defining what scalability means for Kubernetes and executing towards those goals
They work with individual SIGs to ensure improvements are made and contribute to cross-SIG improvements
Monitoring and measuring current scalability levels is critical to understanding progress towards goals
Guarding against performance regressions is important to maintain scalability
Scalability is a job for everyone in the community, not just a small group
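Guarding against performance regressions usually comes down to comparing a measured latency percentile with an SLO threshold. Upstream Kubernetes defines one such SLO as a 99th-percentile latency of at most 1 s for mutating single-object API calls, per (resource, verb) pair. The Python sketch below applies a simplified nearest-rank p99 check to hypothetical samples; it is not how ClusterLoader2 computes its measurements.

```python
import math
from typing import Dict, List, Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def slo_violations(latencies_s: Dict[str, List[float]], threshold_s: float = 1.0) -> List[str]:
    """Return the (resource, verb) keys whose p99 latency exceeds the threshold."""
    return [key for key, samples in latencies_s.items()
            if percentile(samples, 99) > threshold_s]


# Hypothetical measurements from a test run, in seconds.
measured = {
    "pods/POST": [0.05] * 99 + [0.4],
    "configmaps/PUT": [0.1] * 98 + [1.5, 2.0],
}
print(slo_violations(measured))  # ['configmaps/PUT']
```

Running such a check against every test run, rather than only before releases, is what turns a one-off measurement into a regression guard.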


Using SLOs for Continuous Performance Optimizations of Your K8s Workloads

Conference: KubeCon + CloudNativeCon North America 2021

Authors: Andreas Grabner

2021-10-13

Moving to k8s doesn't prevent anyone from making bad architectural decisions that lead to performance degradation, scalability issues, or SLO violations in production. In fact, smaller services running in pods and connected through service meshes are even more vulnerable to bad architectural or implementation choices. To avoid bad deployments, the CNCF project Keptn provides automated SLO-based performance analysis as part of your CD process. Keptn automatically detects architectural and deployment changes that have a negative impact on performance and scalability. It uses SLOs (Service Level Objectives) to ensure your services always meet your objectives. The Keptn team has also published SLO best practices for identifying well-known performance patterns, gathered over years of analyzing hundreds of distributed software architectures deployed on k8s. Join this session to learn what these patterns are and how Keptn helps you prevent them from entering production.
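Keptn's SLO files express pass criteria as strings such as `<600` (absolute) or `<=+10%` (relative to earlier results) and combine per-objective results into a weighted total score. The Python sketch below mimics that idea in simplified form: it supports only those two criterion shapes, requires all criteria of an objective to hold, and omits warning thresholds. It is not Keptn's implementation, and the SLI names, values, baseline, and 90% pass mark are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Supports "<600" (absolute) and "<=+10%" / ">=-5" (relative: a leading sign).
_CRITERION = re.compile(r"^(<=|>=|<|>)([+-]?)(\d+(?:\.\d+)?)(%?)$")


def meets(value: float, criterion: str, baseline: Optional[float]) -> bool:
    """Check one criterion; relative ones are resolved against the baseline."""
    match = _CRITERION.match(criterion.replace(" ", ""))
    if match is None:
        raise ValueError(f"unsupported criterion: {criterion!r}")
    op, sign, number, percent = match.groups()
    limit = float(number)
    if percent and not sign:
        raise ValueError(f"percentages must be relative (signed): {criterion!r}")
    if sign:
        if baseline is None:
            return True  # assumption: with no earlier result there is nothing to regress from
        change = baseline * limit / 100 if percent else limit
        limit = baseline + change if sign == "+" else baseline - change
    return {"<": value < limit, "<=": value <= limit,
            ">": value > limit, ">=": value >= limit}[op]


@dataclass
class Objective:
    sli: str
    criteria: List[str]  # all must hold for the objective to pass
    weight: int = 1


def evaluate(objectives: List[Objective], values: Dict[str, float],
             baselines: Dict[str, float], pass_pct: float = 90.0) -> Tuple[float, str]:
    """Return the weighted score (0-100) and "pass" or "fail"."""
    total = sum(o.weight for o in objectives)
    achieved = sum(
        o.weight for o in objectives
        if all(meets(values[o.sli], c, baselines.get(o.sli)) for c in o.criteria)
    )
    score = 100.0 * achieved / total
    return score, "pass" if score >= pass_pct else "fail"


objectives = [
    Objective("response_time_p95", ["<=+10%", "<600"]),  # milliseconds
    Objective("error_rate", ["<1"]),                      # percent
]
score, result = evaluate(
    objectives,
    values={"response_time_p95": 540.0, "error_rate": 0.2},
    baselines={"response_time_p95": 480.0},
)
print(score, result)  # 50.0 fail: 540 ms is under 600 ms but more than 10% above 480 ms
```

The example shows why combining an absolute and a relative criterion is useful: the new build still meets the 600 ms ceiling, yet the relative check flags the regression before it reaches production.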

