Authors: Andre Marcelo-Tanner
2023-04-20

tldr - powered by Generative AI

Lessons learned from a Kubernetes outage and disaster recovery process
  • Complete your migrations
  • Be experts in your tooling
  • Always be practicing your disaster recovery
Authors: Marek Siarkowicz, James Blair, Samba Bandari, Bogdan Kanivets
2023-04-20

Download the code ahead of time. DCO required. Join the contributors to etcd, the most popular cloud-native database that backs Kubernetes. We'll be working on improving key features and testing for etcd, and in the process we’ll teach those new to the project how to contribute. Etcd is a very useful, fun, and essential project, and welcomes both new contributors and those who want to “level up”. Attendees should be familiar with programming in Go and using GitHub, and should bring a laptop on which they can do cloud-native development: either a Linux laptop, your own GitHub Devcontainer setup, or some equivalent. Etcd maintainers will organise work to improve the reliability of etcd. We will focus on improving etcd robustness testing and paying down technical debt. This Contribfest session is designed to provide projects with the space and resources to tackle outstanding technical debt, security issues, or outstanding impactful feature requests. It is intended to provide a place for maintainers to meet contributors and potential contributors and work together on solving a problem.
Authors: Laurent Bernaille, Marcel Zięba
2023-04-20

tldr - powered by Generative AI

The presentation discusses challenges in running large Kubernetes clusters and offers best practices to overcome them. It also highlights the importance of using informers and avoiding list calls to improve performance.
  • Running large Kubernetes clusters is challenging despite community improvements
  • Defaults are not always enough and best practices should be followed
  • Avoid list calls and use informers to improve performance
  • Memory and CPU headroom should be maintained to absorb bad events
  • Streaming lists in Kubernetes 1.27 can improve memory usage
Authors: Chao Chen, Geeta Gharpure
2023-04-19

tldr - powered by Generative AI

Operational issues and their mitigations in running etcd
  • Database size exceeding the storage quota
  • Revision divergence
  • Out of memory panic
  • Timeouts due to defrag
  • Oversized requests
Authors: Marek Siarkowicz, Benjamin Wang
2022-10-26

Earlier this year there was an event that shook the cloud native ecosystem. The latest release of etcd had a critical data inconsistency issue. Etcd, the critical component that powers many cloud native solutions including Kubernetes, could corrupt your data. The issue was so bad that it required every single administrator to take action or risk their system becoming unrecoverable. This presentation will discuss what led to the data inconsistency issues, how they were discovered, what was needed to fix them, and what lessons we learned that could benefit the whole community.