Authors: Matthias Haeussler, Tiffany Jernigan
2023-04-21

tldr - powered by Generative AI

The presentation discusses production readiness in the context of Kubernetes and cloud-native technologies. It emphasizes the importance of reliability, stability, security, performance, adaptability, and observability in ensuring user satisfaction.
  • Production readiness is the state of a system that is fully prepared and capable of running production workloads and providing the level of service and performance required by its users.
  • Vanilla Kubernetes provides the basic framework for running and managing container workloads, but production readiness requires more than just putting an application in a container and writing some YAML files.
  • There are various options for achieving production readiness, and the choice depends on factors such as the need for in-house skill, the desire for fast deployment, and the willingness to pay for managed solutions.
  • Going with the highest abstraction possible and not trying to solve problems that others have already solved is generally recommended.
  • Monitoring and observability are crucial for ensuring production readiness and enabling predictive analysis.
  • The ultimate goal of production readiness is to make users happy by providing reliable, stable, secure, performant, adaptive, and observable services.
Authors: Sunil Shah, Ramya Krishnan, Ashley Cutalo, Madhu C.S., Fabio Kung
2022-10-27

Kubernetes clusters are critical infrastructure at large public companies, with heavy traffic, complex dependencies on third-party services, and constant change as developers release features and traffic scales up and down. In this panel discussion, engineers from Airbnb, Lyft, Netflix and Robinhood share their challenges, experiences and lessons learned in running a sustainable on-call rotation that meets the needs of their internal users while maintaining the high uptime that business-critical workloads require. Topics covered include:
  • Keeping on-call engineers happy
  • Balancing rapid response with alert fatigue
  • Strategies for proactively dealing with production issues
  • Preparing engineers for on-call
Authors: Melanie Cebula
2022-10-27

tldr - powered by Generative AI

The presentation discusses the process of implementing multi-architectures in Airbnb's infrastructure to improve price and performance.
  • Focus on one or two workloads that have a business need for better price and performance
  • Form a small pilot group with subject matter experts
  • Upgrade and migrate operating system, languages, runtimes, and open source software
  • Automate the process of building, uploading, and signing packages
  • Invest in performance tooling and analysis
Authors: Jay Vyas, Claudiu Belu, Mark Rossetti, Brandon Smith
2022-05-18

Running Kubernetes on Windows is increasingly a viable production strategy for complex applications in multitenant environments. In this presentation we'll highlight recent improvements, such as the pod.OS field and advancements in host-process containers for infrastructure, that make it easier to manage production clusters and workloads; show how to rapidly prototype new Kubernetes features using the SIG-Windows developer tools project; and do a deep dive into how container users work on Windows.
Authors: Steve Gray
2021-10-15

tldr - powered by Generative AI

Deploying a service mesh to production can provide immediate operational and cost economy benefits for a large number of services using long-lived RPC protocols in Kubernetes.
  • Transitioning to Kubernetes logical services improved service discovery and connectivity between services.
  • Using gRPC for backbone protocols resulted in long-lived connections and inefficient load balancing.
  • Deploying a service mesh to production provided immediate operational and cost economy benefits, including reduced costs for moving data between zones and mutual TLS support for all communications within the pod.
  • Future plans include deploying Chaos Mesh more fully and switching to isolated individual clusters per geographic area.
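The load-balancing problem in the bullets above can be illustrated with a toy model. The sketch below assumes a hypothetical setup of 3 client pods and 3 server pods: a connection-level (L4) balancer picks a backend once per connection and every request then reuses that long-lived connection, whereas a request-level (L7) balancer, such as a mesh sidecar proxy, picks a backend for each request. The function names and numbers are illustrative only and are not taken from the talk.

```python
import random
from collections import Counter


def connection_level(clients: int, servers: int, requests_per_client: int,
                     rng: random.Random) -> Counter:
    """L4-style: each client opens one long-lived connection to a randomly
    chosen backend, and all of its requests ride on that connection."""
    load = Counter()
    for _ in range(clients):
        backend = rng.randrange(servers)  # chosen once, at connect time
        load[backend] += requests_per_client
    return load


def request_level(clients: int, servers: int, requests_per_client: int) -> Counter:
    """L7-style: a proxy round-robins every individual request across backends."""
    load = Counter()
    for i in range(clients * requests_per_client):
        load[i % servers] += 1
    return load


if __name__ == "__main__":
    rng = random.Random(7)
    print("per-connection:", dict(connection_level(3, 3, 1000, rng)))
    print("per-request:   ", dict(request_level(3, 3, 1000)))
```

With per-connection balancing, each backend receives 0, 1000, 2000 or 3000 requests depending on which clients happened to connect to it; per-request balancing spreads the same 3000 requests evenly.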
Authors: Michael Bridgen, Hidde Beydals
2021-10-14

tldr - powered by Generative AI

Flux is a Kubernetes operator that automates the deployment and scaling of containerized applications
  • Flux uses image automation to update container images in a cluster
  • Flux's unit test coverage varies, and the project is working on standardization and an RBAC security model
  • Flux is reorganizing its documentation to better serve different entry points and cloud platforms
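Flux's image automation decides which image tag to roll out according to a policy. As a rough illustration of that selection step only, the sketch below picks the newest tag inside a semantic-version range. The function names, the plain `MAJOR.MINOR.PATCH` tag format (optionally prefixed with `v`) and the half-open range bounds are assumptions for illustration; this is not Flux's API or code.

```python
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]


def parse_semver(tag: str) -> Optional[Version]:
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v'); None otherwise."""
    if tag.startswith("v"):
        tag = tag[1:]
    parts = tag.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return None
    return (int(parts[0]), int(parts[1]), int(parts[2]))


def latest_in_range(tags: Iterable[str], lower: Version, upper: Version) -> Optional[str]:
    """Return the highest tag whose version satisfies lower <= version < upper."""
    candidates = []
    for tag in tags:
        version = parse_semver(tag)
        if version is not None and lower <= version < upper:
            candidates.append((version, tag))
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    tags = ["1.2.0", "1.10.1", "v1.9.3", "2.0.0", "latest"]
    print(latest_in_range(tags, (1, 0, 0), (2, 0, 0)))  # prints 1.10.1
```

Comparing parsed integer tuples rather than raw strings is what makes `1.10.1` rank above `1.9.3`; non-semver tags such as `latest` are simply ignored.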
Authors: Daniel Finneran
2021-10-13

tldr - powered by Generative AI

The presentation discusses the journey of developing Kube-vip, a project that provides highly available Kubernetes clusters for various infrastructures, and how it can be used to implement highly available networking and load balancer functionality for Kubernetes services.
  • The presenter started by trying to improve the deployment of Kubernetes clusters on bare-metal and taking them into production
  • Providing highly available access to clusters proved difficult to implement and to integrate into cluster lifecycle patterns
  • Kube-vip evolved from trying to fix that one use case into a widely used project that provides highly available Kubernetes clusters for various infrastructures
  • Kube-vip uses leader election and clustering technology to ensure highly available access to Kubernetes clusters
  • Kube-vip relies on ARP and BGP protocols to update the network and route traffic to the correct node
  • Kube-vip can be used to implement highly available networking and load balancer functionality for Kubernetes services
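The leader-election idea in the bullets above can be sketched as a lease: one node holds it and answers for the virtual IP, keeps renewing it, and another node may take over only after the holder stops renewing. The sketch below is a toy, single-process model assuming a hypothetical in-memory `Lease` record and a 15-second lease duration; it is not kube-vip's implementation, and it omits the atomic compare-and-swap that a real shared store would need.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Hypothetical in-memory lease record (not the Kubernetes Lease object)."""
    holder: Optional[str] = None
    renewed_at: float = 0.0
    duration: float = 15.0  # seconds a holder may go without renewing


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Return True if `candidate` holds the lease after this attempt.

    The current holder may renew at any time; any other candidate may take
    over only once the lease has expired."""
    expired = lease.holder is None or now - lease.renewed_at > lease.duration
    if lease.holder == candidate or expired:
        lease.holder = candidate
        lease.renewed_at = now
        return True
    return False


if __name__ == "__main__":
    lease = Lease()
    print(try_acquire_or_renew(lease, "node-a", now=0.0))   # True: node-a leads
    print(try_acquire_or_renew(lease, "node-b", now=5.0))   # False: lease still valid
    print(try_acquire_or_renew(lease, "node-b", now=20.0))  # True: node-a stopped renewing
    print(lease.holder)                                     # node-b
```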
Authors: Arun Sriraman
2021-10-13

tldr - powered by Generative AI

The presentation discusses troubleshooting techniques for Kubernetes networking issues and introduces Kades Network, a tool for automated debugging.
  • Two common approaches to fixing issues are the 'big hammer' approach of restarting or deleting components, and asking specific groups or individuals for help
  • Identifying the type of problem and traffic path is crucial in troubleshooting
  • Kades Network is a tool that automates the debugging process by performing connectivity and path MTU checks
  • The tool is not CNI aware and does not provide automation for external reports
  • Contributors are welcome to improve the tool
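A path MTU check asks for the largest packet that can cross a network path without being fragmented or dropped. A minimal sketch of that idea is a binary search over packet sizes, assuming a hypothetical `fits(size)` probe that stands in for sending a packet of `size` bytes with the Don't Fragment bit set and checking whether it arrives. This is not Kades Network's implementation; the 576 and 9000 byte search bounds and the 1450-byte example path are assumptions.

```python
from typing import Callable


def probe_path_mtu(fits: Callable[[int], bool], low: int = 576, high: int = 9000) -> int:
    """Binary-search the largest packet size (bytes) for which `fits` is True.

    Assumes `low` always fits and that fitting is monotonic: if a size fits,
    every smaller size fits too."""
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if fits(mid):
            low = mid       # mid fits, so the answer is at least mid
        else:
            high = mid - 1  # mid is too big, so the answer is below mid
    return low


if __name__ == "__main__":
    # Hypothetical overlay path whose encapsulation leaves room for 1450-byte packets.
    print(probe_path_mtu(lambda size: size <= 1450))  # 1450
```

The search needs about 14 probes for this range instead of one probe per candidate size, which matters when each probe is a real round trip.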
Conference:  Transform X 2021
Authors: Drew Conway, Cassie Kozyrkov, Deepna Devkar, Jaclyn Rice Nelson
2021-10-07

Hosted by Tribe AI. Poor data quality. Inability to access the right talent. Failure to get models into production. When it comes to moving up the AI adoption curve, what's really holding businesses back? In this panel, you'll learn how technical leaders at enterprises like Google, CNN, and TwoSigma think about building higher performing teams and operationalizing machine learning projects to deliver business value in production.