Authors: Michelle Nguyen, Hannah Troisi, Clemens Kolbitsch, Vihang Mehta
2023-04-21

tldr - powered by Generative AI

The conference presentation discusses the practicalities of managing storage for multiple integrated applications in a busy environment, with a focus on OpenTelemetry and Pixie.
  • The speaker answers an audience question about storing data from multiple integrated applications in a busy environment
  • The speaker explains that OpenTelemetry and Pixie can capture and store data locally, then filter and batch it as needed
  • The speaker emphasizes the importance of considering sampling strategies, filtering, and batching when designing a storage architecture for a busy environment
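The sampling, filtering, and batching steps mentioned above can be illustrated with a minimal Python sketch. It does not use the OpenTelemetry or Pixie APIs; the event shape (a dict with a `latency_ms` field) and the function names `sample`, `keep_slow`, and `batch` are assumptions made up for illustration.

```python
import random
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, float]  # hypothetical event shape, e.g. {"latency_ms": 12.5}


def sample(events: Iterable[Event], rate: float, rng: random.Random) -> Iterator[Event]:
    """Keep each event with probability `rate` (simple probabilistic sampling)."""
    for event in events:
        if rng.random() < rate:
            yield event


def keep_slow(events: Iterable[Event], min_latency_ms: float) -> Iterator[Event]:
    """Filter: drop events faster than the given latency threshold."""
    for event in events:
        if event["latency_ms"] >= min_latency_ms:
            yield event


def batch(events: Iterable, size: int) -> Iterator[List]:
    """Group events into lists of at most `size` items before export."""
    buffer: List = []
    for event in events:
        buffer.append(event)
        if len(buffer) == size:
            yield buffer
            buffer = []
    if buffer:  # flush the final partial batch
        yield buffer


if __name__ == "__main__":
    rng = random.Random(0)
    raw = [{"latency_ms": float(i)} for i in range(100)]
    for chunk in batch(keep_slow(sample(raw, 0.5, rng), 50.0), 10):
        print(len(chunk), "events ready to export")
```

Chaining the three generators keeps memory bounded: events are dropped as early as possible and only complete batches are handed to the (hypothetical) exporter.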
Authors: Ana Medina, Brad McCoy, Meha Bhalodiya, Giovanni Liva
2023-04-21

tldr - powered by Generative AI

The presentation discusses the use cases and benefits of the Keptn Lifecycle Toolkit, a Kubernetes deployment lifecycle management toolkit that provides observability and automation for Kubernetes deployments.
  • The Keptn Lifecycle Toolkit provides observability and automation for Kubernetes deployments
  • Use cases include preventing bad deployments during maintenance windows, supporting quality gates, and running evaluations after deployments
  • Keptn allows for monitoring and tracing of application deployments across different stages
  • The Keptn Metrics Server brings observability directly into the cluster by exposing Kubernetes-native metrics
  • This allows HPA, Argo Rollouts, and Flux to be configured to use Kubernetes-native metrics
Authors: Dan Jaglowski
2023-04-21

tldr - powered by Generative AI

The OpenTelemetry Collector is now a more capable tool for processing telemetry due to the introduction of the Connectors framework.
  • The Connectors framework allows for the creation of generalized systems for managing telemetry.
  • Connectors can be used to replicate and merge data streams, apply sampling criteria, and reason about multiple data types in one place.
  • The structure of pipelines and data streams in the OpenTelemetry Collector is governed by certain rules and expectations.
  • The Connectors framework can be used to address limitations in the existing pipeline structure.
  • An anecdote is provided to illustrate how the Connectors framework can be used to filter and redact telemetry data.
Authors: Venkata Gunapati, Anusha Ragunathan
2023-04-21

As Platform Engineers & SREs, we love metrics from Kubernetes clusters to understand Platform Health. However, we dislike drowning in alerts on every metric & experiencing alert fatigue. The worst consequence of alert fatigue is not just on-call engineer burnout, but on-call engineers snoozing alerts that could prevent incidents. At Intuit, we needed a smarter way to get alerted on a cluster’s Golden Signals, which are picked from an ocean of metrics. This would help reduce the mean time to detect (MTTD) during incidents. We wanted to achieve this without the burden of instrumenting cluster components. Observability vendors provide solutions using eBPF instrumentation and AI-driven insights on Prometheus data, but we wanted to explore open source solutions to achieve the same. In this talk, we explain how we explored numalogic, an open source AIOps anomaly detection engine for Kubernetes. You will learn how to use numalogic on Prometheus metrics to derive baseline behaviors and detect anomalies, without any prior AI/ML experience. We will show how we collect, process and analyze in-cluster data in real time and how numalogic computes an anomaly score for each component, which then roll up into a single anomaly score for the cluster. There will be a live demo of the AIOps-based Prometheus metrics pipeline in action.
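The idea of deriving a baseline, scoring each component, and rolling the scores up into one cluster score can be sketched as follows. This is not numalogic's API or its models; it swaps in a plain z-score against a historical baseline and takes the maximum component score as the cluster score. The function names and the sample metric values are assumptions for illustration only.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def anomaly_score(history: Sequence[float], current: float) -> float:
    """Distance of `current` from the historical mean, in standard deviations."""
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # Flat baseline: any deviation at all is treated as maximally anomalous.
        return 0.0 if current == baseline else float("inf")
    return abs(current - baseline) / spread


def cluster_score(component_scores: Dict[str, float]) -> float:
    """Roll component scores up into one number (assumption: take the worst one)."""
    return max(component_scores.values(), default=0.0)


if __name__ == "__main__":
    # Hypothetical per-component metric histories and current readings.
    scores = {
        "api-server": anomaly_score([8.0, 12.0, 9.0, 11.0], 10.5),
        "etcd": anomaly_score([8.0, 12.0], 14.0),
    }
    print(scores, "cluster:", cluster_score(scores))
```

Alerting on the single cluster score rather than on every underlying metric is what reduces alert volume; the per-component scores remain available to show where the anomaly originated.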
Authors: Vijay Samuel, Nick Pordash
2023-04-20

tldr - powered by Generative AI

The presentation discusses the use of profiling in DevOps to optimize code, reduce resource waste, and improve triage time.
  • Pyroscope UI allows for ad hoc profiling and comparison views
  • Profiles can help detect slow bleeds like memory leaks
  • Profiles can aid in root cause analysis and reduce time to triage
  • Profiles can help optimize code and reduce resource waste
Authors: Alolita Sharma, Matt Young
2023-04-19

tldr - powered by Generative AI

The conference presentation discusses the importance of standardizing observability data and bridging the gap from a correlation perspective to make it more efficient and transformable. The goal is to reduce developer toil and help end-users correlate their data across systems.
  • Observability data is difficult to analyze due to the large amounts of data emitted from cloud infrastructure, applications, and services
  • Queries should be treated as code ("querying as code")
  • The primary problem is bridging the gap from a correlation perspective to make it more efficient and transformable
  • The goal is to reduce developer toil and help end-users correlate their data across systems
  • The working group will research, analyze, and make recommendations for future working groups or projects to implement a standard
Authors: Ron Vider
2023-04-19

tldr - powered by Generative AI

The presentation discusses the use of OpenTelemetry for application security and highlights the importance of using modern tools, collecting cloud-native information, utilizing open-source tools, and prioritizing observability to make applications more secure.
  • Modern problems require modern solutions, and application security testing tools need to evolve to keep up with changing vulnerabilities in modern applications.
  • Collecting all available cloud-native information, such as traces and infrastructure configuration, is crucial when addressing vulnerabilities in cloud-native applications.
  • Open-source tools, such as OpenTelemetry, can be repurposed for application security to make organizations more secure.
  • Observability is essential for understanding the real risk of microservices-based and Kubernetes-based applications, and analyzing each microservice separately without knowledge of the surrounding infrastructure is insufficient.
Authors: Natalie Serrino, Frederic Branczyk
2023-04-19

tldr - powered by Generative AI

The presentation discusses the use of BPF (Berkeley Packet Filter) in cybersecurity and DevOps, highlighting its benefits and future potential.
  • BPF is a powerful tool for network analysis, security, and observability in production environments.
  • BPF allows for zero-instrumentation profiling of entire production clusters.
  • BPF has some limitations, including performance issues and difficulty in interpreting raw data.
  • Future developments in BPF may address these limitations, including increased support for programming languages and improved interpretability through machine learning.
Authors: Anurag Gupta, Eduardo Silva
2023-04-19

tldr - powered by Generative AI

Controlling data flow is crucial for cost reduction and efficient use of resources in logging and metrics management. Fluent Bit offers processors for modifying data and labels to optimize indexing and querying.
  • Companies generate 20-30% more logs each year, making control of data flow important for cost reduction and efficient resource use
  • Fluent Bit offers processors for modifying data and labels to optimize indexing and querying
  • Lua scripting can be used for log processing
  • Labels can be added, updated, or deleted using Fluent Bit processors
  • Fluent Bit can be used for metrics management and data scraping
Authors: Liz Rice, Richard Hartmann, Andy Allred
2023-04-19

tldr - powered by Generative AI

Cilium is a high-performance networking and security solution for Kubernetes that uses eBPF and is becoming the CNI of choice in the industry. The presentation covers updates, news, roadmap, and real-world use cases of Cilium.
  • Cilium is a popular networking and security solution for Kubernetes that uses eBPF and is becoming the CNI of choice in the industry.
  • Cilium provides high-performance load balancing, network policy, transparent encryption, and the ability to integrate multiple Kubernetes clusters and external workloads.
  • Hubble is the observability platform that gives visibility into individual network flows, aggregated metrics, service maps, and the ability to export all this metric information to various destinations.
  • Tetragon is the security observability subproject in Cilium that uses eBPF to instrument the kernel and give insight into security-relevant events.
  • Cilium is being adopted by all major cloud providers, including AWS, Azure, and Google Cloud.
  • The presentation includes real-world use cases of Cilium from Isovalent, Grafana Labs, and Eficode.
  • Grafana Labs has developed a new Grafana app that allows users to get all the power of Hubble directly from within Grafana.