Authors: Michelle Nguyen, Hannah Troisi, Clemens Kolbitsch, Vihang Mehta
2023-04-21

tldr - powered by Generative AI

The presentation discusses the practicalities of storing telemetry from multiple integrated applications in a busy environment, focusing on OpenTelemetry and Pixie.
  • The speaker answers an audience question about storing data from multiple integrated applications in a busy environment
  • The speaker explains that OpenTelemetry and Pixie can capture and store data locally, then filter and batch it as needed
  • The speaker emphasizes considering sampling strategies, filtering, and batching when designing a storage architecture for a busy environment
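The sample, filter, and batch stages mentioned above can be sketched as a small generator pipeline. This is only an illustration in plain Python; it is not the OpenTelemetry or Pixie API, and the function names (keep_every, drop_if, batches) are hypothetical.

```python
from itertools import islice

def keep_every(events, n):
    """Sampling: keep every n-th event (a deterministic stand-in for random sampling)."""
    for i, event in enumerate(events):
        if i % n == 0:
            yield event

def drop_if(events, predicate):
    """Filtering: drop events for which predicate(event) is true."""
    for event in events:
        if not predicate(event):
            yield event

def batches(events, size):
    """Batching: group events into lists of at most `size` items."""
    it = iter(events)
    while chunk := list(islice(it, size)):
        yield chunk

def pipeline(events, sample_n, predicate, batch_size):
    """Chain the three stages; each batch would be one write to storage."""
    return list(batches(drop_if(keep_every(events, sample_n), predicate), batch_size))
```

Because every stage is lazy, events are dropped before they are ever grouped, so the amount held in memory at once is bounded by the batch size.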
Authors: Bartłomiej Płotka, Goutham Veeramachaneni
2023-04-21

Download the code ahead of time. DCO required. Ever wished you knew how to write an exporter in Go for Prometheus? How to use Prometheus APIs programmatically? Need to quickly instrument your Go code with Prometheus metrics? Join us to learn how to contribute, develop and test Prometheus integrations that are useful in day-to-day work. Unblock yourself and others! It's easier than you think! We will go through useful resources and ways to interact with the project and community, to create meaningful applications that use Prometheus effectively! Note: for an interactive experience, please bring your laptop with your favourite IDE and Go installed. This Contribfest session is designed to provide projects with the space and resources to tackle outstanding technical debt, security issues, or impactful feature requests. Contribfest sessions are intended to provide a place for maintainers to meet contributors and potential contributors and work together on solving a problem.
Authors: Venkata Gunapati, Anusha Ragunathan
2023-04-21

As Platform Engineers and SREs, we love metrics from Kubernetes clusters for understanding platform health. However, we dislike drowning in alerts on every metric and experiencing alert fatigue. The worst consequence of alert fatigue is not just on-call engineer burnout, but on-call engineers snoozing alerts that could prevent incidents. At Intuit, we needed a smarter way to get alerted on a cluster’s Golden Signals, which are picked from an ocean of metrics. This would help reduce the MTTD during incidents. We wanted to achieve this without the burden of instrumenting cluster components. Observability vendors provide solutions using eBPF instrumentation and AI-driven insights on Prometheus data, but we wanted to explore open source solutions to achieve the same. In this talk, we explain how we explored numalogic, an open source AIOps anomaly detection engine for Kubernetes. You will learn how to use numalogic on Prometheus metrics to derive baseline behaviors and detect anomalies, without any prior AI/ML experience. We will show how we collect, process and analyze in-cluster data in real time and how numalogic computes anomaly scores for each component, which bubble up into a single anomaly score for the cluster. There will be a live demo of the AIOps-based Prometheus metrics pipeline in action.
Authors: Ted Young, Alolita Sharma, Morgan McLean, Daniel Dyla
2023-04-20

tldr - powered by Generative AI

OpenTelemetry Integrations and Compatibility
  • OpenTelemetry has several streaming protocols and projects baked into the project itself
  • OpenTelemetry is interoperable with other projects, and other teams are adding support for it
  • Native integrations are starting to use native OTLP APIs
  • Contrib repos have hundreds if not thousands of integrations with existing technologies
Authors: Kemal Akkoyun, Bryan Boreham
2023-04-19

As the second-oldest project in the CNCF, Prometheus is something you have probably heard about before. Prometheus is the de facto standard in cloud-native metrics monitoring and beyond, mainly because Kubernetes is designing its custom metrics engine for Prometheus. Nevertheless, the project maintainers will introduce it from the very beginning, followed by a deep dive into its internals and a list of the exciting new features that have been released recently or are in the pipeline. You will learn about many opportunities to use Prometheus, and we will cover a mix of introductory content, a deeper dive into current developments, and open Q&A at the end. We may even tempt you to contribute to the project yourself.
Authors: Sophia Vargas
2023-04-19

tldr - powered by Generative AI

The presentation discusses burnout in open source projects and provides recommendations to reduce it.
  • Burnout is a common problem in open source projects
  • Factors that contribute to burnout include losing patience, being always available, and losing interest
  • To reduce burnout, it is recommended to increase variety, delegate tasks, and provide clear milestones
  • Communication and building relationships are important in reducing burnout
  • Boundaries are vital to maintaining personal health and preventing burnout
Authors: Anurag Gupta, Eduardo Silva
2023-04-19

tldr - powered by Generative AI

Controlling data flow is crucial for cost reduction and efficient use of resources in logging and metrics management. Fluent Bit offers processors for modifying data and labels to optimize indexing and querying.
  • Companies generate 20-30% more logs each year, making control of data flow important for cost reduction and efficient resource use
  • Fluent Bit offers processors for modifying data and labels to optimize indexing and querying
  • Lua scripting can be used for log processing
  • Labels can be added, updated, or deleted using Fluent Bit processors
  • Fluent Bit can be used for metrics management and data scraping
Authors: Simon Pasquier, Vanessa Martini
2023-04-19

tldr - powered by Generative AI

The presentation discusses the challenges faced by site reliability engineers when troubleshooting issues in Kubernetes and introduces korrel8, an open source tool that aims to reduce the cognitive load of engineers when attempting to debug issues through the correlation of observability signals.
  • Observability signals are crucial for site reliability engineers to troubleshoot issues in Kubernetes
  • There is a lack of established open source tools that aggregate all the different observability signals and help users understand how their systems behave
  • Korrel8 is an open source project founded within Red Hat that aims to make correlation across observability signals accessible to everyone
  • Korrel8 can reduce the cognitive load of engineers when attempting to debug issues
  • The presentation includes a demo of korrel8 and a sneak peek overview of the roadmap vision and next steps
Authors: Reese Lee
2023-04-19

tldr - powered by Generative AI

The presentation covers the basics of metrics in OpenTelemetry, including the architecture of a metrics pipeline, metric instruments, and their use cases.
  • OpenTelemetry is used for observability and provides an API and SDK for instrumenting code and collecting telemetry data, including metrics.
  • The meter provider is the API entry point for metrics, and meters and instruments are used to record measurements.
  • Aggregation, temporality, and dimensions are important concepts in metrics.
  • Asynchronous up-down counters and gauges are two types of metric instruments that are used for different purposes.
  • There is much more to learn about metrics in OpenTelemetry, including customization options and different processors for transforming metrics data.
  • The presentation provides references for further exploration and credits to the people who contributed to the content.
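Temporality, one of the concepts listed above, describes whether a reported value is the running total since the start (cumulative) or only the change since the previous report (delta). The sketch below converts between the two in plain Python. It does not use the OpenTelemetry SDK, the function names are hypothetical, and it assumes the first cumulative reading counts from zero and that a decrease signals a counter reset.

```python
def cumulative_to_delta(points):
    """Convert cumulative counter readings into per-interval deltas.

    A reading lower than its predecessor is treated as a counter reset,
    so the new reading itself is taken as the delta for that interval.
    """
    deltas = []
    prev = 0
    for value in points:
        deltas.append(value if value < prev else value - prev)
        prev = value
    return deltas

def delta_to_cumulative(deltas):
    """Rebuild a running total from per-interval deltas."""
    total = 0
    cumulative = []
    for d in deltas:
        total += d
        cumulative.append(total)
    return cumulative
```

For example, the cumulative readings 5, 12, 12, 3, 10 correspond to the deltas 5, 7, 0, 3, 7, where the drop from 12 to 3 is interpreted as a restart of the counter.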
Authors: Benjamin Raskin, Emma Wang
2022-10-28

tldr - powered by Generative AI

The presentation discusses the migration of infrastructure and application metrics from StatsD to Prometheus at DoorDash, and the challenges and lessons encountered during the process.
  • The migration involved over 130 services, 1500 dashboards, and more than 7000 alerts.
  • The use of histograms instead of percentiles was a difficult change for engineers to adapt to.
  • The instance label is a high cardinality label that needs to be pre-aggregated to reduce volume.
  • A Prometheus aggregation gateway was used for some metrics, but push models were limited to special cases.
  • Automating the monitoring onboarding process for teams is crucial.
  • The migration was completed in one year, resulting in over 27,000 alerts and 2200 dashboards.
  • Post-migration, DoorDash ingests over 15 million metrics per second and persists over 10 million metrics per second.
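The histogram and pre-aggregation points above are related: percentiles computed separately on each instance cannot be combined afterwards, whereas histogram bucket counts can simply be summed across instances and a quantile estimated from the result. The following is a minimal sketch in plain Python, not DoorDash's implementation and not the Prometheus client library. The bucket bounds and function names are hypothetical, and the interpolation only approximates what a histogram quantile function does.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in seconds; the last bucket is +Inf.
BOUNDS = [0.1, 0.5, 1.0, float("inf")]

def observe(counts, value):
    """Increment the first bucket whose upper bound is >= value."""
    counts[bisect_left(BOUNDS, value)] += 1

def merge(*per_instance):
    """Sum bucket counts across instances, which drops the instance label."""
    return [sum(column) for column in zip(*per_instance)]

def quantile(q, counts):
    """Estimate a quantile by linear interpolation inside the target bucket."""
    rank = q * sum(counts)
    seen = 0
    for i, c in enumerate(counts):
        if c and seen + c >= rank:
            if BOUNDS[i] == float("inf"):
                return BOUNDS[i - 1]  # cannot interpolate into the +Inf bucket
            lower = BOUNDS[i - 1] if i > 0 else 0.0
            return lower + (BOUNDS[i] - lower) * (rank - seen) / c
        seen += c
    return float("nan")  # no observations recorded
```

The trade-off is precision: the estimate is only as fine as the bucket layout, which is part of why moving from exact percentiles to histograms takes some adjustment.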