Authors: Phillip Kuznetsov
2022-10-28

tldr - powered by Generative AI

The presentation discusses how to use bpftrace to root-cause production incidents without redeploying, since bpftrace captures useful data without restarting pods. The speaker demonstrates how to work with bpftrace on Kubernetes and shares tips for using Pixie to deploy bpftrace scripts and collect their data.
  • The speaker presents a scenario where the front-end service of an e-commerce company is panicking and the root cause is unknown
  • The speaker explains that the service relies on a sum function to add money values, and that identifying which money values are invalid is difficult
  • The speaker introduces bpftrace as a tool to capture useful data without redeploying pods
  • The speaker shares tips and tricks for working with bpftrace on Kubernetes, including using Pixie to deploy and collect data from bpftrace scripts
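To make the failure mode concrete, here is a minimal Python sketch of a money type and a sum function that rejects invalid values. It assumes a units/nanos representation (whole units plus a fractional part in billionths of a unit, both sharing the same sign); the names `Money`, `is_valid`, `money_sum`, and `InvalidMoneyError` are hypothetical and are not the talk's service code.

```python
from dataclasses import dataclass

NANOS_PER_UNIT = 1_000_000_000  # nanos are billionths of one currency unit


class InvalidMoneyError(ValueError):
    """Raised when a value's units and nanos fields are inconsistent."""


@dataclass(frozen=True)
class Money:
    currency: str  # e.g. "USD"
    units: int     # whole currency units
    nanos: int     # fractional part; must share the sign of units


def is_valid(m: Money) -> bool:
    # nanos must stay strictly within one unit...
    if not -NANOS_PER_UNIT < m.nanos < NANOS_PER_UNIT:
        return False
    # ...and must not disagree in sign with a nonzero units field
    if (m.units > 0 and m.nanos < 0) or (m.units < 0 and m.nanos > 0):
        return False
    return True


def money_sum(a: Money, b: Money) -> Money:
    """Add two money values, rejecting invalid input instead of silently summing it."""
    if not (is_valid(a) and is_valid(b)):
        raise InvalidMoneyError(f"invalid money value in sum: {a!r}, {b!r}")
    if a.currency != b.currency:
        raise ValueError(f"currency mismatch: {a.currency} vs {b.currency}")
    # Work in total nanos so the carry between nanos and units is exact.
    total = (a.units * NANOS_PER_UNIT + a.nanos) + (b.units * NANOS_PER_UNIT + b.nanos)
    sign = -1 if total < 0 else 1
    units, nanos = divmod(abs(total), NANOS_PER_UNIT)
    # Reapply the sign to both parts so the result is itself valid.
    return Money(a.currency, sign * units, sign * nanos)
```

A check like this only reveals that *some* caller passed an invalid value, not which one; capturing the offending arguments from the live process is the kind of data the talk uses bpftrace to collect without redeploying.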
Authors: Vinit Samel, Nagaraja Tantry
2022-10-27

While there was considerable Observability coverage across Intuit’s backend services, detecting, quantifying and isolating customer impact had been a challenge. Oftentimes, service degradation was detected, but teams were unable to quickly assess customer impact and map issues back to the end-user experience. Over the past year, Intuit leveraged OpenTelemetry to build a new capability called ‘Failed Customer Interactions (FCIs)’ that automatically detects, quantifies and isolates customer impact. The team built a simple-to-use abstraction that reduces the complexity of distributed tracing while retaining its benefits. We will cover how the RUM-FCI capability powered by OpenTelemetry can not only reduce the time to detect incidents, but also quickly isolate the root cause, with an explicit focus on customer impact. With this solution, Intuit established a blueprint to reduce its MTTD from over 30 minutes to less than 3 minutes.
Authors: Anurag Gupta, Eduardo Silva
2022-10-26

Fluent Bit is the next-generation tool to deliver a unified layer for Logs, Metrics, and Traces. In this session, Fluent maintainers will give a 101 introduction to the observability space and a deep dive into the new features available in Fluent Bit v2.0. Attendees will learn different observability techniques involving Fluent Bit, Prometheus, and OpenTelemetry, as well as essential tips and best practices for deploying observability tools in production.
Authors: Joe Elliott, Jonah Kowall
2022-10-26

tldr - powered by Generative AI

The presentation discusses adding metrics as a new signal in Jaeger, a distributed tracing system, alongside traces, to support performance and service monitoring. It also covers the auto-instrumentation feature in .NET, which eliminates the need for code changes to instrument an application.
  • Jaeger is a distributed tracing system that helps debug and understand transactions
  • Metrics are introduced as new signals in addition to traces to understand performance and service monitoring
  • The span metrics processor derives metrics from the spans that make up traces
  • Metrics can be visualized in Grafana and used in the Jaeger UI
  • The auto-instrumentation feature in .NET eliminates the need for code changes to instrument the application
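The idea of deriving metrics from traces can be sketched in a few lines of Python: aggregate finished spans into per-service, per-operation call counts, error counts, and latency histogram buckets. This loosely mirrors what a span metrics processor does; the `Span` fields, the bucket boundaries in `BUCKETS_MS`, and the function names are assumptions for illustration, not the Collector's actual configuration or output format.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical latency bucket upper bounds in milliseconds (not a real default).
BUCKETS_MS = [2, 10, 50, 100, 500, 1000]


@dataclass
class Span:
    service: str
    operation: str
    duration_ms: float
    is_error: bool = False


@dataclass
class SeriesStats:
    calls: int = 0
    errors: int = 0
    # Non-cumulative count per bucket, plus a final overflow (+Inf) bucket.
    bucket_counts: list = field(default_factory=lambda: [0] * (len(BUCKETS_MS) + 1))


def derive_span_metrics(spans):
    """Aggregate spans into call/error counts and latency buckets per (service, operation)."""
    stats = defaultdict(SeriesStats)
    for s in spans:
        series = stats[(s.service, s.operation)]
        series.calls += 1
        if s.is_error:
            series.errors += 1
        # bisect_left finds the first bucket whose upper bound is >= the duration
        series.bucket_counts[bisect_left(BUCKETS_MS, s.duration_ms)] += 1
    return dict(stats)
```

Dividing `errors` by `calls` gives a per-operation error rate, and the buckets support latency percentiles; these are the kinds of series that can then be charted in Grafana or shown in Jaeger.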
Authors: Reese Lee
2022-05-19

tldr - powered by Generative AI

The presentation discusses the importance of distributed tracing and the use of sampling strategies to manage the volume of data produced. It also highlights the challenges of implementing tail-based sampling using OpenTelemetry.
  • Distributed tracing is important for understanding system connections and diagnosing problems.
  • Traces are made up of spans, which represent logical units of work within a request.
  • Sampling can be applied at different stages of span processing, from span creation to export, to reduce the number of spans that are created or kept.
  • Tail-based sampling can be the most efficient way to keep exactly the traces you want, but it also raises performance and scalability concerns.
  • In OpenTelemetry, tail-based sampling is implemented in the Collector, and all spans of a trace must reach the same Collector instance for it to work properly.
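Why all spans of a trace must reach the same collector becomes clearer in a minimal Python sketch of tail-based sampling: spans are buffered by trace ID, and the keep/drop decision is made only once the whole trace is assumed complete. The policy here (keep traces containing an error or a span at or above a latency threshold), the modulo-hash routing, and all names are hypothetical; the OpenTelemetry Collector's tail sampling processor has its own configurable policies.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    is_error: bool = False


def collector_for(trace_id: str, collectors: list[str]) -> str:
    """Route every span of a trace to the same collector by hashing its trace ID."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    return collectors[int.from_bytes(digest[:8], "big") % len(collectors)]


class TailSampler:
    def __init__(self, latency_threshold_ms: float):
        self.latency_threshold_ms = latency_threshold_ms
        self.pending = defaultdict(list)  # trace_id -> buffered spans

    def add(self, span: Span) -> None:
        # Nothing is decided yet: the span waits until its trace is complete.
        self.pending[span.trace_id].append(span)

    def decide(self, trace_id: str) -> bool:
        """Call once the trace is assumed complete; True means keep (export) it."""
        spans = self.pending.pop(trace_id, [])
        return any(s.is_error for s in spans) or any(
            s.duration_ms >= self.latency_threshold_ms for s in spans
        )
```

If an error span of a trace were routed to a different collector than its root span, each instance would see only part of the trace and could drop it. The modulo hash keeps routing deterministic; a production setup would more likely use consistent hashing so that adding or removing a collector reassigns only a fraction of traces.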
Authors: Ted Young, Liudmila Molkova
2021-10-15

tldr - powered by Generative AI

The presentation discusses the importance of instrumentation and semantic conventions in distributed tracing for libraries and applications using the OpenTelemetry SDK.
  • Instrumentation should be opt-in initially and mature over time with user feedback
  • Performance impact should be considered, and users should be mindful of costs
  • Semantic conventions are critical for user experience and should be followed
  • Context propagation is essential for distributed tracing and should be implemented in libraries and applications
  • The OpenTelemetry SDK provides solutions for instrumentation, semantic conventions, and context propagation
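Context propagation can be illustrated with the W3C Trace Context `traceparent` header, the default propagation format in OpenTelemetry. The Python sketch below hand-rolls inject and extract for version `00` only, assumes header names are already lowercased, and uses hypothetical names (`SpanContext`, `inject`, `extract`, `child_of`); real code should use the SDK's propagators instead.

```python
from __future__ import annotations

import re
import secrets
from dataclasses import dataclass

# Version 00 layout: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>", lowercase only.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # 32 lowercase hex characters
    span_id: str   # 16 lowercase hex characters
    sampled: bool


def inject(ctx: SpanContext, headers: dict) -> None:
    """Write the context into outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"


def extract(headers: dict) -> SpanContext | None:
    """Read the context from incoming headers; None if absent or malformed."""
    m = _TRACEPARENT.match(headers.get("traceparent", ""))
    if m is None:
        return None
    trace_id, span_id, flags = m.groups()
    # All-zero trace or span IDs are invalid per the W3C spec.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, sampled=bool(int(flags, 16) & 0x01))


def child_of(parent: SpanContext) -> SpanContext:
    """Start a span in the same trace: keep the trace ID, mint a fresh span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

An instrumented HTTP client library would call `inject` on outgoing headers, and server-side instrumentation would call `extract` and start its span with `child_of`, so both spans share one trace ID.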