Presentations | Hack Dojo

When the Logs Just Don’t Cut It: Root-Causing Incidents Without Re-Deploying Prod

Conference: KubeCon + CloudNativeCon North America 2022

Authors: Phillip Kuznetsov

2022-10-28

tldr - powered by Generative AI

The presentation discusses how to root-cause incidents without redeploying production by using bpftrace, a tool that captures useful data without restarting pods. The speaker demonstrates how to work with bpftrace on Kubernetes and shares tips and tricks for using Pixie to deploy bpftrace scripts and collect their data.

The speaker presents a scenario where the front-end service of an e-commerce company is panicking and the root cause is unknown
The speaker explains that the service needs a sum function to add money values and that invalid money values are difficult to identify
The speaker introduces bpftrace as a tool to capture useful data without redeploying pods
The speaker shares tips and tricks for working with bpftrace on Kubernetes, including using Pixie to deploy bpftrace scripts and collect their data
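
To make the scenario concrete, here is a minimal Python sketch of a sum function over money values that rejects invalid inputs. The units/nanos representation, the `Money` class, and the validation rules are assumptions made for illustration; they are not taken from the talk's code.

```python
from dataclasses import dataclass

NANOS_PER_UNIT = 1_000_000_000  # assumed: fractional part is stored as nanos


@dataclass(frozen=True)
class Money:
    """Hypothetical money value: whole units plus a fractional part in nanos."""
    currency: str
    units: int
    nanos: int


def is_valid(m: Money) -> bool:
    """Assumed rules: nanos stays within one unit and never opposes the sign of units."""
    if abs(m.nanos) >= NANOS_PER_UNIT:
        return False
    if m.units > 0 and m.nanos < 0:
        return False
    if m.units < 0 and m.nanos > 0:
        return False
    return True


def money_sum(a: Money, b: Money) -> Money:
    """Add two money values, raising ValueError on invalid or mismatched input."""
    for m in (a, b):
        if not is_valid(m):
            raise ValueError(f"invalid money value: {m}")
    if a.currency != b.currency:
        raise ValueError("currency mismatch")
    total = (a.units * NANOS_PER_UNIT + a.nanos) + (b.units * NANOS_PER_UNIT + b.nanos)
    # Split the total back into units and nanos so both parts carry the same sign.
    sign = -1 if total < 0 else 1
    units, nanos = divmod(abs(total), NANOS_PER_UNIT)
    return Money(a.currency, sign * units, sign * nanos)
```

In a sketch like this, the failure only tells you that some value was invalid. Seeing which arguments were actually passed in a running service is the kind of data the talk captures with bpftrace instead of a redeploy.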

Customer Centric Observability: How Intuit Reduced Time To Detect Customer Impact From 30+ Minutes To Under 3 Minutes.

Conference: KubeCon + CloudNativeCon North America 2022

Authors: Vinit Samel, Nagaraja Tantry

2022-10-27

While there was considerable Observability coverage across Intuit’s backend services, detecting, quantifying and isolating customer impact had been a challenge. Oftentimes, service degradation was detected but teams were unable to quickly assess customer impact and map issues back to end user experience. Over the past year, Intuit leveraged OpenTelemetry to build a new capability called ‘Failed Customer Interactions (FCIs)’ that automatically detects, quantifies and isolates customer impact. The team built a simple-to-use abstraction that reduces the complexity of distributed tracing while leveraging its benefits. We will cover how the RUM-FCI capability powered by OpenTelemetry can not only reduce the time to detect incidents, but also isolate root cause at speed, with an explicit focus on customer impact. With this solution, Intuit established a blueprint to reduce its MTTD from over 30 minutes to less than 3 minutes.

Fluent Bit V2.0: Unifying Open Standards For Logs, Metrics & Traces

Conference: KubeCon + CloudNativeCon North America 2022

Authors: Anurag Gupta, Eduardo Silva

2022-10-26

Fluent Bit is the next-generation tool to deliver a unified layer for Logs, Metrics, and Traces. In this session, Fluent maintainers will give a 101 intro to the observability space and a deep dive into the new features available in Fluent Bit v2.0. Attendees will benefit from this session by learning different techniques for observability associated with Fluent Bit, Prometheus, and OpenTelemetry, as well as a couple of tips and best practices that are a must when deploying observability tools in production.

Jaeger: The Future with OpenTelemetry and Metrics

Conference: KubeCon + CloudNativeCon North America 2022

Authors: Joe Elliott, Jonah Kowall

2022-10-26

tldr - powered by Generative AI

The presentation discusses the introduction of metrics as a new signal alongside traces in Jaeger, a distributed tracing system, to support performance analysis and service monitoring. It also notes that auto-instrumentation in .NET eliminates the need for code changes to instrument the application.

Jaeger is a distributed tracing system that helps debug and understand transactions
Metrics are introduced as a new signal alongside traces to support performance analysis and service monitoring
The Span Metrics processor is used to derive metrics from traces
Metrics can be visualized in Grafana and used in the Jaeger UI
Auto-instrumentation feature in .NET eliminates the need for code changes to instrument the application
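
The idea of deriving metrics from traces can be sketched in a few lines of Python. This illustrates the concept only and is not the processor's implementation; the `Span` fields and the `derive_metrics` helper are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified span record."""
    service: str
    operation: str
    duration_ms: float
    is_error: bool


def derive_metrics(spans):
    """Aggregate call count, error count, and total duration per (service, operation)."""
    metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "duration_ms_sum": 0.0})
    for span in spans:
        entry = metrics[(span.service, span.operation)]
        entry["calls"] += 1
        entry["errors"] += int(span.is_error)
        entry["duration_ms_sum"] += span.duration_ms
    return dict(metrics)


spans = [
    Span("frontend", "GET /cart", 12.0, False),
    Span("frontend", "GET /cart", 30.0, True),
    Span("checkout", "PlaceOrder", 85.0, False),
]
metrics = derive_metrics(spans)
```

Counters like these are the raw material for request rate, error rate, and average latency per operation, which is the sort of data a dashboard can then chart.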

Why, How to, and Issues: Tail-Based Sampling in the OpenTelemetry Collector

Conference: KubeCon + CloudNativeCon Europe 2022

Authors: Reese Lee

2022-05-19

tldr - powered by Generative AI

The presentation discusses the importance of distributed tracing and the use of sampling strategies to manage the volume of data produced. It also highlights the challenges of implementing tail-based sampling using OpenTelemetry.

Distributed tracing is important for understanding system connections and diagnosing problems.
Traces are made up of spans, which represent logical units of work within a request.
Sampling can be implemented at different stages of span processing to reduce the number of created or sampled spans.
Tail-based sampling can be optimal for efficiently getting the desired data, but it also raises performance and scalability concerns.
OpenTelemetry requires a collector to implement tail-based sampling, and all spans of a given trace need to end up in the same collector instance for it to work properly.
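
A minimal Python sketch of the two ideas in the last points: buffering spans until a whole trace can be judged, and routing by trace ID so one collector sees every span of a trace. The `TailSampler` class, the `pick_collector` helper, and the keep-if-error-or-slow policy are assumptions for illustration, not the OpenTelemetry Collector's implementation.

```python
import hashlib
from collections import defaultdict
from typing import Sequence


def pick_collector(trace_id: str, collectors: Sequence[str]) -> str:
    """Route by trace ID so every span of a trace reaches the same collector."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    return collectors[int.from_bytes(digest[:8], "big") % len(collectors)]


class TailSampler:
    """Buffers spans per trace and decides once the whole trace is available."""

    def __init__(self, latency_threshold_ms: float):
        self.latency_threshold_ms = latency_threshold_ms
        self.buffer = defaultdict(list)

    def add_span(self, trace_id: str, duration_ms: float, is_error: bool) -> None:
        self.buffer[trace_id].append((duration_ms, is_error))

    def decide(self, trace_id: str) -> bool:
        """Keep the trace if any span errored or was slower than the threshold."""
        spans = self.buffer.pop(trace_id, [])
        return any(err or dur > self.latency_threshold_ms for dur, err in spans)


sampler = TailSampler(latency_threshold_ms=500.0)
sampler.add_span("trace-a", 20.0, False)
sampler.add_span("trace-a", 35.0, True)    # one failing span
sampler.add_span("trace-b", 15.0, False)   # fast and healthy
keep_a = sampler.decide("trace-a")
keep_b = sampler.decide("trace-b")
```

The buffer is where the performance and scalability concerns come from: every span has to be held in memory until its trace is decided, and a span that lands on a different collector is never seen by the decision.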

Native Instrumentation for Open Source Software with OpenTelemetry

Conference: KubeCon + CloudNativeCon North America 2021

Authors: Ted Young, Liudmila Molkova

2021-10-15

tldr - powered by Generative AI

The presentation discusses the importance of instrumentation and semantic conventions in distributed tracing for libraries and applications that use the OpenTelemetry SDK.

Instrumentation should be opt-in initially and mature over time with user feedback
Performance impact should be considered and users should be mindful of costs
Semantic conventions are critical for user experience and should be followed
Context propagation is essential for distributed tracing and should be implemented in libraries and applications
OpenTelemetry SDK provides solutions for instrumentation, semantic conventions, and context propagation
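
Context propagation can be illustrated with the W3C Trace Context `traceparent` header. The sketch below builds and parses the header by hand to show the mechanics; the `inject` and `extract` helpers are hypothetical and are not the OpenTelemetry API.

```python
import re
import secrets

# traceparent: version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Write the current trace context into outgoing request headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: dict):
    """Read trace context from incoming headers; return None if absent or malformed."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_span_id,
        "sampled": (int(flags, 16) & 1) == 1,
    }


# Caller side: start a trace and propagate it on an outgoing request.
trace_id = secrets.token_hex(16)  # 32 hex characters
span_id = secrets.token_hex(8)    # 16 hex characters
outgoing = {}
inject(outgoing, trace_id, span_id)

# Callee side: continue the same trace under a new child span ID.
ctx = extract(outgoing)
child_span_id = secrets.token_hex(8)
```

Because the callee reuses the incoming trace ID and records the caller's span ID as its parent, spans emitted by both services can be stitched into one trace.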
