Why, How to, and Issues: Tail-Based Sampling in the OpenTelemetry Collector

Conference: KubeCon + CloudNativeCon Europe 2022

2022-05-19

Authors: Reese Lee

Summary

The presentation discusses why distributed tracing matters and how sampling strategies help manage the volume of trace data a system produces. It also highlights the challenges of implementing tail-based sampling with OpenTelemetry.

Distributed tracing is important for understanding how the parts of a system connect and for diagnosing problems across them.
Traces are made up of spans, which represent logical units of work within a request.
Sampling can be applied at different stages of span processing, either when a trace starts (head-based) or after it completes (tail-based), to reduce the number of spans that are created or kept.
Tail-based sampling can be the best way to efficiently get the data you want, but it can also raise performance and scalability concerns.
In OpenTelemetry, tail-based sampling requires a Collector, and all spans belonging to a given trace must end up in the same Collector instance for it to work properly.
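To make the last point concrete, here is a minimal Python sketch of trace-ID-aware routing, the idea behind placing a load-balancing tier in front of tail-sampling Collectors (the Collector contrib distribution ships a `loadbalancing` exporter for this purpose). The `Span` class, the `pick_collector` and `route` helpers, and the collector addresses are hypothetical, and the simple hash-modulo scheme is an assumption chosen for brevity, not a description of how any particular exporter works.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str  # hex-encoded trace ID shared by every span in a trace
    span_id: str
    name: str


def pick_collector(trace_id: str, collectors: list[str]) -> str:
    """Hypothetical helper: map a trace ID to exactly one collector.

    Hashes the trace ID and takes it modulo the number of collectors, so the
    result depends only on the trace ID, never on which service sent the span.
    """
    digest = hashlib.sha256(trace_id.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(collectors)
    return collectors[index]


def route(spans: list[Span], collectors: list[str]) -> dict[str, list[Span]]:
    """Hypothetical helper: group spans by the collector their trace maps to."""
    batches: dict[str, list[Span]] = {c: [] for c in collectors}
    for span in spans:
        batches[pick_collector(span.trace_id, collectors)].append(span)
    return batches


if __name__ == "__main__":
    # Hypothetical collector addresses.
    collectors = ["collector-a:4317", "collector-b:4317", "collector-c:4317"]
    spans = [
        Span("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", "GET /checkout"),
        Span("4bf92f3577b34da6a3ce929d0e0e4736", "53995c3f42cd8ad8", "SELECT orders"),
        Span("0af7651916cd43dd8448eb211c80319c", "b7ad6b7169203331", "GET /cart"),
    ]
    for collector, batch in route(spans, collectors).items():
        print(collector, [s.name for s in batch])
```

Both spans of the `/checkout` trace always land in the same batch, so the Collector that receives them sees the whole trace before deciding. With plain modulo hashing, adding or removing a collector remaps most traces, which is why consistent hashing is usually preferred in practice.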

The speaker used a meme of a person facing a firehose of spans to illustrate what happens without a sampling strategy in place. They also demonstrated tail-based sampling with OpenTelemetry and showed how it helps capture the desired data efficiently.

Abstract

When you are running OpenTelemetry in production and your services are producing a firehose of spans, the traditional and default head-based sampling approach won’t cut it. This is because traces are sampled at initiation, which works for some environments but, in larger systems, can mean you miss out on key trace data. This is where configuring the Collector to sample your traces after they have fully completed, known as tail-based sampling, becomes a great option. In this talk, you’ll learn about head- and tail-based sampling, and why the latter approach is useful for obtaining the highest level of granularity in troubleshooting. You’ll learn how to configure your OpenTelemetry Collector to do this, and see the implementation in a suite of microservices, with traces exported to Jaeger. You’ll also learn about the current issues with implementing tail-based sampling in the OpenTelemetry Collector in production, so you can take these challenges into account in your own deployments.
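The trade-off described in the abstract can be sketched in a few lines of Python. Here `head_sample` decides from the trace ID alone when a trace starts, while `TailSampler` buffers every span of a trace and applies two example policies (keep traces containing an error, keep traces with a slow span) once the trace is treated as complete. All names, the 500 ms threshold, and the use of 64-bit integer trace IDs (real OpenTelemetry trace IDs are 128 bits) are assumptions for illustration. A real Collector cannot know exactly when a trace has finished and instead waits a configured period (the contrib `tail_sampling` processor calls this `decision_wait`).

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: int  # simplified 64-bit ID (assumption; real IDs are 128-bit)
    name: str
    duration_ms: float
    is_error: bool = False


def head_sample(trace_id: int, ratio: float) -> bool:
    """Head-based: decide at trace start, using only the trace ID.

    Keeps the trace if its ID falls in the lowest `ratio` fraction of the
    64-bit ID space. Errors and latency are not yet known at this point.
    """
    return trace_id < ratio * 2**64


class TailSampler:
    """Hypothetical tail-based sampler: buffer spans, decide per whole trace."""

    def __init__(self, latency_threshold_ms: float):
        self.latency_threshold_ms = latency_threshold_ms
        self.buffer: dict[int, list[Span]] = defaultdict(list)

    def add(self, span: Span) -> None:
        # Memory grows with the number of in-flight traces.
        self.buffer[span.trace_id].append(span)

    def decide(self, trace_id: int) -> bool:
        spans = self.buffer.pop(trace_id, [])
        # Policy 1: keep any trace with an error. Policy 2: keep slow traces.
        return any(s.is_error for s in spans) or any(
            s.duration_ms >= self.latency_threshold_ms for s in spans
        )


if __name__ == "__main__":
    rng = random.Random(42)
    tail = TailSampler(latency_threshold_ms=500)
    traces: dict[int, bool] = {}
    for _ in range(10_000):
        trace_id = rng.getrandbits(64)
        failed = rng.random() < 0.01  # assume a 1% error rate
        traces[trace_id] = failed
        tail.add(Span(trace_id, "GET /checkout", rng.uniform(5, 200), is_error=failed))

    errors = [t for t, failed in traces.items() if failed]
    head_kept = sum(head_sample(t, 0.1) for t in errors)
    tail_kept = sum(tail.decide(t) for t in errors)
    print(f"error traces: {len(errors)}, kept by head: {head_kept}, kept by tail: {tail_kept}")
```

With a 10% head-sampling ratio, on average only about one in ten failed traces survives, while the tail sampler keeps every one of them. The price is `self.buffer`, which must hold the spans of every in-flight trace until a decision is made; that state is where tail-based sampling's performance and scalability concerns come from.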
