Correlating Signals in OpenTelemetry: Benefits, Stories, and the Road Ahead

Conference: KubeCon + CloudNativeCon North America 2021

2021-10-14

Authors: Morgan McLean, Jaana Dogan

Summary

OpenTelemetry correlates different types of telemetry data, and those correlations can be used to improve service operations and the response to outages.

- OpenTelemetry captures distributed traces, metrics, logs, and resource metadata.
- Correlating this information is crucial for understanding failures in highly distributed systems.
- OpenTelemetry allows for correlations between language runtime traces and network events.
- Correlations can provide general production insights and improve development velocity.

In the example of a mock e-commerce service, correlating distributed traces with service information identified the checkout service as the source of extremely high latency, a problem that could lead to lost customers and reduced trust in the e-commerce system.
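The kind of trace-to-service correlation described in this example can be illustrated with a small standard-library sketch. It does not use the OpenTelemetry SDK: the `Span` record, the sample data, and the helper names are all hypothetical, and `service_name` merely stands in for a `service.name` resource attribute. The sketch attributes to each service only its "self time" (span duration minus the durations of its direct children), under the simplifying assumption that child spans run one after another rather than concurrently.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span record (not the OpenTelemetry data model)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service_name: str      # stands in for the `service.name` resource attribute
    duration_ms: float


def self_time_by_service(spans: List[Span]) -> Dict[str, float]:
    """Sum each service's self time: span duration minus direct children.

    Assumes children of a span run sequentially; concurrent children
    would need interval arithmetic instead of a plain sum.
    """
    child_total: Dict[tuple, float] = defaultdict(float)
    for span in spans:
        if span.parent_id is not None:
            child_total[(span.trace_id, span.parent_id)] += span.duration_ms

    totals: Dict[str, float] = defaultdict(float)
    for span in spans:
        own = span.duration_ms - child_total[(span.trace_id, span.span_id)]
        totals[span.service_name] += max(0.0, own)
    return dict(totals)


def slowest_service(spans: List[Span]) -> str:
    """Return the service that accounts for the most self time."""
    totals = self_time_by_service(spans)
    return max(totals, key=totals.get)


# Made-up data shaped like the mock e-commerce scenario.
SAMPLE_SPANS = [
    Span("t1", "a", None, "frontend", 2100.0),
    Span("t1", "b", "a", "checkout", 2000.0),
    Span("t1", "c", "b", "payment", 150.0),
    Span("t1", "d", "b", "shipping", 50.0),
]

if __name__ == "__main__":
    print(self_time_by_service(SAMPLE_SPANS))
    print("slowest:", slowest_service(SAMPLE_SPANS))
```

Grouping by self time rather than total duration matters here: the frontend span is the longest overall only because it waits on checkout, so a naive "longest span" ranking would point at the wrong service.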

Abstract

OpenTelemetry is used across the industry to capture distributed traces, but this is just a sliver of the value that the project provides. OpenTelemetry also gathers metrics (launched earlier this year) and logs (beta) from your applications and infrastructure, allowing you to capture all telemetry through a single pipeline and run powerful analysis in whatever tools you choose!

In this session we will discuss:

- How OpenTelemetry correlates these signals, which allows your investigations to flow seamlessly between all of your services and underlying infrastructure
- The deep functionality that OpenTelemetry provides for metrics and logs, including metric formats and aggregations, tailing logs from flat files, and a high-performance, strongly typed logging pipeline for new applications
- Real stories about how large, well-known organizations use OpenTelemetry and the improvements that they’ve gained
- What’s next for OpenTelemetry: new data sources, signals, and more
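As a rough illustration of what correlating signals means in practice, the sketch below stamps every log record with the identifiers of the currently active span, so a log line can later be joined to its trace. This is a standard-library-only approximation of the idea, not the OpenTelemetry API: `TraceContextFilter`, `start_span`, `end_span`, and `demo` are hypothetical names, and the only detail borrowed from real tracing conventions is the ID size (16-byte trace IDs and 8-byte span IDs, hex-encoded).

```python
import contextvars
import io
import logging
import secrets

# Holds (trace_id, span_id) for the span that is currently active, if any.
_current_trace = contextvars.ContextVar("current_trace", default=None)


class TraceContextFilter(logging.Filter):
    """Attach the active trace and span IDs to every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        ctx = _current_trace.get()
        record.trace_id, record.span_id = ctx if ctx else ("-", "-")
        return True  # never drop the record, only annotate it


def start_span():
    """Hypothetical helper: activate a new span and return (ids, reset token)."""
    ctx = (secrets.token_hex(16), secrets.token_hex(8))
    token = _current_trace.set(ctx)
    return ctx, token


def end_span(token) -> None:
    """Hypothetical helper: restore whatever context was active before."""
    _current_trace.reset(token)


def demo():
    """Log once inside a span and once outside; return trace ID and lines."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"))
    handler.addFilter(TraceContextFilter())

    logger = logging.getLogger("checkout-demo")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False

    (trace_id, _span_id), token = start_span()
    try:
        logger.info("charging card")
    finally:
        end_span(token)
    logger.info("outside any span")

    logger.removeHandler(handler)
    return trace_id, stream.getvalue().splitlines()


if __name__ == "__main__":
    _, lines = demo()
    print("\n".join(lines))
```

With the trace ID present on each log line, a backend that stores both signals can jump from a slow span to the logs emitted while it was active, which is the kind of investigation flow the abstract describes.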
