Effortless Open Source Observability with Cilium, Prometheus and Grafana - LGTM!

Conference: KubeCon + CloudNativeCon Europe 2023

2023-04-21

Authors: Raymond de Jong, Anna Kapuścińska

Summary

The presentation discusses the challenges of observability and security in distributed systems and how Cilium and Hubble can address these challenges.

Cilium and Hubble can provide observability and security in distributed systems
Existing mechanisms such as traditional monitoring devices and VPC logs fall short in providing context and scalability
Cilium uses identity-based observability and security, with identities derived from labels, to secure and monitor traffic
Hubble provides a service map for monitoring network flows and can export flows to other platforms
Ready-to-use dashboards are available in the Grafana marketplace for monitoring cluster and application performance
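
As a rough illustration of the identity-based idea in the list above, the sketch below maps each unique set of labels to one numeric identity, so that two workloads carrying the same labels share an identity. The class name, the starting ID and the example labels are hypothetical, and this is not Cilium's actual allocator.

```python
from typing import Dict, FrozenSet, Tuple

# A label set is order-independent, so it is stored as a frozenset of pairs.
Labels = FrozenSet[Tuple[str, str]]


class IdentityAllocator:
    """Hypothetical allocator: one numeric identity per unique label set."""

    def __init__(self, first_id: int = 1000) -> None:
        self._next_id = first_id
        self._by_labels: Dict[Labels, int] = {}

    def identity_for(self, labels: Dict[str, str]) -> int:
        """Return the identity for a label set, allocating one on first use."""
        key = frozenset(labels.items())
        if key not in self._by_labels:
            self._by_labels[key] = self._next_id
            self._next_id += 1
        return self._by_labels[key]


allocator = IdentityAllocator()
frontend_a = allocator.identity_for({"app": "frontend", "env": "prod"})
frontend_b = allocator.identity_for({"env": "prod", "app": "frontend"})
backend = allocator.identity_for({"app": "backend", "env": "prod"})
```

Here `frontend_a` and `frontend_b` resolve to the same identity because their label sets are equal, while `backend` receives a different one. Policy and metrics can then be keyed on the identity rather than on IP addresses.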

The presentation uses the example of a microservices demo application instrumented with OpenTelemetry tracing to illustrate how Cilium and Hubble work in a cluster. Cilium's service mesh feature provides routing for the Ingress, and the Envoy configs are generated automatically. Hubble parses trace headers and includes them in metrics to monitor application performance. Cilium's identity-based observability and security derives an identity from each unique set of labels and uses that identity in the data plane to monitor and secure traffic.
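
To make the trace header step concrete, here is a minimal sketch of extracting a trace ID from an HTTP header. It assumes the W3C Trace Context `traceparent` format (version 00); the summary does not say which header format the demo used, and the function name is hypothetical rather than part of Hubble.

```python
import re
from typing import Optional

# traceparent (version 00): version "-" trace-id "-" parent-id "-" flags,
# all lowercase hex with fixed widths of 2, 32, 16 and 2 characters.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def trace_id_from_traceparent(value: str) -> Optional[str]:
    """Return the trace ID from a traceparent header value, or None if invalid."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, _flags = match.groups()
    # The spec forbids version "ff" and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id


header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
trace_id = trace_id_from_traceparent(header)
```

A trace ID extracted this way is what allows a metric sample to be tied back to the specific request that produced it.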

Abstract

Updating applications to include observability can be almost impossible, but the other option is not knowing whether your application is even working. Enter Cilium, which leverages eBPF to provide observability data with Prometheus metrics for your applications without having to modify the application itself. In this session we will explain how Cilium, powered by Hubble and combined with the Grafana LGTM stack, can show service-to-service communication, monitor golden signals, detect transient network layer issues and identify problematic API requests with transparent tracing. Using a demo application, we will demonstrate performance and metrics for that application and how the metrics change with increasing request volumes. We will show how metrics change when a new configuration of our application introduces errors and increases request duration. Finally, we will show how tracing headers for the application can be exported with Hubble HTTP metrics as exemplars to link metrics to traces in Grafana, monitoring each request and its duration using Tempo. The audience will walk away with knowledge of how to monitor service connectivity and collect tracing data and golden signal metrics using standard Prometheus, Grafana, and OpenTelemetry, exported from Cilium and eBPF.
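
The exemplar mechanism mentioned in the abstract can be sketched as follows. In the OpenMetrics text format, an exemplar is appended to a sample after a `#` and carries its own labels and observed value. The metric name, the `trace_id` label key and the helper function below are illustrative assumptions, not the exact output of Hubble.

```python
from typing import Dict, Union

Number = Union[int, float]


def exemplar_line(
    metric: str,
    labels: Dict[str, str],
    value: Number,
    trace_id: str,
    observed: Number,
) -> str:
    """Render one OpenMetrics sample line with a trace ID exemplar attached."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    # Layout: name{labels} value # {exemplar labels} exemplar value
    return f'{metric}{{{label_str}}} {value} # {{trace_id="{trace_id}"}} {observed}'


line = exemplar_line(
    "http_request_duration_seconds_bucket",  # hypothetical metric name
    {"le": "0.5"},
    3,
    "4bf92f3577b34da6a3ce929d0e0e4736",
    0.42,
)
```

When Prometheus scrapes such a line with exemplar storage enabled, Grafana can show the exemplar on a latency panel and link it to the matching trace in Tempo.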
