
Metrics at Full Throttle: Intro and Deep Dive Into Thanos

2023-04-21

Authors: Filip Petkovski, Saswata Mukherjee


Summary

Thanos is an open-source solution for scaling Prometheus-based monitoring: it provides a distributed, highly available metrics system with long-term retention. It addresses scaling challenges such as querying metrics across large time ranges (via downsampling) and ingesting metrics at scale.
  • Prometheus is a standalone monitoring system that scrapes metrics from applications and stores them locally, but it cannot handle a large multi-environment setup or retain data for a long period of time
  • Thanos fills the gaps in Prometheus by providing a global view, long-term retention, downsampling, and multi-tenancy features
  • Thanos achieves a global view through the Thanos Querier, a standalone service that evaluates PromQL queries, and through the Store API, which allows the Querier to request time series data from any component that implements it
  • Thanos also provides global alerting and recording rules through the Thanos Ruler, which evaluates rules against the entire data set
  • The Thanos Sidecar can be configured to upload data from Prometheus into object storage, so data can be retained for long periods without depending on Prometheus's local disk, which also makes disks easier to move or replace
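To make the Store API fan-out more concrete, here is a minimal Python sketch. It is not Thanos code: `InMemoryStore`, `series_in_range` and `fan_out_select` are hypothetical names, each store stands in for a real Store API endpoint (for example a sidecar in front of Prometheus, or a store gateway serving blocks from object storage), and the real API is gRPC-based and streams series rather than returning dictionaries. The querier-like function asks every store for series matching label matchers within a time range and merges samples that share a label set.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

Labels = Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs identifying a series
Sample = Tuple[int, float]            # (timestamp in ms, value)


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for one Store API endpoint."""
    name: str
    series: Dict[Labels, List[Sample]] = field(default_factory=dict)

    def series_in_range(self, matchers: Dict[str, str], mint: int, maxt: int) -> Iterator[Tuple[Labels, List[Sample]]]:
        # Yield every series whose labels equal all matchers, trimmed to [mint, maxt].
        for labels, samples in self.series.items():
            label_map = dict(labels)
            if all(label_map.get(k) == v for k, v in matchers.items()):
                window = [s for s in samples if mint <= s[0] <= maxt]
                if window:
                    yield labels, window


def fan_out_select(stores: List[InMemoryStore], matchers: Dict[str, str], mint: int, maxt: int) -> Dict[Labels, List[Sample]]:
    """Ask every store for matching series and merge samples per label set."""
    merged: Dict[Labels, Dict[int, float]] = {}
    for store in stores:
        for labels, samples in store.series_in_range(matchers, mint, maxt):
            points = merged.setdefault(labels, {})
            for ts, value in samples:
                points.setdefault(ts, value)  # simplification: keep the first value seen for a timestamp
    return {labels: sorted(points.items()) for labels, points in merged.items()}


def lbl(**kw: str) -> Labels:
    return tuple(sorted(kw.items()))


# Two Prometheus instances (recent data) plus one object-storage-backed store (older data).
eu = InMemoryStore("prometheus-eu", {lbl(__name__="http_requests_total", cluster="eu"): [(1000, 5.0), (2000, 7.0)]})
us = InMemoryStore("prometheus-us", {lbl(__name__="http_requests_total", cluster="us"): [(1000, 3.0), (2000, 4.0)]})
bucket = InMemoryStore("store-gateway", {lbl(__name__="http_requests_total", cluster="eu"): [(0, 1.0)]})

result = fan_out_select([eu, us, bucket], {"__name__": "http_requests_total"}, 0, 2000)
for labels, samples in result.items():
    print(dict(labels), samples)
```

In this example, recent samples for the `eu` cluster come from its Prometheus while an older sample comes from the bucket-backed store, and the caller sees a single merged series per label set. Keeping the first value seen for a duplicate timestamp is a rule chosen only for this sketch.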
A single Prometheus server only sees the targets it scrapes, so in a large multi-environment setup it cannot provide a global view of the data. Thanos solves this with the Querier, a standalone service that evaluates PromQL queries by requesting time series data over the Store API from any component that implements it. Because the Querier can connect to many Prometheus instances at once, it provides a global view of the data, and the Thanos Ruler builds on that view to provide global alerting and recording rules evaluated across the entire data set.
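The value of global alerting is easiest to see with a rule that no single Prometheus can evaluate correctly. The following Python sketch assumes a merged global view like the one a Querier returns, and a hypothetical rule roughly equivalent to `sum(queue_length) > 100`; `evaluate_global_alert` and the `queue_length` metric are illustrative names, not part of Thanos.

```python
from typing import Dict, List, Tuple

Labels = Tuple[Tuple[str, str], ...]  # sorted (name, value) pairs identifying a series
Sample = Tuple[int, float]            # (timestamp in ms, value)


def latest_value(samples: List[Sample]) -> float:
    # Instant-vector style: use the most recent sample of the series.
    return max(samples, key=lambda s: s[0])[1]


def evaluate_global_alert(view: Dict[Labels, List[Sample]], threshold: float) -> Tuple[bool, float]:
    """Fire when the sum of the latest values across all series exceeds threshold."""
    total = sum(latest_value(samples) for samples in view.values())
    return total > threshold, total


def lbl(**kw: str) -> Labels:
    return tuple(sorted(kw.items()))


# Hypothetical global view: one series per cluster, each below the threshold on its own.
global_view = {
    lbl(__name__="queue_length", cluster="eu"): [(1000, 40.0), (2000, 60.0)],
    lbl(__name__="queue_length", cluster="us"): [(1000, 30.0), (2000, 55.0)],
}

# What each Prometheus would see if it evaluated the rule only on its own data.
local = [evaluate_global_alert({k: v}, 100.0)[0] for k, v in global_view.items()]
# What a Ruler sees when it evaluates the same rule against the global view.
firing, total = evaluate_global_alert(global_view, 100.0)
print(local, firing, total)
```

Each cluster on its own stays below the threshold, so per-instance evaluation never fires, while the same rule evaluated over the global view does.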

Abstract

Thanos is one of the leading open-source solutions for scaling Prometheus-based monitoring. It ships as a single binary that provides various components, which can be composed into a distributed, highly available metrics system with long-term retention. Thanos already addresses several unique scaling challenges, such as querying metrics across large time ranges via downsampling and ingesting metrics at scale. Over the last few quarters, the Thanos community has been more active than ever. We have been heavily focused on interesting challenges around operational excellence, namely query execution performance, a brand-new query engine, and quality-of-service improvements. In this talk, you will learn the basics of Thanos and how to leverage its state-of-the-art features. The Thanos maintainers will share their insights by giving a brief overview and demonstrating how they use the latest features in larger engineering contexts.
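Downsampling trades resolution for speed when querying long time ranges. In Thanos, the compactor produces downsampled blocks at 5-minute and 1-hour resolution and keeps several aggregates per window rather than a single averaged value. The Python sketch below is a simplified illustration, not the Thanos implementation: it keeps only count, sum, min and max per 5-minute window (Thanos also stores a counter aggregate for `rate()`-style queries), and `Aggregate`, `downsample` and the 15-second scrape interval are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (timestamp in ms, value)

FIVE_MINUTES_MS = 5 * 60 * 1000


@dataclass
class Aggregate:
    """Hypothetical per-window aggregate; averages are derived, not stored."""
    count: int
    total: float
    minimum: float
    maximum: float

    @property
    def average(self) -> float:
        return self.total / self.count


def downsample(samples: List[Sample], window_ms: int = FIVE_MINUTES_MS) -> Dict[int, Aggregate]:
    """Group raw samples into fixed windows and keep one aggregate per window."""
    windows: Dict[int, Aggregate] = {}
    for ts, value in sorted(samples):
        start = ts - ts % window_ms  # align the sample to its window's start time
        agg = windows.get(start)
        if agg is None:
            windows[start] = Aggregate(1, value, value, value)
        else:
            agg.count += 1
            agg.total += value
            agg.minimum = min(agg.minimum, value)
            agg.maximum = max(agg.maximum, value)
    return windows


# Assumed raw data: one sample every 15s for 10 minutes -> 40 samples, two windows.
raw = [(i * 15_000, float(i)) for i in range(40)]
downsampled = downsample(raw)
for start, agg in sorted(downsampled.items()):
    print(start, agg.count, agg.minimum, agg.maximum, agg.average)
```

Keeping count and sum instead of a precomputed average means an average over any larger range can still be derived exactly, as the total sum divided by the total count of the windows it covers.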
