Authors: Filip Petkovski, Saswata Mukherjee
2023-04-21

tldr - powered by Generative AI

Thanos is an open-source solution for scaling Prometheus-based monitoring. It provides a distributed, highly available metrics system with long-term retention, and it addresses scaling challenges such as querying metrics across large time ranges (via downsampling) and ingesting metrics at scale.
  • Prometheus is a standalone monitoring system that scrapes metrics from applications and stores them locally, but it cannot handle a large multi-environment setup or retain data for a long period of time
  • Thanos fills the gaps in Prometheus by providing a global view, long-term retention, downsampling, and multi-tenancy features
  • Thanos achieves a global view by running PromQL in a standalone service, the Querier, and by defining the Store API, which allows the Querier to request time series data from any component that implements it
  • Thanos also provides global alerting and rule recording through the Thanos ruler, which executes alerting rules across the entire data set
  • The Thanos sidecar can be configured to upload data from Prometheus into object storage, so that long-term retention no longer depends on keeping data on local disks and moving those disks around
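The global-view idea in the summary above can be sketched roughly as follows: a querier fans a request out to several stores that share one interface, then merges the results. This is a minimal in-memory illustration, not the actual Thanos gRPC Store API. The names `InMemoryStore`, `series`, and `fan_out` are hypothetical, and the "first store wins" deduplication rule is an assumption made for the example.

```python
from typing import Dict, List, Tuple

Sample = Tuple[int, float]            # (timestamp in ms, value)
Labels = Tuple[Tuple[str, str], ...]  # sorted label pairs identify one series


class InMemoryStore:
    """Hypothetical stand-in for any component that can serve series data."""

    def __init__(self, data: Dict[Labels, List[Sample]]) -> None:
        self._data = data

    def series(self, name: str, start: int, end: int) -> Dict[Labels, List[Sample]]:
        out: Dict[Labels, List[Sample]] = {}
        for labels, samples in self._data.items():
            if dict(labels).get("__name__") != name:
                continue
            hits = [(t, v) for t, v in samples if start <= t <= end]
            if hits:
                out[labels] = hits
        return out


def fan_out(stores, name: str, start: int, end: int) -> Dict[Labels, List[Sample]]:
    """Ask every store for the same series and merge the answers."""
    merged: Dict[Labels, Dict[int, float]] = {}
    for store in stores:
        for labels, samples in store.series(name, start, end).items():
            by_ts = merged.setdefault(labels, {})
            for t, v in samples:
                # Assumption: on duplicate timestamps the first store wins.
                by_ts.setdefault(t, v)
    return {labels: sorted(by_ts.items()) for labels, by_ts in merged.items()}
```

In this sketch one store could hold recent data (as a sidecar would) and another older data (as an object-storage-backed store would); the caller sees a single merged series.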
Authors: Kemal Akkoyun, Matej Gera
2023-04-20

Download the code ahead of time. DCO Required.

Thanos, a CNCF project, is a Prometheus-compatible, scalable system for high availability and long-term storage of metrics. Thanos takes various Prometheus functionalities and splits them into microservices, which allows it to scale different parts of the system based on usage. Although this makes it possible for Thanos to achieve its aims, the complexity of such a distributed system comes at a cost. Operating Thanos requires knowing what you're doing and keeping up to date with the continuous improvements the project goes through. And there is no better way to gain this insight than by putting your finger on the pulse of the project!

ContribFest will allow participants and companies to explore how easy it is to contribute to and maintain Thanos with the community. During our session, you'll go through the code base with the maintainers, get familiar with the contribution cycles, and learn about our testing framework and CI/CD setup. We plan to involve the audience with interactive tutorials they can follow, and they can ask questions on their way to becoming contributors.

Contribfest sessions are designed to give projects the space and resources to tackle outstanding technical debt, security issues, or impactful feature requests. They are intended to provide a place for maintainers to meet contributors and potential contributors and work together on solving a problem.
Authors: Matthias Loibl, Nadine Vehling
2022-10-28

In this talk, Nadine and Matthias will give an introduction to Pyrra, a project that aims to make Service Level Objectives (SLOs) with Prometheus manageable, accessible, and easy to use for everyone. Nadine will talk about the project approach and findings for creating an easy-to-use observability tool. Matthias will then walk the audience through setting up a Pyrra instance on Kubernetes and how to connect it with either Prometheus or Thanos. After a successful deployment every component of the cluster will get an SLO, starting with etcd, the Kubernetes API server and kubelet, CoreDNS, and at the end Prometheus and Pyrra itself. In the end, a demo will showcase an outage in the cluster and what the alerting will look like, discussing how the lives of on-call engineers have been improved.
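For readers new to SLOs, the arithmetic behind an error budget can be sketched as below. This is generic SLO math under simple assumptions (a request-based objective and raw request counts), not Pyrra's implementation; the function name and parameters are illustrative only.

```python
def error_budget_remaining(target: float, total: int, errors: int) -> float:
    """Return the fraction of the error budget left.

    1.0 means the budget is untouched, 0.0 means it is exhausted,
    and a negative value means the objective has been violated.
    """
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    if total == 0:
        return 1.0  # no traffic, so nothing has been spent
    allowed_errors = (1 - target) * total
    return 1 - errors / allowed_errors
```

With a 99% objective and 10,000 requests, about 100 failures are allowed, so 25 failures leave roughly three quarters of the budget.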
Authors: Bartłomiej Płotka, Kemal Akkoyun
2022-05-20

tldr - powered by Generative AI

The presentation discusses the challenges of troubleshooting complex systems and introduces a tool that automates correlation of alerts and provides useful links to relevant data sources.
  • Troubleshooting complex systems can be challenging, especially when dealing with multiple data sources and queries
  • The presenter introduces a tool that automates correlation of alerts and provides useful links to relevant data sources
  • The tool uses exemplars to trace the root cause of alerts and filter out relevant data
  • The tool is experimental and requires the latest releases of several projects
  • The presenter provides an anecdote of using the tool to troubleshoot a performance issue in a request
Authors: Filip Petkovski, Moad Zardab
2022-05-20

tldr - powered by Generative AI

The presentation discusses the implementation of vertical sharding in Thanos for efficient query processing and scaling.
  • Vertical sharding can be useful for Prometheus as well as Thanos.
  • Sharding is implemented end-to-end from query execution to data retrieval from the store.
  • Sharding queries across a fleet of Thanos Queriers allows for horizontal scaling.
  • There is a proposal in upstream Thanos for grouping and aggregation queries.
  • Contributing to Thanos is encouraged and the community is friendly.
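The summary does not spell out the sharding algorithm, so the following is only a sketch of one common approach: partition series by a hash of their grouping labels so that every group lands in exactly one shard, evaluate the aggregation per shard, and concatenate the results. The function names, the use of CRC32, and the `sum by` example are assumptions chosen for illustration.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], float]  # (labels, current value)


def group_key(labels: Dict[str, str], by: List[str]) -> Tuple[str, ...]:
    return tuple(labels.get(name, "") for name in by)


def shard_of(key: Tuple[str, ...], total_shards: int) -> int:
    # crc32 is stable across processes, unlike Python's built-in hash().
    return zlib.crc32("\x00".join(key).encode()) % total_shards


def sum_by(series: List[Series], by: List[str]) -> Dict[Tuple[str, ...], float]:
    """Unsharded reference: the equivalent of `sum by (...)`."""
    out: Dict[Tuple[str, ...], float] = defaultdict(float)
    for labels, value in series:
        out[group_key(labels, by)] += value
    return dict(out)


def sharded_sum_by(series: List[Series], by: List[str], total_shards: int):
    """Evaluate the same aggregation shard by shard and merge the results."""
    shards: List[List[Series]] = [[] for _ in range(total_shards)]
    for labels, value in series:
        shards[shard_of(group_key(labels, by), total_shards)].append((labels, value))
    result: Dict[Tuple[str, ...], float] = {}
    for shard in shards:  # each shard could run on a separate querier
        result.update(sum_by(shard, by))  # a group never spans two shards
    return result
```

Because all series of a group hash to the same shard, the merged output matches the unsharded aggregation, which is what makes this kind of partitioning safe for grouping queries.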
Authors: Wiard van Rij
2021-10-14

tldr - powered by Generative AI

Thanos is a highly available, pluggable, long-term metric storage solution that extends Prometheus to enable scaling, highly available setups, and long-term storage for everyone. It also has multiple components that could be used for multi-cluster telemetry, remote writes, and multi-tenancy.
  • Prometheus is a popular project for short-term metric retention and visualization
  • Thanos extends Prometheus to enable scaling, highly available setups, and long-term storage
  • Thanos has multiple components for multi-cluster telemetry, remote writes, and multi-tenancy
  • Thanos is Prometheus-compatible and has unlimited retention
  • The four core components of Thanos are the sidecar, query, store, and compact components