Stateless Collectors For Stateful Data: Scaling Prometheus As a Node Agent

Conference: KubeCon + CloudNativeCon North America 2022

2022-10-28

Authors: Danny Clark

Summary

The presentation discusses the challenges of scaling Prometheus and offers a solution through a managed service that leverages Prometheus as a node agent.

Scaling Prometheus can be challenging because of data aggregation issues and network failures.
Existing solutions such as federation, remote read, and Thanos require manual maintenance and expertise.
A managed service that runs Prometheus as a node agent can mitigate scaling issues and separate state concerns from query concerns.
The service forwards metric data to a remote backend and uses Kubernetes custom resources and a DaemonSet to achieve this setup.
Google's Monarch provides the capacity needed to offer a PromQL-compatible API and long-term retention of metrics.
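
The node-agent idea in the points above can be illustrated with a minimal sketch. It assumes, as an illustration rather than a description of the speaker's implementation, that each agent in the DaemonSet scrapes only the targets scheduled on its own node, so every target is collected by exactly one agent. The names `Target`, `targets_for_node`, and `plan_scrapes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A scrape target; all fields are hypothetical illustration data."""
    pod: str
    node: str
    address: str


def targets_for_node(targets, node_name):
    """Return only the targets scheduled on the given node.

    This models the assumed node-agent rule: an agent ignores every
    target that does not live on the node the agent itself runs on.
    """
    return [t for t in targets if t.node == node_name]


def plan_scrapes(targets, nodes):
    """Map each node name to the targets its local agent would scrape."""
    return {node: targets_for_node(targets, node) for node in nodes}


if __name__ == "__main__":
    example = [
        Target("api-0", "node-a", "10.0.0.1:9090"),
        Target("api-1", "node-b", "10.0.0.2:9090"),
        Target("db-0", "node-a", "10.0.0.3:9090"),
    ]
    for node, assigned in plan_scrapes(example, ["node-a", "node-b"]).items():
        print(node, [t.pod for t in assigned])
```

Because the assignment depends only on where a target runs, an agent needs no coordination with other agents and holds no long-lived state of its own, which is what lets storage and querying move to the remote backend.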

The speaker mentions a customer who found success through adopting Thanos, one of the existing solutions for scaling Prometheus.

Abstract

prometheus-operator is the de facto standard for running Prometheus on Kubernetes. Yet, its configuration can be complicated and baroque, making it hard to know what is being scraped, or to properly enforce RBAC. Scaling also requires careful thought. However, there are an increasing number of ways to run Prometheus as “stateless”. How can we adopt this to solve these problems? This talk introduces an alternative, operator-based approach for running stateless Prometheus instances on Kubernetes by leveraging Prometheus as a node agent. This prompted rethinking how Prometheus configuration is done today, and led to new, simpler, and more opinionated CRDs. We will discuss trade-offs in the new configuration model and the challenges of running a fleet of node-agent Prometheuses at scale. The hope is this lowers the barrier to entry of managing Prometheus infrastructure, while still supporting features and access controls for enterprise users.
