Show Me the Metrics: How a Huge Bank Does Observability with Multi-Tenancy Prometheus and Thanos

2023-04-21

Authors: Rodrigo Serra Inacio, Willian Saavedra Moreira Costa


Summary

CloudMetrics is a scalable and resilient platform for monitoring a bank's systems and environments. The key to building it was isolating components and reducing noise between tenants. The main components are Kubernetes, Prometheus, Grafana, and Alertmanager. The infrastructure runs on Amazon EKS and is hosted in São Paulo, Brazil. Users access their metrics through Grafana and Prometheus images. Each tenant has its own account and bucket for storing metrics.
  • CloudMetrics is a platform for monitoring a bank's systems and environments
  • Isolating components and reducing noise between tenants was key to building the platform
  • The main components are Kubernetes, Prometheus, Grafana, and Alertmanager
  • The infrastructure runs on Amazon EKS and is hosted in São Paulo, Brazil
  • Users access their metrics through Grafana and Prometheus images
  • Each tenant has its own account and bucket for storing metrics
The bank needed a scalable and resilient platform for monitoring both its on-premises and cloud environments. CloudMetrics was built to meet this need by isolating components and reducing noise between tenants.

Abstract

This talk shares how the bank has been adopting a fully open-source, highly available, multi-tenant metrics platform built on Prometheus, Kubernetes, and Thanos. Dubbed 'CloudMetrics', the new platform has been democratizing observability across our teams and expanding the number of open-source solutions within the largest financial institution in Latin America. We walk through the case, covering its implementation and architecture along with some positive and negative insights.

Materials: