Presentations | Hack Dojo


Effortless Open Source Observability with Cilium, Prometheus and Grafana - LGTM!

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Raymond de Jong, Anna Kapuścińska

2023-04-21

tldr - powered by Generative AI

The presentation discusses the challenges of observability and security in distributed systems and how Cilium and Hubble can address these challenges.

Cilium and Hubble can provide observability and security in distributed systems
Existing mechanisms such as traditional monitoring devices and VPC logs fall short in providing context and scalability
Cilium uses identity-based observability and security based on labels to secure and monitor traffic
Hubble provides a service mesh solution for monitoring workflows and exporting flows to other platforms
Ready-to-use dashboards are available in Grafana marketplace for monitoring cluster and application performance


Metrics at Full Throttle: Intro and Deep Dive Into Thanos

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Filip Petkovski, Saswata Mukherjee

2023-04-21

tldr - powered by Generative AI

Thanos is an open-source solution for scaling Prometheus-based monitoring by providing a distributed highly-available metric system with long-term retention. It addresses challenges with scaling functionality like querying metrics across large time ranges via downsampling and ingesting metrics at scale.

Prometheus is a standalone monitoring system that scrapes metrics from applications and stores them locally, but it cannot handle a large multi-environment setup or retain data for a long period of time
Thanos fills the gaps in Prometheus by providing a global view, long-term retention, downsampling, and multi-tenancy features
Thanos achieves a global view through a standalone service, the Querier, which evaluates PromQL, and through the Store API, which allows the Querier to request time series data from any component
Thanos also provides global alerting and rule recording through the Thanos ruler, which executes alerting rules across the entire data set
Thanos sidecar can be configured to upload data from Prometheus into object storage, making it easier to store data on disk for longer periods of time and move disks around
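
The downsampling idea mentioned above can be illustrated with a minimal sketch. This is not Thanos code: the function name, the 300-second window, and the sample data are assumptions chosen for illustration. It only shows the general technique of replacing raw samples with per-window aggregates so that queries over long time ranges touch far fewer points.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Aggregate(NamedTuple):
    """Summary kept for one window in place of the raw samples."""
    count: int
    total: float
    minimum: float
    maximum: float


def downsample(samples: List[Tuple[int, float]], window_s: int = 300) -> Dict[int, Aggregate]:
    """Group (unix_timestamp, value) samples into fixed windows.

    Each window is keyed by its start timestamp and keeps only
    count, sum, min, and max of the samples that fell into it.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in samples:
        window_start = ts - (ts % window_s)  # align to the window boundary
        buckets[window_start].append(value)
    return {
        start: Aggregate(len(vals), sum(vals), min(vals), max(vals))
        for start, vals in sorted(buckets.items())
    }


# Hypothetical raw series scraped every 60 seconds.
raw = [(0, 1.0), (60, 3.0), (120, 2.0), (300, 10.0), (360, 20.0)]
result = downsample(raw)
```

Here five raw samples collapse into two windows. An average for a window can still be recovered as `total / count`, which is why the sum and count are kept separately rather than a single mean.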


Cortex: How to Run a Rock Solid Multi-Tenant Prometheus

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Friedrich Gonzalez, Alan Protasio

2023-04-21

tldr - powered by Generative AI

The presentation discusses the reliability and features of Cortex, a project based on Prometheus and designed for Kubernetes.

Cortex is designed for Kubernetes and is not a separate project from Prometheus
Cortex uses Thanos for reliability and provides limits to ensure reliability
Cortex implements replication so that data is copied across instances
Cortex has upcoming projects such as Gateway, Down Sampling, Federated Rules, and Native Histogram
There are plans to improve observability on the Cortex layer for cardinality


🚨 ContribFest: Prometheus - Hands-on Development and Contribution Workshop (Limited Availability; First-Come, First

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Bartłomiej Płotka, Goutham Veeramachaneni

2023-04-21

Download the code ahead of time. DCO required. Ever wished you knew how to write an exporter in Go for Prometheus? How to use Prometheus APIs programmatically? Need to quickly instrument your Go code with Prometheus metrics? Join us to learn how to contribute, develop and test Prometheus integrations useful in day-to-day use. Unblock yourself and others! It's easier than you think! We will go through useful resources and ways to interact with the project and community, to create meaningful applications that use Prometheus effectively! Note: For an interactive experience, please make sure to bring your laptop with your favourite IDE and Go installed. This ContribFest session is designed to provide projects with the space and resources to tackle outstanding technical debt, security issues, or impactful feature requests. ContribFest sessions are intended to provide a place for maintainers to meet contributors and potential contributors and work together on solving a problem.
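
As a rough illustration of what instrumenting code with a metric involves, here is a minimal sketch of a counter rendered in the Prometheus text exposition format. The workshop itself targets Go and the official client libraries; this sketch uses only the Python standard library, and the `Counter` class, metric name, and label values are assumptions made up for the example. It does not escape special characters in label values, which a real client library handles.

```python
from typing import Dict, Tuple

LabelKey = Tuple[Tuple[str, str], ...]


class Counter:
    """Toy monotonically increasing counter with labels (illustrative only)."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values: Dict[LabelKey, float] = {}

    def inc(self, amount: float = 1.0, **labels: str) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        key = tuple(sorted(labels.items()))  # stable identity for a label set
        self._values[key] = self._values.get(key, 0.0) + amount

    def render(self) -> str:
        """Return the metric in the text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            series = f"{self.name}{{{label_str}}}" if label_str else self.name
            lines.append(f"{series} {value}")
        return "\n".join(lines) + "\n"


# Hypothetical usage: count handled HTTP requests by method.
requests_total = Counter("http_requests_total", "Total HTTP requests handled.")
requests_total.inc(method="get")
requests_total.inc(method="get")
requests_total.inc(method="post")
output = requests_total.render()
```

An exporter would serve text like `output` over HTTP on a metrics endpoint for Prometheus to scrape; in practice a client library does the bookkeeping and rendering shown here.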


Show Me the Metrics: How a Huge Bank Does Observability with Multi-Tenancy Prometheus and Thanos

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Rodrigo Serra Inacio, Willian Saavedra Moreira Costa

2023-04-21

tldr - powered by Generative AI

Cloud Metrics is a scalable and resilient platform for monitoring both systems and environments of a bank. The key to building this platform was isolation and reducing noise between tenants. The main components used were Kubernetes, Prometheus, Grafana, and Alertmanager. The infrastructure was built using EKS and hosted in São Paulo, Brazil. Users access their metrics through Grafana and Prometheus images. Each tenant has their own account and bucket to store their metrics.

Cloud Metrics is a platform for monitoring both systems and environments of a bank
Isolation and reducing noise between tenants was key to building the platform
Main components used were Kubernetes, Prometheus, Grafana, and Alertmanager
Infrastructure was built using EKS and hosted in São Paulo, Brazil
Users access their metrics through Grafana and Prometheus images
Each tenant has their own account and bucket to store their metrics


🚨 ContribFest: Thanos - Hands-on Development and Contribution Workshop (Limited Availability; First-Come, First

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Kemal Akkoyun, Matej Gera

2023-04-20

Download the code ahead of time. DCO required. Thanos, a CNCF project, is a Prometheus-compatible scalable system for high availability and long-term storage of metrics. Thanos takes various Prometheus functionalities and splits them into microservices, which allows it to scale different parts of the system based on usage. Although this makes it possible for Thanos to achieve its aims, the complexity of such a distributed system comes at a cost. Operating Thanos requires knowing what you're doing and being up to date with the continuous improvements the project goes through. And there is no better way to gain this insight than by putting your finger on the pulse of the project! ContribFest will allow participants and companies to explore how easy it is to contribute to and maintain Thanos with the community. During our session, you'll go through the code base with the maintainers, get familiar with the contribution cycles and learn about our testing framework and CI/CD setup. We plan to involve the audience with interactive tutorials they can follow and ask questions on their way to becoming contributors. This ContribFest session is designed to provide projects with the space and resources to tackle outstanding technical debt, security issues, or impactful feature requests. ContribFest sessions are intended to provide a place for maintainers to meet contributors and potential contributors and work together on solving a problem.


Prometheus Updates and Deep Dive

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Kemal Akkoyun, Bryan Boreham

2023-04-19

Prometheus is the second-oldest project in the CNCF, so you have probably heard about it before. It is the de facto standard in cloud-native metrics monitoring and beyond, mainly because Kubernetes is designing its custom metrics engine for Prometheus. Nevertheless, the project maintainers will introduce it from the very beginning, followed by a deep dive into its internals and a list of the exciting new features that have been released recently or are in the pipeline. You will learn about many opportunities to use Prometheus, and we will cover a mix of introductory content, a deeper dive into current developments, and open Q&A at the end. We may even tempt you to contribute to the project yourself.


Multi-Cluster Observability with Service Mesh - That Is a Lot of Moving Parts!?

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Ryota Sawada

2023-04-19

tldr - powered by Generative AI

The presentation discusses multi-cluster observability and the challenges involved in managing metrics and data retention across multiple clusters.

Cardinality and data retention are important aspects to consider in multi-cluster observability
Metrics can be fetched from running services like Prometheus, but data retention costs can add up quickly
Differentiating between clusters and applications is important for effective dashboarding
The presentation focuses on Istio, Prometheus, and Thanos as key projects for multi-cluster observability
The demo showcases the installation process for Istio and the creation of certificates for secure communication between clusters


DoorDash’s Journey From StatsD To Prometheus With 10 Million Metrics/Second

Conference: KubeCon + CloudNativeCon North America 2022

Authors: Benjamin Raskin, Emma Wang

2022-10-28

tldr - powered by Generative AI

The presentation discusses the migration of infrastructure and application metrics from StatsD to Prometheus at DoorDash, and the challenges and learnings encountered during the process.

The migration involved over 130 services, 1,500 dashboards, and more than 7,000 alerts.
The use of histograms instead of percentiles was a difficult change for engineers to adapt to.
The instance label is a high cardinality label that needs to be pre-aggregated to reduce volume.
A Prometheus aggregation gateway was used for some metrics, but push models were limited to special cases.
Automating the monitoring onboarding process for teams is crucial.
The migration was completed in one year, resulting in over 27,000 alerts and 2,200 dashboards.
Post-migration, DoorDash ingests over 15 million metrics per second and persists over 10 million metrics per second.
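
The point about pre-aggregating the high-cardinality instance label can be sketched as follows. This is not DoorDash's implementation: the function, label names, and values are hypothetical, and the sketch only shows the general idea of summing series that differ solely in one label, roughly what a PromQL expression of the form `sum without (instance) (...)` computes.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Tuple

Labels = FrozenSet[Tuple[str, str]]


def make_labels(**labels: str) -> Labels:
    """Build a hashable label set from keyword arguments."""
    return frozenset(labels.items())


def drop_label_and_sum(series: Dict[Labels, float], label: str) -> Dict[Labels, float]:
    """Remove one label from every series and sum the values that collide."""
    out: Dict[Labels, float] = defaultdict(float)
    for labels, value in series.items():
        reduced = frozenset((k, v) for k, v in labels if k != label)
        out[reduced] += value
    return dict(out)


# Hypothetical per-instance request counts.
series = {
    make_labels(service="checkout", instance="10.0.0.1"): 5.0,
    make_labels(service="checkout", instance="10.0.0.2"): 7.0,
    make_labels(service="search", instance="10.0.0.3"): 2.0,
}
aggregated = drop_label_and_sum(series, "instance")
```

Three per-instance series become two per-service series. With thousands of short-lived instances, the reduction in stored series is correspondingly larger, at the cost of losing the ability to drill down to a single instance.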


SLO-Based Observability For All Kubernetes Cluster Components

Conference: KubeCon + CloudNativeCon North America 2022

Authors: Matthias Loibl, Nadine Vehling

2022-10-28

In this talk, Nadine and Matthias will give an introduction to Pyrra, a project that aims to make Service Level Objectives (SLOs) with Prometheus manageable, accessible, and easy to use for everyone. Nadine will talk about the project approach and findings for creating an easy-to-use observability tool. Matthias will then walk the audience through setting up a Pyrra instance on Kubernetes and how to connect it with either Prometheus or Thanos. After a successful deployment every component of the cluster will get an SLO, starting with etcd, the Kubernetes API server and kubelet, CoreDNS, and at the end Prometheus and Pyrra itself. In the end, a demo will showcase an outage in the cluster and what the alerting will look like, discussing how the lives of on-call engineers have been improved.
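
For readers new to SLOs, the arithmetic behind an error budget can be sketched in a few lines. This is generic SLO math and not Pyrra's code; the function names and the example numbers (a 99.9% target, a 30-day window, one million requests) are assumptions for illustration.

```python
def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of full unavailability allowed within the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    return (1.0 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


# Hypothetical service with a 99.9% availability target over 30 days.
minutes = error_budget_minutes(0.999, 30)            # about 43.2 minutes
remaining = budget_remaining(0.999, 1_000_000, 250)  # about 0.75, i.e. 75% left
```

Alerting on how quickly the remaining budget shrinks, rather than on individual errors, is the usual motivation for SLO-based alerting.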
