Presentations | Hack Dojo


Tutorial: Building an Open Source Observability Stack

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Michelle Nguyen, Hannah Troisi, Clemens Kolbitsch, Vihang Mehta

2023-04-21

tldr - powered by Generative AI

The presentation discusses the practicality of managing storage for multiple integrated applications in a busy environment, with a focus on OpenTelemetry and Pixie.

The speaker answers an audience question about storing data from multiple integrated applications in a busy environment
The speaker explains that OpenTelemetry and Pixie can capture and store data locally, then filter and batch it as needed
The speaker emphasizes considering sampling strategies, filtering, and batching when designing a storage architecture for a busy environment
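
The filter, sample, and batch steps mentioned above can be illustrated with a minimal sketch. It is not taken from the talk and uses neither the OpenTelemetry nor the Pixie API; the record shape, the function name filter_sample_batch, and all of its parameters are hypothetical.

```python
import random
from typing import Callable, Iterable, Iterator, List, Optional


def filter_sample_batch(
    records: Iterable[dict],
    keep: Callable[[dict], bool],
    sample_rate: float,
    batch_size: int,
    rng: Optional[random.Random] = None,
) -> Iterator[List[dict]]:
    """Yield batches of records that pass the filter and survive sampling.

    Hypothetical helper: `keep` is the filter predicate, `sample_rate` is the
    probability (0.0 to 1.0) of keeping a record that passed the filter.
    """
    rng = rng or random.Random()
    batch: List[dict] = []
    for record in records:
        if not keep(record):
            continue  # filtering: drop records nobody will query
        if rng.random() >= sample_rate:
            continue  # sampling: keep roughly sample_rate of the rest
        batch.append(record)
        if len(batch) == batch_size:
            yield batch  # batching: hand off a full batch for export
            batch = []
    if batch:
        yield batch  # flush the final, partially filled batch


# Example: keep only error records, no sampling loss, batches of two.
records = [{"id": i, "level": "error" if i % 2 == 0 else "debug"} for i in range(10)]
batches = list(filter_sample_batch(records, lambda r: r["level"] == "error", 1.0, 2))
```

Filtering before sampling keeps the sample rate meaningful: it applies only to records that are worth storing at all.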


🚨 ContribFest: Prometheus - Hands-on Development and Contribution Workshop (Limited Availability; First-Come, First

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Bartłomiej Płotka, Goutham Veeramachaneni

2023-04-21

Download the code ahead of time. DCO required. Ever wished you knew how to write an exporter in Go for Prometheus? How to use Prometheus APIs programmatically? Need to quickly instrument your Go code with Prometheus metrics? Join us to learn how to contribute, develop, and test Prometheus integrations that are useful in day-to-day work. Unblock yourself and others! It's easier than you think! We will go through useful resources and ways to interact with the project and community, to create meaningful applications that use Prometheus effectively! Note: For an interactive experience, please bring your laptop with your favourite IDE and Golang installed. This ContribFest session is designed to provide projects with the space and resources to tackle outstanding technical debt, security issues, or impactful feature requests. ContribFest sessions are intended to provide a place for maintainers to meet contributors and potential contributors and work together on solving a problem.


Smarter Golden Signals!

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Venkata Gunapati, Anusha Ragunathan

2023-04-21

As Platform Engineers & SREs, we love metrics from Kubernetes clusters for understanding platform health. However, we dislike drowning in alerts on every metric and experiencing alert fatigue. The worst consequence of alert fatigue is not just on-call engineer burnout, but on-call engineers snoozing alerts that could have prevented incidents. At Intuit, we needed a smarter way to get alerted on a cluster’s Golden Signals, which are picked from an ocean of metrics. This would help reduce the MTTD during incidents. We wanted to achieve this without the burden of instrumenting cluster components. Observability vendors provide solutions using eBPF instrumentation and AI-driven insights on Prometheus data, but we wanted to explore open source solutions to achieve the same. In this talk, we explain how we explored numalogic, an open source AIOps anomaly detection engine for Kubernetes. You will learn how to use numalogic on Prometheus metrics to derive baseline behaviors and detect anomalies, without any prior AI/ML experience. We will show how we collect, process, and analyze in-cluster data in real time and how numalogic computes an anomaly score for each component, which bubbles up into a single anomaly score for the cluster. There will be a live demo of the AIOps-based Prometheus metrics pipeline in action.
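
The idea of scoring each component against a baseline and bubbling the scores up to one cluster score can be sketched as follows. This is a deliberately simple z-score stand-in, not numalogic's API or its models; the function names, the component names, and the choice of the maximum as the roll-up rule are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def anomaly_score(baseline: Sequence[float], value: float) -> float:
    """Distance of `value` from the baseline mean, in standard deviations."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        # A flat baseline: any deviation at all is maximally surprising.
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sigma


def cluster_score(component_scores: Dict[str, float]) -> float:
    """Roll component scores up to one number (assumed rule: worst wins)."""
    return max(component_scores.values(), default=0.0)


# Hypothetical per-component baselines and latest observations.
scores = {
    "apiserver": anomaly_score([10, 10, 12, 12], 14),  # 3 deviations out
    "etcd": anomaly_score([5, 5, 5, 5], 5),            # matches its baseline
}
overall = cluster_score(scores)
```

Alerting on the single rolled-up score rather than on every underlying metric is what reduces alert volume; the per-component scores remain available for drill-down.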


OpenTelemetry: Using Unified Semantics to Drive Insights + Project Update

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Ted Young, Alolita Sharma, Morgan McLean, Daniel Dyla

2023-04-20

tldr - powered by Generative AI

OpenTelemetry Integrations and Compatibility

OpenTelemetry has support for several streaming protocols and related projects built into the project itself
OpenTelemetry is interoperable with other projects, and more teams are adding support for it
Native integrations are starting to use OTLP APIs directly
The contrib repositories contain hundreds, if not thousands, of integrations with existing technologies


Prometheus Updates and Deep Dive

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Kemal Akkoyun, Bryan Boreham

2023-04-19

As the 2nd oldest project in the CNCF, Prometheus is something you have probably heard about before. Prometheus is the de facto standard in cloud-native metrics monitoring and beyond, mainly because Kubernetes is designing its custom metrics engine for Prometheus. Nevertheless, the project maintainers will introduce it from the very beginning, followed by a deep dive into its internals and a list of the exciting new features that have been released recently or are in the pipeline. You will learn about many opportunities to use Prometheus, and we will cover a mix of introductory content, a deeper dive into current developments, and open Q&A at the end. We may even tempt you to contribute to the project yourself.


Combat Maintainer Burnout with Proactive Metrics

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Sophia Vargas

2023-04-19

tldr - powered by Generative AI

The presentation discusses burnout in open source projects and provides recommendations to reduce it.

Burnout is a common problem in open source projects
Factors that contribute to burnout include losing patience, being always available, and losing interest
To reduce burnout, it is recommended to increase variety, delegate tasks, and provide clear milestones
Communication and building relationships are important in reducing burnout
Boundaries are vital to maintaining personal health and preventing burnout


Observability with Fluent Bit: Logs, Metrics & Traces

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Anurag Gupta, Eduardo Silva

2023-04-19

tldr - powered by Generative AI

Controlling data flow is crucial for cost reduction and efficient use of resources in logging and metrics management. Fluent Bit offers processors for modifying data and labels to optimize indexing and querying.

Companies generate 20-30% more logs each year, making control of data flow important for cost reduction and efficient resource use
Fluent Bit offers processors for modifying data and labels to optimize indexing and querying
Lua scripting can be used for log processing
Labels can be added, updated, or deleted using Fluent Bit processors
Fluent Bit can be used for metrics management and data scraping


It Is More Than Just Correlation - A Debug Journey

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Simon Pasquier, Vanessa Martini

2023-04-19

tldr - powered by Generative AI

The presentation discusses the challenges faced by site reliability engineers when troubleshooting issues in Kubernetes and introduces korrel8, an open source tool that aims to reduce the cognitive load of engineers when attempting to debug issues through the correlation of observability signals.

Observability signals are crucial for site reliability engineers to troubleshoot issues in Kubernetes
There is a lack of established open source tools that aggregate all the different observability signals and help users understand how their systems behave
Korrel8 is an open source project founded within Red Hat that aims to make correlation across observability signals accessible to everyone
Korrel8 can reduce the cognitive load of engineers when attempting to debug issues
The presentation includes a demo of korrel8 and a preview of the roadmap and next steps


OTel Me About Metrics: A Metrics 101 Crash Course

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Reese Lee

2023-04-19

tldr - powered by Generative AI

The presentation covers the basics of metrics and OpenTelemetry, including the architecture of a metrics pipeline, metric instruments, and their use cases.

OpenTelemetry is used for observability and provides an API and SDK for instrumenting code and collecting telemetry data, including metrics.
The MeterProvider is the API entry point for metrics, and meters and instruments are used to record measurements.
Aggregation, temporality, and dimensions are important concepts in metrics.
Asynchronous UpDownCounters and Gauges are two types of metric instruments that are used for different purposes.
There is much more to learn about metrics and OpenTelemetry, including customization options and different processors for transforming metrics data.
The presentation provides references for further exploration and credits to the people who contributed to the content.
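
Temporality, one of the concepts listed above, describes whether each reported point carries the running total since the start (cumulative) or only the change since the previous report (delta). The sketch below converts between the two for a monotonic counter. It is plain Python rather than the OpenTelemetry SDK; the function names are hypothetical, and treating a drop in value as a counter reset is an assumption of this sketch.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_to_delta(points: Sequence[float]) -> List[float]:
    """Turn running totals into per-interval changes."""
    deltas: List[float] = []
    previous = None
    for value in points:
        if previous is None or value < previous:
            # First point, or the counter restarted: the value itself is
            # everything accumulated since the (new) start.
            deltas.append(value)
        else:
            deltas.append(value - previous)
        previous = value
    return deltas


def delta_to_cumulative(deltas: Sequence[float]) -> List[float]:
    """Turn per-interval changes back into running totals."""
    return list(accumulate(deltas))


# Four reports of a request counter, as running totals.
totals = [5, 8, 8, 12]
changes = cumulative_to_delta(totals)       # [5, 3, 0, 4]
restored = delta_to_cumulative(changes)     # [5, 8, 8, 12]
```

Both series describe the same measurements; which form a backend expects determines where the conversion has to happen in the pipeline.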


DoorDash’s Journey From StatsD To Prometheus With 10 Million Metrics/Second

Conference: KubeCon + CloudNativeCon North America 2022

Authors: Benjamin Raskin, Emma Wang

2022-10-28

tldr - powered by Generative AI

The presentation discusses the migration of infrastructure and application metrics from StatsD to Prometheus at DoorDash, and the challenges and lessons learned along the way.

The migration involved over 130 services, 1500 dashboards, and more than 7000 alerts.
The use of histograms instead of percentiles was a difficult change for engineers to adapt to.
The instance label is a high cardinality label that needs to be pre-aggregated to reduce volume.
A Prometheus aggregation gateway was used for some metrics, but push models were limited to special cases.
Automating the monitoring onboarding process for teams is crucial.
The migration was completed in one year, resulting in over 27,000 alerts and 2200 dashboards.
Post-migration, DoorDash ingests over 15 million metrics per second and persists over 10 million metrics per second.
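
Two of the points above, pre-aggregating away the instance label and working with histograms rather than client-side percentiles, can be sketched in plain Python. This is not DoorDash's implementation. The sample and bucket shapes and both function names are assumptions, and the quantile estimate uses linear interpolation inside a bucket, similar in spirit to Prometheus's histogram_quantile function.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Labels = Dict[str, str]


def aggregate_without(samples: List[Tuple[Labels, float]], label: str) -> Dict[tuple, float]:
    """Sum sample values after dropping one label, e.g. 'instance'."""
    totals: Dict[tuple, float] = defaultdict(float)
    for labels, value in samples:
        key = tuple(sorted((k, v) for k, v in labels.items() if k != label))
        totals[key] += value
    return dict(totals)


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from (upper_bound, cumulative_count) buckets.

    Buckets must be sorted by upper bound and end with an infinite bound.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                return lower_bound  # cannot interpolate into an open bucket
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


samples = [
    ({"job": "api", "instance": "a"}, 2.0),
    ({"job": "api", "instance": "b"}, 3.0),
    ({"job": "web", "instance": "a"}, 1.0),
]
per_job = aggregate_without(samples, "instance")

latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (float("inf"), 100)]
p95 = histogram_quantile(0.95, latency_buckets)
```

Because bucket counts from many instances can simply be added together, a quantile can still be estimated after the instance label has been aggregated away, which is not possible with percentiles computed separately on each client.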
