High Throughput with Low Resource Usage: A Logging Journey

Conference: KubeCon + CloudNativeCon Europe 2021

Authors: Eduardo Silva

Summary

The presentation discusses the challenges of centralizing and analyzing data from various sources, particularly log messages, and the importance of log processing engines in managing and optimizing data flow.

Modern computing environments run many applications and microservices, which makes centralizing and analyzing their log messages a challenge.
Log processing engines are essential for managing and optimizing this data flow.
Their tasks include collecting data from different sources, parsing and filtering it, serializing it, buffering it, and delivering it to different destinations.
On the input side, they must handle different data formats and structures, including JSON and binary formats.
On the output side, they must handle different formats and delivery methods, which involves network setup, payload formatting, and return codes.
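
The stages listed above can be sketched as a tiny pipeline. This is an illustrative sketch only, not how Fluent Bit is implemented: the names `parse`, `keep`, `serialize`, `Pipeline`, and `fake_output`, the "LEVEL message" input format, and the health-check rule are all assumptions made for the example.

```python
import json
from collections import deque


def parse(line):
    """Parse a raw 'LEVEL message' line into a structured record (assumed format)."""
    level, _, message = line.partition(" ")
    return {"level": level.lower(), "message": message}


def keep(record):
    """Filter stage: drop health-check noise (hypothetical rule)."""
    return "healthz" not in record["message"]


def serialize(record):
    """Serialize a record to JSON bytes; a real engine might use a binary format."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


class Pipeline:
    def __init__(self, deliver, batch_size=2):
        self.deliver = deliver        # callable(batch) -> True on success
        self.batch_size = batch_size
        self.buffer = deque()         # in-memory buffer of serialized records

    def ingest(self, line):
        record = parse(line)
        if not keep(record):
            return                    # filtered out before it costs any more work
        self.buffer.append(serialize(record))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        batch = list(self.buffer)
        # The return code decides whether the buffered data can be released;
        # on failure the records stay buffered for a later retry.
        if self.deliver(batch):
            self.buffer.clear()


sent = []


def fake_output(batch):
    """Stand-in destination that records each batch and reports success."""
    sent.append(batch)
    return True


pipeline = Pipeline(fake_output, batch_size=2)
for raw in ["INFO service started", "INFO GET /healthz 200", "ERROR upstream timeout"]:
    pipeline.ingest(raw)
pipeline.flush()
```

Keeping records in the buffer until the destination reports success is one simple way to connect the buffering and return-code concerns; a production engine would also bound the buffer and back off between retries.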

The speaker mentions a common problem: developers enable debug mode for an application, which increases the volume of log messages and hurts performance. This illustrates why log processing engines need to filter and reduce data to keep the data flow and performance under control.
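
The debug-mode scenario can be illustrated with a severity filter. The sketch below is a hypothetical example rather than anything shown in the talk: the level names, their numeric ranks, and the helper `make_level_filter` are assumptions.

```python
# Assumed severity ranking; real systems may use different names and values.
LEVELS = {"debug": 10, "info": 20, "warn": 30, "error": 40}


def make_level_filter(min_level):
    """Return a predicate that keeps records at or above min_level."""
    threshold = LEVELS[min_level]

    def keep(record):
        # Records with an unknown or missing level are kept, so the filter
        # never silently drops data it cannot classify.
        rank = LEVELS.get(record.get("level", ""), threshold)
        return rank >= threshold

    return keep


records = [
    {"level": "debug", "message": "cache lookup key=42"},
    {"level": "debug", "message": "cache miss key=42"},
    {"level": "info", "message": "request served"},
    {"level": "error", "message": "upstream timeout"},
    {"level": "trace-custom", "message": "unclassified record"},
]

keep = make_level_filter("info")
kept = [r for r in records if keep(r)]
dropped = len(records) - len(kept)
```

Dropping records early, before serialization and delivery, means the later stages never spend CPU or network time on them.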

Abstract

In logging, one fact always holds: more applications mean more data to handle. Running services at scale in a distributed environment brings exciting challenges for data management. As the volume of data grows, it has to be shipped faster, but few realize the side effect: high resource consumption. When implementing a logging pipeline, pre-processing the data is mandatory; a simple example is Kubernetes metadata enrichment for every log record. More data means more computing time, and the same cost applies when delivering to the final storage or cloud service. In this session, we will do a deep dive into the performance challenges we faced in the Fluent Bit project around network I/O + TLS, filesystem buffers, routing, and multiplexing for high throughput. We will share how we went from 5k/sec to more than 30k/sec using a single-core CPU, purely through design improvements and by making the most of Linux OS interfaces.
