
Building Apache Druid on Kubernetes: How Dailymotion Serves Partner Data

2023-04-21

Authors:   Cyril Corbon, Alex Triquet


Summary

The presentation discusses running Apache Druid as StatefulSets on Kubernetes and taking advantage of the Kubernetes ecosystem for deployment, scaling, monitoring, and logging.
  • Kubernetes is used to run Druid's stateful components (StatefulSets) and to benefit from its features for deployment, monitoring, and logging
  • Apache Druid is used for data ingestion and reconciliation
  • RAM is critical for Druid clusters, and caching segments brings significant benefits
  • Future plans include migrating to Druid 25 and Java 17, decreasing costs by moving to ARM instances, and running Druid without ZooKeeper by using etcd and the Kubernetes API for endpoint and information retrieval
  • Running StatefulSets on Kubernetes is challenging, but the speakers consider it the best option
  • The speakers thank the Druid and druid-operator communities for their support
The speakers mention that they had issues with the Java version and RAM caching when running on spot instances. They also plan to add a caching proxy in front of their SQL endpoint and to run Druid without ZooKeeper, using etcd and the Kubernetes API to retrieve all the endpoints and information they need.
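The ZooKeeper-less direction mentioned above maps to Druid's Kubernetes discovery extension, which lets processes find each other through the Kubernetes API instead of ZooKeeper. A minimal sketch of the relevant properties, assuming the `druid-kubernetes-extensions` extension (the cluster identifier value is hypothetical):

```properties
druid.extensions.loadList=["druid-kubernetes-extensions"]

# Disable ZooKeeper and discover peers via the Kubernetes API
druid.zk.service.enabled=false
druid.discovery.type=k8s
druid.discovery.k8s.clusterIdentifier=partner-data-druid

# Use HTTP-based server views and segment loading instead of ZK watches
druid.serverview.type=http
druid.coordinator.loadqueuepeon.type=http
druid.indexer.runner.type=httpRemote
```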

Abstract

At Dailymotion we use Apache Druid to serve our partner data (views, monetization, etc.). Druid is a distributed, column-oriented time-series database with few competitors in the OLAP world. We install our Druid cluster on Kubernetes with druid-operator. This allows us to easily manage its lifecycle, perform automatic rolling upgrades, scale easily, and use the CNCF ecosystem for automatic exposure with cert-manager, external-dns and ingress-nginx. However, to run our Druid setup on Kubernetes we had to solve challenges related to availability, metadata and evolvability. We will present our architectures (past and present) and the technical means we used to install and manage Druid on our clusters with an open-source operator, how we optimize our caching with Memcached and Caffeine, and how we performed a migration with no downtime and no costly backfills. Finally, we will explain how we integrate this setup with our GitOps workflow, how we perform common operations (vertical/horizontal scaling, updates, monitoring), and what we are working on as the next steps for our Druid stack: ingestion improvements, multi-tenancy, and running Druid without Apache ZooKeeper.
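As a rough illustration of the operator-based setup described above, a Druid custom resource for druid-operator might look like the following sketch. The names, image tag, sizes, and properties are assumptions for illustration, not Dailymotion's actual values:

```yaml
apiVersion: druid.apache.org/v1alpha1
kind: Druid
metadata:
  name: partner-data            # hypothetical cluster name
spec:
  image: apache/druid:25.0.0    # hypothetical version
  common.runtime.properties: |
    druid.metadata.storage.type=postgresql
    druid.storage.type=google
  nodes:
    brokers:
      nodeType: broker
      druid.port: 8082
      replicas: 2
    historicals:
      nodeType: historical
      druid.port: 8083
      replicas: 3
```

Because the operator reconciles this single manifest into StatefulSets, Services, and ConfigMaps, rolling upgrades and horizontal scaling become edits to one file in a GitOps repository rather than manual kubectl operations.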

