
⚡ Lightning Talk: Back To Basics: How To Measure Etcd Performance And Not To Die Trying

2022-10-25

Authors: David Perez Rodriguez


Abstract

Everybody either knows what Kubernetes is or has heard of it. It is a critical component in the scalable, highly available, distributed design of most cloud-based production systems. Why would I bother understanding how it behaves outside the cloud provider I commonly use? Well, that was exactly the case in this project, which aimed to build an IoT system that handles terabytes of data entirely on-prem due to business needs. As expected, things did not behave the same as on the cloud provider: lots of kube-api errors, missed heartbeats, and database operators triggering rolling restarts of deployments as a result. The root cause, however, was well hidden from sight: etcd performance was poor on-prem. etcd can deliver sustained high performance, and that performance depends on two factors: latency and throughput. In this on-prem environment, latency suffered because of the hardware's initial design. How do you measure etcd performance? Benchmarks to the rescue! Learn about this experience, what a benchmark is, what latency and throughput are, and how to measure etcd performance effectively through benchmarks, so that you can properly test your infrastructure when a brand-new Kubernetes cluster is created, particularly on-prem, and take advantage of the full potential of the Kubernetes environment.
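To make the two metrics named in the abstract concrete, here is a minimal sketch of what a benchmark computes. It is not taken from the talk. Throughput is the number of completed operations divided by wall-clock time, and latency is summarized with nearest-rank percentiles over the per-operation durations. The names `percentile`, `summarize`, and `run_benchmark` are hypothetical, and `operation` stands in for whatever request you would send to the cluster (for example, a single key write). Nothing here talks to etcd.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(sorted_values: List[float], pct: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_s: List[float], wall_time_s: float) -> Dict[str, float]:
    """Turn raw per-operation durations into throughput and latency figures."""
    ordered = sorted(latencies_s)
    return {
        # Throughput: completed operations per second of wall-clock time.
        "throughput_ops_per_s": len(ordered) / wall_time_s,
        # Latency: how long a single operation took, at the median and the tail.
        "latency_p50_s": percentile(ordered, 50),
        "latency_p99_s": percentile(ordered, 99),
    }


def run_benchmark(operation: Callable[[], None], total_ops: int) -> Dict[str, float]:
    """Time `operation` sequentially; `operation` is a hypothetical stand-in
    for one request against the system under test."""
    latencies: List[float] = []
    started = time.perf_counter()
    for _ in range(total_ops):
        t0 = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - t0)
    wall_time = time.perf_counter() - started
    return summarize(latencies, wall_time)
```

This sketch issues operations one at a time from a single client, so it mostly reflects latency. Measuring peak throughput generally requires many concurrent clients, which is an extension this example deliberately leaves out.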

Materials: