Better Scalability and More Isolation? The Cortex “Shuffle Sharding” Story

Conference: KubeCon + CloudNativeCon Europe 2021

Authors: Tom Wilkie

Summary

The presentation covers shuffle sharding, a technique for limiting the impact of node failures and improving isolation between tenants in a horizontally scalable Cortex cluster.

Cortex aims to be horizontally scalable by hashing the labels of each sample and spreading the data across the nodes in a cluster
A replication factor of three, combined with quorum reads and writes, keeps a single node failure from causing an outage
Shuffle sharding builds small virtual clusters inside a larger real cluster to improve isolation between tenants
Shuffle sharding is more flexible and manageable than the cellular approach of mapping tenants to separate clusters
Amazon sponsored Grafana Labs to make changes to Cortex and worked closely with them on the design and review
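
The first two points above (hashing labels onto nodes, and a replication factor of three with quorum reads and writes) can be illustrated with a toy hash ring. This is a minimal sketch and not Cortex's actual ring code: the function names, the use of SHA-256, and the 64 tokens per node are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_right


def stable_hash(key: str) -> int:
    """Hash a string to a 64-bit integer (SHA-256 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def build_ring(nodes, tokens_per_node=64):
    """Return a sorted list of (token, node) pairs; each node owns several tokens."""
    return sorted(
        (stable_hash(f"{node}/{i}"), node)
        for node in nodes
        for i in range(tokens_per_node)
    )


def replicas_for(ring, series_labels, replication_factor=3):
    """Walk the ring clockwise from the series hash, collecting distinct nodes."""
    tokens = [token for token, _ in ring]
    start = bisect_right(tokens, stable_hash(series_labels))
    replicas = []
    for offset in range(len(ring)):
        node = ring[(start + offset) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


def write_succeeds(replicas, down_nodes):
    """A quorum write needs a majority of the replicas to be healthy."""
    healthy = sum(1 for node in replicas if node not in down_nodes)
    return healthy >= len(replicas) // 2 + 1


# Hypothetical 10-node cluster and a hypothetical series.
nodes = [f"ingester-{i}" for i in range(10)]
ring = build_ring(nodes)
replicas = replicas_for(ring, '{__name__="up", job="api"}')
print(replicas)
print(write_succeeds(replicas, down_nodes={replicas[0]}))      # one replica down
print(write_succeeds(replicas, down_nodes=set(replicas[:2])))  # two replicas down
```

With three replicas, losing one still leaves a quorum of two, so the write succeeds; losing two of the three does not.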

The speaker notes that as Cortex clusters grow, the chance of a total cluster outage increases, and that a poison request or a bad query could take out an entire cluster for all tenants. Shuffle sharding is presented as a solution to this problem: it builds small virtual clusters inside a much larger real cluster, which improves isolation between tenants at only a modest cost in utilization.
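
To make the "virtual cluster" idea concrete, here is a minimal sketch of per-tenant shard selection. It assumes each tenant's shard is picked by seeding a random generator from the tenant ID. The helper names, the cluster size, and the shard size are hypothetical, and this is not presented as Cortex's implementation.

```python
import hashlib
import random


def tenant_shard(nodes, tenant_id, shard_size):
    """Deterministically pick `shard_size` nodes for a tenant (hypothetical helper)."""
    seed = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    rng = random.Random(seed)  # same tenant ID always yields the same shard
    return set(rng.sample(sorted(nodes), min(shard_size, len(nodes))))


def affected_tenants(shards, down_nodes, failures_to_lose_quorum=2):
    """Tenants whose shard holds enough failed nodes to risk losing quorum."""
    return [
        tenant
        for tenant, shard in shards.items()
        if len(shard & down_nodes) >= failures_to_lose_quorum
    ]


# Hypothetical cluster: 50 nodes, 200 tenants, 4 nodes per tenant shard.
nodes = [f"ingester-{i}" for i in range(50)]
tenants = [f"tenant-{i}" for i in range(200)]
down = {"ingester-0", "ingester-1"}

sharded = {t: tenant_shard(nodes, t, shard_size=4) for t in tenants}
unsharded = {t: set(nodes) for t in tenants}  # every tenant uses every node

print(len(affected_tenants(unsharded, down)), "of", len(tenants), "tenants exposed without shuffle sharding")
print(len(affected_tenants(sharded, down)), "of", len(tenants), "tenants exposed with shuffle sharding")
```

Without sharding, every tenant's data is spread over both failed nodes, so all tenants are exposed. With per-tenant shards, only tenants whose shard happens to contain both failed nodes are exposed, and even they may lose quorum for only some of their series.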

Abstract

Cortex is a horizontally-scalable, highly-available and multi-tenant Prometheus-compatible time series database. For many years it has been possible to scale Cortex clusters to hundreds of replicas. The relatively simple Dynamo-style replication relies on quorum consistency for reads and writes. As such, a dual-replica failure can lead to an outage for all tenants. To address this we implemented a technique called “Shuffle Sharding” in Cortex. Shuffle Sharding lets you automatically pick a random “replica set” for each tenant, allowing you to isolate tenants and reduce the chance of an outage. In this talk we’ll show you how shuffle sharding achieves better scalability and more isolation, both in theory and in practice. We’ll walk you through the design on both the read and write path of Cortex. Finally we’ll do a live demo of shuffle sharding and how you can “take out” multiple replicas without affecting all tenants.
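
The effect of the shard size on isolation can be estimated with a short calculation. The sketch below assumes each tenant's replica set is a uniformly random subset of the cluster, which is a simplification. It computes the chance that a given tenant's shard contains both nodes of a dual-replica failure. The function name and the example cluster size of 100 are hypothetical.

```python
from math import comb


def prob_shard_contains_failures(total_nodes, shard_size, failed=2):
    """Probability that a uniformly random shard includes all `failed` specific nodes."""
    if shard_size < failed:
        return 0.0
    # Choose the remaining shard members from the healthy nodes,
    # divided by the number of all possible shards.
    return comb(total_nodes - failed, shard_size - failed) / comb(total_nodes, shard_size)


# Hypothetical 100-node cluster; shard size 100 means no shuffle sharding.
for shard_size in (3, 6, 12, 100):
    p = prob_shard_contains_failures(100, shard_size)
    print(f"shard size {shard_size:>3}: {p:.4%} chance a tenant's shard holds both failed nodes")
```

For two failed nodes the expression simplifies to s(s-1) / (N(N-1)) for shard size s and cluster size N, so smaller shards sharply reduce the fraction of tenants exposed to any given pair of failures. When the shard is the whole cluster, the probability is 1, which matches the abstract's point that a dual-replica failure can affect all tenants.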

Related work

Cortex: How to Run a Rock Solid Multi-Tenant Prometheus

Conference: KubeCon + CloudNativeCon Europe 2023

Authors: Friedrich Gonzalez, Alan Protasio

2023-04-21