How To Build a Distributed System (And Should You?)

Conference: KubeCon + CloudNativeCon North America 2022

2022-10-26

Authors: Rebecca Bilbro, Patrick Deziel

Summary

The presentation discusses the challenges and solutions in building and maintaining a distributed system for a global directory service that securely exchanges private information for auditing purposes.

Storing public certificates in a distributed system to avoid latency fees
Distributed systems allow for tolerance of failures and increased availability
Testing for concurrency bugs is crucial in building a distributed system
Reframing distributed systems as a flow of events across space and time can improve user experience
The use of Kafka in Tinder's app demonstrates the importance of strict ordering in distributed systems

The presentation describes the challenges of storing public certificates for a global directory service and the decision to use a distributed system to avoid latency fees. The speakers also emphasize the importance of testing for concurrency bugs, and point to the use of Kafka in Tinder's app as an illustration of the need for strict ordering in distributed systems.

Abstract

In this talk, we’ll tell the story of how we built our very own eventually consistent system, currently deployed in production clusters across the US, Germany, and Singapore, and of all the mistakes we made along the way. We’ll walk through how we leveraged tools like gRPC, Kubernetes, LevelDB, and Prometheus to implement two new open source projects that serve as the heart of our system. We’ll also confess all the ways we messed up during the process: from struggling to debug protocol buffer errors, to tangling up send and receive goroutines, to reasoning about the phases of replication. It won’t all be pretty, but we hope you’ll benefit from the lessons we learned, including the most important lesson: that you *can* build your own distributed system. We’ll close out by talking about why rolling our own system (in spite of all the headaches and mistakes) made sense for our use case, and why it might also make sense for you. Attendees will walk away with a hearty introduction to distributed systems concepts, as well as a to-do list of things they can investigate in their own systems to determine how they might be able to reduce concurrency-related bugs and/or consistency-related costs, improve maintenance, and reach more daily active users around the world.

Materials:

Slides

Tags:

eventually consistent