Authors: Benjamin Raskin, Emma Wang
2022-10-28

tldr - powered by Generative AI

The presentation discusses the migration of infrastructure and application metrics from StatsD to Prometheus at DoorDash, and the challenges and lessons encountered during the process.
  • The migration involved over 130 services, 1,500 dashboards, and more than 7,000 alerts.
  • Moving from precomputed percentiles to histograms was a difficult adjustment for engineers.
  • The instance label is high-cardinality, so metrics need to be pre-aggregated across it to reduce volume.
  • The Prometheus aggregation gateway was used for some metrics, but the push model was limited to special cases.
  • Automating the monitoring onboarding process for teams is crucial.
  • The migration was completed in one year, resulting in over 27,000 alerts and 2,200 dashboards.
  • Post-migration, DoorDash ingests over 15 million metrics per second and persists over 10 million metrics per second.
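
The histogram and instance-label points above are related: cumulative histogram buckets can be summed across instances before a quantile is estimated, which precomputed per-instance percentiles do not allow. The following is a minimal Python sketch of that idea, not DoorDash's implementation. The sample data, the pod names, and the helper names `aggregate_without_instance` and `estimate_quantile` are hypothetical, and the interpolation only approximates what PromQL's `histogram_quantile` does.

```python
import math
from collections import defaultdict

def aggregate_without_instance(samples):
    """Sum cumulative bucket counts across instances, keyed by upper bound `le`."""
    totals = defaultdict(float)
    for labels, count in samples:
        totals[labels["le"]] += count  # the `instance` label is dropped here
    return dict(totals)

def estimate_quantile(q, buckets):
    """Estimate a quantile by linear interpolation inside the matching bucket."""
    ordered = sorted(buckets.items())      # ascending by upper bound
    total = ordered[-1][1]                 # the +Inf bucket holds the total count
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound          # cannot interpolate into +Inf
            if count == prev_count:
                return bound               # empty bucket, avoid dividing by zero
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical request-latency buckets (seconds) from two instances.
samples = [
    ({"instance": "pod-a", "le": 0.1}, 10), ({"instance": "pod-b", "le": 0.1}, 30),
    ({"instance": "pod-a", "le": 0.5}, 40), ({"instance": "pod-b", "le": 0.5}, 45),
    ({"instance": "pod-a", "le": 1.0}, 50), ({"instance": "pod-b", "le": 1.0}, 50),
    ({"instance": "pod-a", "le": math.inf}, 50), ({"instance": "pod-b", "le": math.inf}, 50),
]

aggregated = aggregate_without_instance(samples)  # 4 series instead of 8
p95 = estimate_quantile(0.95, aggregated)
print(round(p95, 3))  # 0.833
```

After aggregation the series count no longer grows with the number of instances, while the 95th percentile can still be estimated for the service as a whole. The estimate is only as precise as the bucket boundaries allow.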