The presentation discusses the migration of infrastructure and application metrics from Stacy to Prometheus at DoorDash, and the challenges and learnings encountered during the process.
- The migration involved over 130 services, 1500 dashboards, and more than 7000 alerts.
- The use of histograms instead of percentiles was a difficult change for engineers to adapt to.
- The instance label is a high cardinality label that needs to be pre-aggregated to reduce volume.
- PromptCare's aggregation gateway was used for some metrics, but push models were limited to special cases.
- Automating the monitoring onboarding process for teams is crucial.
- The migration was completed in one year, resulting in over 27,000 alerts and 2200 dashboards.
- Post-migration, DoorDash ingests over 15 million metrics per second and persists over 10 million metrics per second.