The presentation discusses the importance of tail latency and overhead metrics in performance evaluation, as well as the need for system configuration and multiple experiments to increase confidence in results. The speaker also recommends various tools and resources for performance validation.
- Tail latency matters more as scale grows, since more requests end up in the slowest percentiles
- Consider overhead metrics such as CPU and memory utilization
- Performance interpretation metrics can help identify bottlenecks
- System configuration should isolate systems to avoid unwanted interference
- Multiple experiments increase confidence in results
- netperf, kubenetbench, and BPF tools are useful for benchmarking
- Resources for performance validation include books by Brendan Gregg and the Kernel Pages
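The points about tail latency and repeated experiments can be illustrated with a minimal Python sketch. It is not taken from the presentation: the function names `percentile` and `summarize_runs`, the nearest-rank percentile method, and the sample numbers are all assumptions chosen for illustration.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize_runs(run_results):
    """Return (mean, sample standard deviation) across repeated runs.
    A large deviation relative to the mean suggests the runs are noisy."""
    mean = statistics.mean(run_results)
    stdev = statistics.stdev(run_results) if len(run_results) > 1 else 0.0
    return mean, stdev


# Hypothetical latencies in milliseconds: 98 fast requests, 2 slow ones.
latencies_ms = [1.0] * 98 + [100.0] * 2
print("mean:", statistics.mean(latencies_ms))
print("p50:", percentile(latencies_ms, 50))
print("p99:", percentile(latencies_ms, 99))

# Hypothetical throughput (Gbit/s) from three repetitions of one experiment.
print("runs:", summarize_runs([10.0, 12.0, 14.0]))
```

In this made-up data set the mean and median stay low while the 99th percentile exposes the slow requests, which is why a tail metric is reported alongside the average. Reporting the spread across runs gives a rough sense of how much a single measurement can be trusted.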
During a performance evaluation, the team discovered that native routing performed worse than tunneling, which was unexpected. They replicated the setup without Cilium and found that the performance problem was still present, which showed that the issue lay in virtual Ethernet (veth) device handling rather than in Cilium itself. The team contributed changes to the Linux kernel to address the issue.