The presentation discusses the challenges of bringing together cloud native computing and high performance computing (HPC), and how recent work is bridging the gap between the two.
- Cloud native and Kubernetes have become popular in modern IT deployments, but challenges remain in areas where HPC can have a larger impact.
- HPC involves aggregating computing power to deliver higher performance for solving large problems in science, engineering, and business.
- HPC deployments require low latency, high throughput, and NUMA awareness, requirements that most other deployments do not have.
- Advanced scheduling is also important for HPC deployments with millions of jobs and users with different software needs.
- The speaker shares an anecdote about CERN's experience with transitioning to Kubernetes for their HPC needs.
- High throughput computing is a similar paradigm to HPC, but focuses on the efficient execution of a large number of loosely coupled tasks.
- The speaker highlights the similarities between high throughput computing and cloud native systems.
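
To make the "loosely coupled tasks" idea above concrete, here is a minimal sketch of the high throughput pattern in Python. It is an illustration only, not something shown in the talk: `process_event` and `run_batch` are hypothetical names, and the squared value stands in for real work. The point is that each task depends only on its own input, so tasks can run in any order and on any worker.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_event(event_id: int) -> int:
    """Hypothetical independent task: uses only its own input."""
    # Placeholder computation standing in for real per-task work.
    return event_id * event_id


def run_batch(event_ids, max_workers=4):
    """Run every task independently and collect results keyed by input."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all tasks up front; none of them waits on another.
        futures = {pool.submit(process_event, e): e for e in event_ids}
        # Results arrive in completion order, which may differ from
        # submission order because the tasks share no state.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    batch = run_batch(range(8))
    print(sorted(batch.items()))
```

In a cloud native setting, each such task might correspond to its own pod or job, which is one way to read the similarity the speaker points out.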
The speaker shares their experience with CERN's computing systems, which handle the large volume of data generated by collisions in a large particle accelerator. CERN initially faced challenges when trying to modernize these systems with Kubernetes, but with the help of SIG Scalability, they were able to build a cluster of a few thousand nodes and schedule 300 pods per second. The next generation of their deployment will be based on Kubernetes.