
Efficient Scheduling Of High Performance Batch Computing For Analytics Workloads With Volcano

2022-10-26

Authors:   Krzysztof Adamski, Tinco Boekestijn


Summary

The ING Wholesale Banking Advanced Analytics team built a self-service platform using open source technology to empower employees to solve business needs. The platform has grown to over 400 projects and 2000 users. The team adapted cloud native tools and added a specialized Kubernetes scheduler (Volcano) to enable multi-tenant, large-scale processing capabilities.
ING's mission is to empower people to stay ahead in life and business, and the platform's mission is to be a data-driven, self-service platform that supports employees. In building it, the team emphasized scalability, seamlessness, and security, along with engineering capabilities: engineers can start their data analytics journey with predefined pipelines, then build, share, test, and deploy new insights for the business. The platform stores data securely and allows data resources to be shared based on predefined roles. The team faced challenges with cloud native tools around job management, scheduling, and multi-framework support, and is now looking for the next big thing for the platform.

Abstract

Three years ago the ING Wholesale Banking Advanced Analytics team set an ambitious goal: gather in one place a curated portfolio of internal data sources together with a large scale compute platform. At its core is the idea of allowing internal projects to access a rich toolset of open source and industry-standard frameworks, plus preprocessed data, to validate business ideas in a secure exploration environment. Extensive growth, with over 300 internal projects so far and more than 2000 internal users, proves that advanced analytics capabilities (ML, AI, NLP) should become easily consumable not only by specialized, dedicated teams, but should also be brought close to subject matter experts. In this session we would like to shed more light on how a specialized cloud native Kubernetes scheduler (Volcano) enables us to deliver multi-tenant, large-scale processing capabilities. Optimal resource usage and the stability of core services are key for our cloud native platform. To enable dynamic allocation and HDRF (hierarchical dominant resource fairness), we created an extension to the Apache Spark binaries. This allows users to use Volcano with Spark's interactive mode in a Jupyter notebook. Additionally, we created interfaces to visualize all the scheduling metrics, similar to the YARN UI.
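
As a concrete illustration of the last two points, the following is a minimal sketch of how an interactive PySpark session launched from a Jupyter notebook can be handed to Volcano for scheduling. It relies on the Volcano support that ships with upstream Spark 3.3+ rather than on ING's custom extension, and the API server URL, container image, and template path are placeholders:

    # Minimal sketch: an interactive PySpark session scheduled by Volcano.
    # Uses upstream Spark 3.3+ Volcano support as a stand-in for the custom
    # extension described above; the URL, image, and paths are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("k8s://https://kubernetes.default.svc")  # placeholder API server
        .appName("interactive-analytics")
        .config("spark.kubernetes.container.image",
                "example.org/spark-py:3.3.0")  # placeholder image
        # Hand pod placement to Volcano instead of the default kube-scheduler.
        .config("spark.kubernetes.scheduler.name", "volcano")
        # Feature steps that attach a Volcano PodGroup to driver and executors.
        .config("spark.kubernetes.driver.pod.featureSteps",
                "org.apache.spark.deploy.k8s.features.VolcanoFeatureStep")
        .config("spark.kubernetes.executor.pod.featureSteps",
                "org.apache.spark.deploy.k8s.features.VolcanoFeatureStep")
        # PodGroup template that points the session at a tenant queue.
        .config("spark.kubernetes.scheduler.volcano.podGroupTemplateFile",
                "/opt/spark/conf/podgroup-template.yaml")
        # Dynamic allocation so idle notebook sessions release executors.
        .config("spark.dynamicAllocation.enabled", "true")
        .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
        .config("spark.dynamicAllocation.maxExecutors", "20")
        .getOrCreate()
    )

With these settings the driver and executor pods are gang-scheduled as one PodGroup against the queue named in the template, while dynamic allocation lets an idle notebook shrink back to just its driver pod.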
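The multi-tenant fairness side can be sketched too. Volcano represents tenants as cluster-scoped Queue objects, and its hierarchical fair-share (HDRF) plugin reads the hierarchy from queue annotations; the queue name, hierarchy path, and weights below are assumed placeholders:

    # Minimal sketch: creating a Volcano tenant queue with hierarchical
    # fair-share annotations via the Kubernetes Python client. The queue
    # name, hierarchy path, and weights are placeholders.
    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() inside a pod

    queue = {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "Queue",
        "metadata": {
            "name": "team-a",  # placeholder tenant queue
            "annotations": {
                # Position in the fairness hierarchy and per-level weights,
                # e.g. root -> analytics -> team-a weighted 1/2/1.
                "volcano.sh/hierarchy": "root/analytics/team-a",
                "volcano.sh/hierarchy-weights": "1/2/1",
            },
        },
        "spec": {"weight": 1, "reclaimable": True},
    }

    client.CustomObjectsApi().create_cluster_custom_object(
        group="scheduling.volcano.sh",
        version="v1beta1",
        plural="queues",
        body=queue,
    )

Under HDRF, dominant resource shares are balanced at each level of this tree, so a burst in one tenant's queue cannot starve a sibling queue.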

Materials: