
Building on-Premises MLOps for ISS Columbus Ground Operations

2023-04-20

Authors: Samo Turk, Christian Geier


Summary

Building an on-premises MLOps platform for International Space Station data with a small team is challenging but possible with automation and open-source components.
  • Automate as much as possible, especially for small teams
  • Ask for help and give back to the community
  • Experiment but also learn the basics
  • Building an on-premises MLOps platform for International Space Station data is challenging but possible with commitment
  • Key pillars of the platform are Kubernetes, GitLab, and Kubeflow
  • MicroK8s was chosen as the Kubernetes distribution for its ease of deployment and add-ons
  • Kubeflow is a complete data science workbench that is Kubernetes native and actively developed
  • Automation is necessary for managing the platform
  • Telemetry data from the Columbus module of the International Space Station is used for anomaly detection, diagnostics, and configuration
  • Microgravity introduces unique challenges for monitoring and ventilation
The team had to develop an on-premises MLOps platform for International Space Station data because the sensitive sensor data cannot be stored in a public cloud. They chose Kubernetes, GitLab, and Kubeflow as the key pillars of the platform, with MicroK8s as the Kubernetes distribution for its ease of deployment and add-ons. They also learned that automation is essential for managing the platform, and that microgravity introduces unique challenges for monitoring and ventilation. A minimal sketch of how an anomaly-detection workflow might be expressed on top of this stack follows below.
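
With Kubeflow as the data science workbench, a workflow of this kind could be expressed with the Kubeflow Pipelines (kfp v2) Python SDK. The following is a hypothetical sketch: the component logic, file paths, and parameter names are illustrative placeholders, not the team's actual Columbus algorithms.

    # Hypothetical sketch of a telemetry anomaly-detection pipeline expressed
    # with the Kubeflow Pipelines (kfp v2) SDK. Component logic and paths are
    # placeholders, not the actual Columbus code.
    from kfp import compiler, dsl

    @dsl.component(base_image="python:3.10")
    def detect_anomalies(telemetry_path: str, threshold: float) -> str:
        # Real scoring logic (model loading, inference) would live here.
        return f"scored {telemetry_path} with threshold {threshold}"

    @dsl.pipeline(name="columbus-telemetry-anomaly-detection")
    def anomaly_pipeline(telemetry_path: str = "/data/telemetry.parquet",
                         threshold: float = 0.95):
        detect_anomalies(telemetry_path=telemetry_path, threshold=threshold)

    if __name__ == "__main__":
        # Compile to a pipeline spec that can be uploaded through the Kubeflow
        # UI or submitted with the kfp client against the in-cluster endpoint.
        compiler.Compiler().compile(anomaly_pipeline, "anomaly_pipeline.yaml")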

Abstract

In collaboration with Airbus and two German universities, the speakers are supporting the operations of the International Space Station's (ISS) Columbus module by developing anomaly detection, root cause analysis, and reconfiguration suggestion algorithms. As the ISS's sensor data streams are not allowed in public clouds for regulatory reasons, they had to implement a bespoke, integrated MLOps platform deployed on-premises to develop and run the custom-built algorithms. This talk discusses how a group of AI engineers and data scientists with hardly any knowledge of K8s became full-fledged cloud engineers and built a system around GitLab, MicroK8s, and Kubeflow that can be deployed fully automatically. The speakers will answer questions on how to bootstrap and automate the deployment of such a platform, how to support multiple Linux distributions as hosts, how to manage and provision storage, how to expose services running on K8s, how to provide IAM, and how not to go mad while handling TLS issues.
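
One of the recurring questions above, exposing services running on K8s with TLS, can be illustrated with the official kubernetes Python client. The snippet below is a hypothetical sketch, not the platform's actual configuration: the namespace, service name, host, issuer annotation, and secret name are invented for illustration.

    # Hypothetical sketch: expose an in-cluster service through an Ingress
    # with TLS, using the official kubernetes Python client. All names are
    # placeholders, not the platform's real configuration.
    from kubernetes import client, config

    def create_tls_ingress(namespace: str = "mlops") -> None:
        config.load_kube_config()  # or load_incluster_config() inside a pod

        ingress = client.V1Ingress(
            metadata=client.V1ObjectMeta(
                name="dashboard-ingress",
                annotations={"cert-manager.io/cluster-issuer": "internal-ca"},
            ),
            spec=client.V1IngressSpec(
                tls=[client.V1IngressTLS(hosts=["mlops.example.internal"],
                                         secret_name="dashboard-tls")],
                rules=[client.V1IngressRule(
                    host="mlops.example.internal",
                    http=client.V1HTTPIngressRuleValue(paths=[
                        client.V1HTTPIngressPath(
                            path="/",
                            path_type="Prefix",
                            backend=client.V1IngressBackend(
                                service=client.V1IngressServiceBackend(
                                    name="dashboard",
                                    port=client.V1ServiceBackendPort(number=80),
                                )
                            ),
                        )
                    ]),
                )],
            ),
        )
        # Create the Ingress object in the target namespace.
        client.NetworkingV1Api().create_namespaced_ingress(namespace, ingress)

    if __name__ == "__main__":
        create_tls_ingress()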


