Fluid is an open-source project that provides an efficient and convenient data abstraction for data-intensive tasks in the cloud-native field, addressing the data-access problems created by the separation of storage and compute.
- Under a disaggregated storage-compute architecture, data-intensive tasks suffer reduced compute efficiency and place heavy load on the underlying storage system.
- Fluid provides data-affinity scheduling, acceleration through distributed cache engines, and unified access to data from multiple sources.
- Fluid's data scheduling already accelerates many big data and AI workloads on Alibaba Cloud and Tencent Cloud.
- Fluid's architecture includes two custom resources, Dataset and Runtime, and two major components, a controller manager and a scheduler.
- Fluid's Dataset provides a unified interface for accessing data from both IDC and the cloud, and can accelerate data access through a distributed cache.
- Fluid's scheduler intelligently schedules jobs onto cache nodes and notifies the runtime to prefetch data to specified nodes.
- Fluid's demo shows how to use Fluid to accelerate a machine learning training job and how the distributed cache can be scaled automatically.
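The Dataset and Runtime resources described above can be sketched as Kubernetes manifests. This is a minimal illustration based on Fluid's published v1alpha1 examples; the bucket path, resource names, replica count, and cache quota are placeholders, and the exact fields should be checked against the Fluid documentation for the version in use.

```yaml
# A Dataset describing where the data lives (here, an S3 bucket; path is a placeholder)
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo-data
spec:
  mounts:
    - mountPoint: s3://example-bucket/training-data
      name: train
---
# A Runtime (Alluxio-backed in this sketch) providing the distributed cache for the Dataset
apiVersion: data.fluid.io/v1alpha1
kind: AlluxioRuntime
metadata:
  name: demo-data          # must match the Dataset name to bind them
spec:
  replicas: 2              # number of cache worker replicas
  tieredstore:
    levels:
      - mediumtype: MEM    # cache hot data in memory
        path: /dev/shm
        quota: 2Gi
```

Once both objects are applied, Fluid exposes the dataset to workloads as an ordinary PersistentVolumeClaim, so training pods mount it like any other volume.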
In a real customer AI training case, the training data was relatively large and not sensitive, so it was placed on cloud object storage such as S3. The validation data, however, was sensitive and could not be placed on the cloud, so it had to stay on on-premises IDC storage. Fluid's CRDs provided a unified view of both sources and accelerated access through the distributed cache, dynamically moving data from the IDC storage to GPU instances on the cloud at training time. When training was not running, the cached data could be migrated to low-cost CPU nodes, avoiding idle GPUs and dedicated-network-line costs.
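The "prefetch before training" step in this case can be expressed with Fluid's DataLoad resource, which warms the cache for a Dataset ahead of the job. This is a hedged sketch: the names are placeholders, and the field layout follows Fluid's v1alpha1 examples.

```yaml
# Warm the distributed cache for the Dataset before the training job starts,
# so GPU instances read from nearby cache nodes instead of remote storage.
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: demo-data-warmup
spec:
  dataset:
    name: demo-data       # the Dataset to prefetch (placeholder name)
    namespace: default
```

Running the warm-up as a separate resource keeps the expensive data movement off the GPU job's critical path: the prefetch can complete on cheap nodes, and the GPU instances only start once the cache is hot.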