Fugue is an open-source abstraction layer that allows users to port native Python code to Spark or Dask with minimal code changes, making data science code framework-agnostic and scale-agnostic.
- Data scientists often have to rewrite the same logic when moving from Pandas to Spark because the data has grown too large for Pandas to handle
- Fugue addresses this with an abstraction layer that ports native Python code to Spark or Dask with minimal code changes
- The resulting code is framework-agnostic and scale-agnostic, so it can run in different execution environments
- The demo scaled a data computation from a single machine to a Spark cluster set up on Kubernetes
The demo applied business logic to a small Pandas DataFrame and then ran the same logic on Spark with minimal code changes. It also highlighted the differences between Pandas and Spark, and how Fugue keeps data science code framework-agnostic and scale-agnostic despite them.