OpenLineage is an open standard for capturing metadata about data processing workflows, which helps with debugging and backfilling. Lineage information is emitted through REST calls, and integrations exist for tools such as Airflow and Spark.
- OpenLineage captures metadata about data processing workflows, including datasets, schemas, job inputs and outputs, and job versions.
- This metadata is emitted through REST calls and stored in the Marquez model, where it can be queried through several APIs.
- OpenLineage helps with debugging by making it quicker to identify data quality issues and to track run states.
- It also aids backfilling by exposing upstream and downstream dependencies, which supports both full and incremental processing.
- OpenLineage integrates with tools such as Airflow and Spark, so it can be added to existing workflows without much extra work.
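
To make the "emit through REST calls" point concrete, here is a minimal sketch of what a lineage event for one job run might look like. It uses only the Python standard library rather than an OpenLineage client. The job name, dataset names, namespace, producer URI, and the local Marquez URL are all assumptions made for illustration, and the event is deliberately reduced: the real specification defines more fields (for example a schema URL and facets) that are omitted here.

```python
import json
import uuid
from datetime import datetime, timezone
from urllib import request

# Assumed values for illustration only; replace with your own setup.
LINEAGE_URL = "http://localhost:5000/api/v1/lineage"  # assumed local Marquez endpoint
PRODUCER = "https://example.com/my-pipeline"          # hypothetical producer URI


def build_run_event(event_type, run_id, job_name, inputs, outputs, namespace="example"):
    """Build a reduced OpenLineage-style run event as a plain dict."""
    def dataset(name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,                        # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},                       # same ID for every event of one run
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [dataset(n) for n in inputs],
        "outputs": [dataset(n) for n in outputs],
        "producer": PRODUCER,
    }


def emit(event, url=LINEAGE_URL):
    """POST the event as JSON. Needs a reachable lineage backend, so it is not called below."""
    req = request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status


# One run produces a START event and later a COMPLETE event sharing the same run ID.
run_id = str(uuid.uuid4())
start_event = build_run_event("START", run_id, "daily_orders", ["raw.orders"], ["clean.orders"])
complete_event = build_run_event("COMPLETE", run_id, "daily_orders", ["raw.orders"], ["clean.orders"])

print(json.dumps(start_event, indent=2))
```

Because the START and COMPLETE events share a run ID and name their input and output datasets, a backend can reconstruct both the run state and the upstream and downstream dependencies described above. In practice the Airflow and Spark integrations generate events like these automatically, so hand-building them is mainly useful for custom jobs.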