Data.Rentgen is a Data Motion Lineage service, compatible with the OpenLineage specification.
Note: the service is under active development and is not ready for use yet.
- Collect lineage events produced by OpenLineage clients & integrations (Spark, Airflow).
- Support consuming large volumes of lineage events by using Kafka as an event buffer and storing data in tables partitioned by event timestamp.
- Store operation-grained events (instead of job-grained ones, as Marquez does) for finer detail.
- Provide an API for building run ↔ dataset lineage, as well as parent run → child run lineage.
- Ability to build a lineage graph within specific time boundaries (unlike Marquez, where lineage is built only for the last job run).
- Ability to build a lineage graph at different granularities, e.g. merge all individual Spark operations into a Spark applicationId or Spark applicationName.
- This is not a Data Catalog. Use Datahub or OpenMetadata instead.
- Static Data Lineage, such as view → table, is not supported.
- Column-level lineage is currently collected by OpenLineage, but not yet consumed by Data.Rentgen.
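
To make the time-boundary and granularity items above more concrete, here is a minimal Python sketch of the idea. It is not Data.Rentgen's API or storage schema: the `OperationEvent` record, the `build_lineage` function, and all dataset and application names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OperationEvent:
    """Hypothetical operation-grained lineage record (not the real schema)."""
    application_id: str
    application_name: str
    operation: str
    dataset: str
    direction: str  # "input" or "output"
    event_time: datetime


def build_lineage(events, since, until, granularity="operation"):
    """Return the set of (node, dataset, direction) edges in [since, until).

    The granularity decides which identifier becomes the graph node, so
    coarser levels collapse several operations into a single node.
    """
    key_for = {
        "operation": lambda e: f"{e.application_id}/{e.operation}",
        "application_id": lambda e: e.application_id,
        "application_name": lambda e: e.application_name,
    }[granularity]
    return {
        (key_for(e), e.dataset, e.direction)
        for e in events
        if since <= e.event_time < until
    }


def _ts(day, hour):
    return datetime(2024, 1, day, hour, tzinfo=timezone.utc)


# Two runs of the same (made-up) Spark application on consecutive days.
events = [
    OperationEvent("app-001", "daily_etl", "op-1", "hdfs://raw/users", "input", _ts(1, 10)),
    OperationEvent("app-001", "daily_etl", "op-2", "hdfs://raw/users", "input", _ts(1, 11)),
    OperationEvent("app-001", "daily_etl", "op-3", "hive://dwh.users", "output", _ts(1, 12)),
    OperationEvent("app-002", "daily_etl", "op-1", "hdfs://raw/users", "input", _ts(2, 10)),
    OperationEvent("app-002", "daily_etl", "op-2", "hive://dwh.users", "output", _ts(2, 11)),
]

both_days = (_ts(1, 0), _ts(3, 0))
first_day = (_ts(1, 0), _ts(2, 0))

by_operation = build_lineage(events, *both_days, granularity="operation")
by_app_id = build_lineage(events, *both_days, granularity="application_id")
by_app_name = build_lineage(events, *both_days, granularity="application_name")
first_day_only = build_lineage(events, *first_day, granularity="application_id")
```

In this sketch, narrowing the time window drops the second run entirely, while switching from `operation` to `application_name` merges every operation of both runs into one node with a single input edge and a single output edge.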