Update dependency io.openlineage:openlineage-java to v1.44.1 #3002
Open: renovate[bot] wants to merge 1 commit into main
❌ Deploy Preview for peppy-sprite-186812 failed.
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
Coverage Diff (main vs. #3002)
              main      #3002
Coverage      81.18%    81.18%
Complexity    1506      1506
Files         268       268
Lines         7356      7356
Branches      325       325
Hits          5972      5972
Misses        1226      1226
Partials      158       158
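The headline percentage follows from the line counts in the table: Codecov counts coverage as hits / (hits + misses + partials), so partially covered lines count against it. A minimal check in Python, assuming Codecov's default settings of two decimal places rounded down (a repository's codecov.yml can override both); the helper name is hypothetical:

```python
import math


def codecov_percent(hits: int, misses: int, partials: int, precision: int = 2) -> float:
    """Coverage as hits / tracked lines, rounded *down* (assumed Codecov default)."""
    tracked = hits + misses + partials  # 5972 + 1226 + 158 == 7356, the "Lines" row
    scale = 10 ** precision
    return math.floor(hits / tracked * 100 * scale) / scale


# Figures from the report (identical on main and #3002).
print(codecov_percent(hits=5972, misses=1226, partials=158))  # 81.18
```

Rounding half-up would give 81.19 here (the exact ratio is about 81.1854%), which is why the rounding mode is stated as an assumption.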
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
This PR contains the following updates:
io.openlineage:openlineage-java: 1.23.0 → 1.44.1
Release Notes
OpenLineage/OpenLineage (io.openlineage:openlineage-java)
v1.44.1
Fixed
#4349 @harels: Attach ExtractionErrorRunFacet to run events when @handle_keyerror-decorated extraction methods fail, making previously invisible extraction errors visible to downstream consumers instead of silently emitting incomplete events.
#4348 @harels: Fix an exception type mismatch in _get_model_node() by raising KeyError instead of RuntimeError, allowing the @handle_keyerror decorator to catch it and return None gracefully when a node_id is not found in the manifest.
#4345 @zagoodman: Fix a crash when the version key is absent from dbt_project.yml (it became optional in dbt 1.5) by using .get() instead of direct key access.
v1.44.0
Added
#4313 @jakub-moravec: Add a JWT authenticator for the Java and Python clients, enabling token-based authentication without requiring a custom authenticator implementation.
#4283 @kchledowski: Enable extraction of input dataset symlinks from DataSourceRDD, providing richer lineage information for RDD-based Iceberg operations.
Changed
#4329 @kchledowski: Disable column-level lineage extraction for LogicalRDD plans to prevent incorrect lineage caused by lost schema and transformation context.
#4331 @kchledowski: Disable unreliable input schema extraction from LogicalRDD and instead extract schemas from Iceberg table metadata when reading via DataSourceRDD.
Fixed
#4285 @LegendPawel-Marut: Align schema definitions for dbt-run-run-facet and dbt-version-run-facet to fix validation inconsistencies.
#4320 @ah12068: Handle a missing profiles_dir key in run_results.json gracefully, falling back to default profile directory resolution.
#4298 @gaurav-atlan: Fix the --target-path CLI argument not being parsed and passed to artifact processors, which caused the default target path to always be used.
#4312 @mobuchowski: Fix false Flink 2.x detection when modern V2-based connectors are used with Flink 1.x by using JobStatusChangedListenerFactory for version detection.
#4282 @Lukas-Riedel: Send the Content-Encoding header when request body compression is enabled in the Java client, consistent with the Python client's behavior.
#4315 @mobuchowski: Add a fast environment detection check to skip Databricks-specific event filtering on non-Databricks platforms, reducing overhead.
#4311 @mobuchowski: Fix a NullPointerException when processing Iceberg datasets with the AWS Glue catalog by safely handling null Glue ARN values.
#4316 @mobuchowski: Fix a ClassCastException on DROP TABLE commands in Databricks Runtime 14.2+ by handling ResolvedIdentifier alongside ResolvedTable.
#4344 @mobuchowski: Remove the build package from runtime dependencies, as it is only needed at build time and is already handled by the build system configuration.
#4340 @tstrilka: Fix the incorrect parent job name in ParentRunFacet for child events (SQL_JOB, RDD_JOB) on AWS Glue, where the raw spark.app.name was used instead of the application name resolved by platform-specific name resolvers.
Skip testAppendWithRDDTransformations and testAppendWithRDDProcessing on Java 8 + Spark 3.5, where the Iceberg vendor module is not compiled because Iceberg 1.7 requires Java 11+.
v1.43.0
Added
#4258 @mobuchowski: Add support for extracting and reporting severity information from dbt tests in OpenLineage events.
#4257 @mobuchowski: Extend DataQualityAssertionsDatasetFacet with a severity field to indicate the importance level of data quality assertions.
#4263 @kchledowski: Enable lineage extraction from DataSourceRDD when reading Iceberg data sources, supporting the mixed RDD and DataFrame operations common in AWS Glue environments.
Fixed
#4262 @mobuchowski: Fix an IndexError when running the 'dbt-ol send-events' command without additional arguments by properly checking the argument count.
#4264 @mobuchowski: Fix an incorrect "namespace" value in BigQuery column-level lineage so that it matches the format used by regular BigQuery dataset collection.
#4268 @mobuchowski: Fix Delta detection failing when multiple Spark extensions are configured (comma-separated), enabling proper event filtering in environments like Azure Fabric or when using Gluten.
#4243 @kchledowski: Fix service provider interface conflicts caused by package relocation not updating META-INF/services configuration files.
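The Delta detection fix (#4268) concerns Spark's spark.sql.extensions setting, which takes a comma-separated list of extension classes. The sketch below, in Python for brevity, contrasts an exact-match check with one that splits the list first. It only illustrates the failure mode the note describes: the function names are hypothetical, the integration's own detection code is not reproduced, and com.example.OtherExtension is a placeholder for any second extension (for example Gluten's).

```python
DELTA_EXTENSION = "io.delta.sql.DeltaSparkSessionExtension"


def has_delta_naive(extensions_conf: str) -> bool:
    # Only works when Delta is the *sole* configured extension.
    return extensions_conf == DELTA_EXTENSION


def has_delta(extensions_conf: str) -> bool:
    # Split the comma-separated value and ignore whitespace and empty entries.
    configured = {part.strip() for part in extensions_conf.split(",") if part.strip()}
    return DELTA_EXTENSION in configured


conf = "com.example.OtherExtension, io.delta.sql.DeltaSparkSessionExtension"
print(has_delta_naive(conf), has_delta(conf))  # False True
```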
v1.42.1: OpenLineage 1.42.1
Added
#4218 @RohithKayathi: Enable posting lineage events to DataZone domains in regions other than the one where the data transformation jobs run.
#4118 @kchledowski: Add a new configuration option, spark.openlineage.filter.rddEventsDisabled, to selectively disable OpenLineage event emission for RDD operations while keeping SQL-based operations enabled.
#4124 @kchledowski: Add schema and column-level lineage support for Snowflake datasets when using the Spark-Snowflake connector.
#4215 @wslulciuc: Add support for overriding the application runID via the property spark.openlineage.applicationRunId.
#4182 @jakub-moravec: Add a new facet to capture input parameters supplied to a job at the time of execution, enabling reproducibility, debugging, and richer lineage context.
Changed
#3768 @tnazarew: Update the GCP Lineage transport to use a new version of the producer library with fixed dependency shading.
#4220 @mobuchowski: Improve error messages to indicate which transport failed to be created.
#4207 @mobuchowski: Fix classloader conflicts with the BigQuery connector by gating DEBUG toJSON() logging behind an additional flag and logging exceptions.
Fixed
#4197 @dolfinus: Fix the type annotation for the .with_additional_properties() method to correctly accept keyword arguments.
#4192 @kchledowski: Fix the BigQuery symlink namespace incorrectly having a ".db" suffix in RUNNING and COMPLETE events by avoiding mutation of the Identifier object.
#4229 @lawofcycles: Add a fallback mechanism to retrieve the AWS region from the EC2 Instance Metadata Service when environment variables are unavailable in YARN cluster mode.
#4222 @kchledowski: Fix missing inputs and column-level lineage when writing from an AWS DynamicFrame by treating NewHadoopRDD as file-like.
#4228 @RohithKayathi: Apply spark.openlineage.dataset.removePath.pattern to input field names in ColumnLineageFacet, and fix the hashCode/equals methods to include additionalProperties.
Removed
#4212 @kacpermuda: The deprecated Airflow integration has been removed from the OpenLineage repository.
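#4229 adds a fallback to the EC2 Instance Metadata Service (IMDS) when the region environment variables are unavailable. The lookup order can be sketched in Python as below. AWS_REGION and AWS_DEFAULT_REGION are the standard AWS SDK variables, but the order in which the integration checks them is an assumption, and resolve_region and its injectable imds_lookup callable are hypothetical, so the example runs without network access.

```python
from typing import Callable, Mapping, Optional


def resolve_region(env: Mapping[str, str],
                   imds_lookup: Callable[[], Optional[str]]) -> Optional[str]:
    """Hypothetical resolution order: environment variables first, IMDS last."""
    for var in ("AWS_REGION", "AWS_DEFAULT_REGION"):  # assumed precedence
        value = env.get(var)
        if value:
            return value
    # Per the release note, these variables may be missing in YARN cluster
    # mode, so fall back to the instance metadata service (stubbed out here).
    return imds_lookup()


print(resolve_region({"AWS_REGION": "eu-west-1"}, lambda: "us-east-1"))  # eu-west-1
print(resolve_region({}, lambda: "us-east-1"))                            # us-east-1
```

Passing the environment as a mapping rather than reading os.environ directly keeps the sketch testable; a real caller would pass os.environ.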
v1.42.0
The release notes for v1.42.0 are identical to those listed for v1.42.1 above.
v1.41.0
Added
#4189 @kacpermuda: Add support for arbitrary extra information in JobDependency within JobDependenciesRunFacet.
#4185 @kacpermuda: Add debug mode support to the file transport for easier troubleshooting.
#4160 @harels: Add support for capturing dbt model owner information from meta.owner in OpenLineage events.
#4151 @mobuchowski: Add DbtNodeJobFacet to provide additional dbt node information in job facets.
#4161 @tnazarew: Add default name support to the Hive catalog facet in the Spark integration.
#4134 @pawel-big-lebowski: Add support for fetching input statistics for single-input RDD jobs.
Changed
#4153 @kchledowski: Migrate the SQL parser from a fork to upstream version 0.59 for better maintenance and compatibility.
#4178 @mobuchowski: Reduce the aggressiveness of UUID normalization in the Spark integration.
Fixed
#4186 @kacpermuda: Improve logging output in the Python client.
#4165 @dolfinus: Fix relation size calculation to ensure values are within reasonable bounds.
#4154 @mobuchowski: Add the missing job facet schema to the specification.
v1.40.1
Fixed
#4135 @mobuchowski: Fix a breaking change introduced in version 1.40.0.
v1.40.0
Added
#4109 @jakub-moravec: Add a standardized batch API endpoint to the OpenLineage specification for handling multiple events in a single request.
#4116 @mobuchowski: Add an ordinal_position field to track the position of fields in a schema (1-indexed).
#4112 @kacpermuda: Introduce JobDependenciesRunFacet to track dependencies between jobs.
#4103 @jakub-moravec: Add support for temporary datasets to enable job-to-job lineage tracking.
#4075 @luke-hoffman1: Add a fallback configuration for the BigQuery project ID in the Metastore integration.
#4123 @kacpermuda: Include examples in Python generated classes for better documentation.
#4077 @dolfinus: Add support for parsing the jTDS JDBC URL format in the Java client.
#4066 @tnazarew: Add ParentRunFacet to the Hive integration for tracking parent-child run relationships.
#4097 @tnazarew: Add support for tracking LOAD and IMPORT operations in Hive.
#4085 @tnazarew: Add support for tracking EXPORT operations in Hive.
#4079 @tnazarew: Add START event emission support to the Hive integration.
Fixed
#4121 @usamakunwar: Fix Spark dataset facet builders for input datasets.
#4114 @kchledowski: Fix job name trimming logic in the Spark integration.
#4113 @pawel-big-lebowski: Fix the putAll operation failing on immutable maps.
#4108 @pawel-big-lebowski: Fix multiple issues with RDD job handling in Spark.
#4102 @kchledowski: Fix JDBC dbtable parsing to support any FROM clause.
#4083 @pawel-big-lebowski: Fix the Spark connector configuration for Databricks environments.
#4099 @mobuchowski: Catch NoClassDefFoundError when buggy implementations exist on the classpath.
#4104 @mobuchowski: Fix Snowflake identifier parsing to handle quoted identifiers correctly.
#4105 @mobuchowski: Strip quotes from Snowflake account names for proper handling.
#4092 @fm100: Fix facet property names from snake_case to camelCase for consistency.
#4111 @kacpermuda: Fix the Python client facet generator after moving to the UV build system.
#4093 @antonlin1: Fix how the default retry configuration is merged with user-defined config in HTTP transports.
#4084 @mandalbalmukund: Upgrade the commons-lang3 version to fix a CVE security vulnerability.
#4126 @dolfinus: Ensure START and STOP events share the same runId in the Hive integration.
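#4093 fixes how default retry settings are merged with a user-defined retry configuration in the HTTP transports. A minimal Python sketch of the intended behaviour, assuming user keys should override defaults one by one rather than replacing the whole block. The function name and default values are hypothetical; the key names merely mirror urllib3's Retry parameters.

```python
import copy
from typing import Any, Dict, Optional

# Illustrative defaults only; not the client's actual values.
DEFAULT_RETRY: Dict[str, Any] = {
    "total": 5,
    "backoff_factor": 0.3,
    "status_forcelist": [500, 502, 503, 504],
}


def merge_retry(user_retry: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Overlay user-supplied keys on a copy of the defaults."""
    merged = copy.deepcopy(DEFAULT_RETRY)  # never mutate the shared defaults
    merged.update(user_retry or {})        # each user key wins individually
    return merged


print(merge_retry({"total": 2}))
# {'total': 2, 'backoff_factor': 0.3, 'status_forcelist': [500, 502, 503, 504]}
```

With a plain replacement (retry = user_retry or DEFAULT_RETRY), setting only total would silently drop the backoff and status settings, which is the kind of mismatch a key-by-key merge avoids.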
v1.39.0
Added
#3996 @pawel-big-lebowski: Add configurable dataset name normalization with support for date patterns, key-value pairs, and S3 location detection to enable proper dataset subsetting.
#4057 @kchledowski: Add missing input symlink facets for Databricks Unity Catalog tables.
Changed
#4058 @kchledowski: Refactor column-level lineage dependency collector tests for better organization and maintainability.
Fixed
#4069 @fm100: Fix a typo in the IcebergCommitReportOutputDatasetFacet property name.
#4061 @pawel-big-lebowski: Fix dataset name trimming for column-level lineage inputs.
#4062 @kacpermuda: Remove an unnecessary numpy import from the Python client.
Removed
#3844 @kacpermuda: Remove the Dagster integration from the repository.
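#3996 adds configurable dataset name normalization so that partition-style path segments, such as dates and key=value pairs, do not turn every partition into a separate dataset. A rough Python sketch of the idea follows; the regular expressions and the strip-trailing-segments rule are assumptions for illustration, not the integration's actual patterns or configuration keys.

```python
import re

# Hypothetical patterns for partition-style path segments.
DATE_SEGMENT = re.compile(r"^\d{4}-\d{2}-\d{2}$|^\d{8}$")  # 2024-05-01 or 20240501
KEY_VALUE_SEGMENT = re.compile(r"^[^=/]+=[^=/]+$")         # dt=2024-05-01, hour=13


def normalize_dataset_name(path: str) -> str:
    """Drop trailing date or key=value segments, keeping the table root."""
    segments = path.rstrip("/").split("/")
    while segments and (DATE_SEGMENT.match(segments[-1])
                        or KEY_VALUE_SEGMENT.match(segments[-1])):
        segments.pop()
    return "/".join(segments)


print(normalize_dataset_name("s3://bucket/events/dt=2024-05-01/hour=13"))  # s3://bucket/events
print(normalize_dataset_name("s3://bucket/logs/2024-05-01/"))              # s3://bucket/logs
```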
v1.38.0
Added
#4008 @pawel-big-lebowski: Add subset dataset facets to the OpenLineage specification for representing dataset relationships.
#3978 @heron--: Allow attaching dataset quality information outside of InputDatasetFacet.
#4018 @tnazarew: Add support for Spark structured streaming microbatch source write operations.
#4016 @ddebowczyk92: Add catalog properties support to the Spark integration for better catalog metadata tracking.
#4039 @ddebowczyk92: Enhance the BigQuery integration with the GCP project ID and location in catalog properties.
#3972 @kchledowski: Add support for tracking COALESCE transformations in Spark jobs.
#3982 @ddebowczyk92: Add catalog facet support for vanilla Hive table operations.
#4013 @pawel-big-lebowski: Output statistics are now available in complete events for better observability.
#3977 @pawel-big-lebowski: Add output statistics tracking for Spark RDD-based jobs.
#4050 @pawel-big-lebowski: Improve generated model classes with proper equals and hashCode implementations.
#4022 @mobuchowski: Add support for capturing dbt tags in OpenLineage events.
#4017 @mobuchowski: Add dbt Cloud account ID tracking to dbt run facets.
#3987 @mobuchowski: Enhance DbtRunRunFacet with additional metadata for better observability.
#4006 @ddebowczyk92: Add a native Google Cloud Platform Lineage transport for the Python client.
#3983 @JDarDagran: Add fsspec filesystem support to FileTransport for broader filesystem compatibility.
#3980 @kacpermuda: Automatically add the OpenLineage client version as a default tag in events.
#3986 @gabrysiaolsz: Add GCP Cloud Composer environment metadata facets to the Airflow integration.
Changed
#4055 @mobuchowski: Use dbt model aliases when generating dataset names for more accurate lineage.
#4029 @EugeneYushin: Serialize OpenLineage events to JSON for improved debug logging.
#4030 @EugeneYushin: Properly respect user-overridden application names in event emission.
#4003 @kchledowski: Refactor the column-level lineage expression dependency collector for better maintainability.
#3994 @JDarDagran: Enhance logging for Iceberg input statistics collection.
#3985 @pawel-big-lebowski: Optimize S3 operations by limiting external getFileStatus calls for large object sets.
#3964 @kchledowski: Refactor TransformationInfo into the shared Java client for cross-integration reuse.
#4026 @dolfinus: Enhance logging in the asynchronous HTTP transport.
#4000 @JDarDagran: Support Python type aliases in client code generation.
#3997 @JDarDagran: Improve code generation to properly handle nearly identical class definitions.
#4014 @dolfinus: Fail fast with clear errors when custom token providers fail to load.
#4015 @dolfinus: Improve error visibility by not silencing import errors in the transport factory.
#3968 @kacpermuda: Update import paths to use versioned facet and event modules.
#4012 @JDarDagran: Improve thread pool management in Java client utilities.
#3965 @JDarDagran: Migrate from pre-commit to prek for pre-commit hook management.
Fixed
#4053 @jsjasonseba: Fix incorrect Glue catalog detection caused by always attempting ARN resolution.
#4052 @kchledowski: Fix column-level lineage failures on Spark runtimes without the spark-hive package.
#4031 @kchledowski: Fix missing input datasets and column-level lineage for CreateDataSourceTableAsSelect and CreateHiveTableAsSelect commands.
#4044 @EugeneYushin: Fix BigQuery intermediate job filtering by using the bucket configuration.
#4002 @MaciejGajewski: Add additional exception handling for TypeNotPresentException in Spark 3.0.2.
#4034 @JDarDagran: Correct the license field specification in Python package metadata.
#4045 @kacpermuda: Support both naming conventions for the API key configuration parameter.
#4037 @EugeneYushin: Fix a build issue that caused empty sources JAR files to be generated.
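#4045 makes the client accept either naming convention for the API key parameter. The release note does not name the two spellings, so the sketch below assumes snake_case api_key and camelCase apiKey; the helper and its precedence are hypothetical.

```python
from typing import Mapping, Optional

# Assumed spellings; the release note only says "both naming conventions".
API_KEY_ALIASES = ("api_key", "apiKey")


def read_api_key(auth_config: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty value among the accepted spellings."""
    for key in API_KEY_ALIASES:
        value = auth_config.get(key)
        if value:
            return value
    return None


print(read_api_key({"apiKey": "secret"}))  # secret
```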
v1.37.0
Added
#3950 @mobuchowski: Add a Datadog transport that routes events between synchronous and asynchronous transports based on configurable rules. It supports wildcard matching and integrates with Datadog's observability platform.
#3860 @orthoxerox: Add support for WriteDelta and WriteIcebergDelta logical plan nodes in the Spark integration.
#3933 @mobuchowski: Add a configuration option to override dbt job names in OpenLineage events.
#3923 @kyungryun: Improve JSON serialization performance with the Jackson Blackbird module.
Changed
#3904 @pawel-big-lebowski: Drop support for Spark 2.x. The minimum supported version is now Spark 3.x.
#3956 @dolfinus: Optimize HTTP transport performance by adjusting the gzip compression level.
#3925 @SalvadorRomo: Extend streaming integration tests to support Spark 4.0.
Fixed
#3946 @pawel-big-lebowski: Limit memory consumption by capping the number of dependencies processed (1M) and the number of input fields returned in the facet (100K). Also turns on dataset lineage by default.
#3949 @ddebowczyk92: Add limits to prevent performance issues with large schemas in column-level lineage processing.
#3934 @pawel-big-lebowski: Fix the context factory implementation for Spark 4.0 compatibility.
#3930 @yunchipang: Fix the LogicalRelation constructor to maintain compatibility with Spark 4.0.
#3947 @ddebowczyk92: Fix parsing of vendor configurations in the Spark OpenLineage configuration.
#3953 @jroachgolf84: Fix namespace handling in dbt external query facets.
#3943 @JDarDagran: Fix configuration handling for user-supplied tags in the Python client.
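#3950 describes routing events between synchronous and asynchronous transports using configurable rules with wildcard matching. A sketch of what such first-match routing could look like with Python's fnmatch; the rule shape (a job-name glob mapped to a transport label) and the sync default are assumptions, not the Datadog transport's configuration format.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

# Hypothetical rules: the first job-name pattern that matches wins.
RULES: List[Tuple[str, str]] = [
    ("nightly_*", "async"),
    ("*.critical", "sync"),
]


def choose_transport(job_name: str, default: str = "sync") -> str:
    for pattern, transport in RULES:
        if fnmatchcase(job_name, pattern):  # case-sensitive glob match
            return transport
    return default


print(choose_transport("nightly_orders"))    # async
print(choose_transport("billing.critical"))  # sync
print(choose_transport("adhoc_query"))       # sync (default)
```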
v1.36.0
Added
#3877 @pawel-big-lebowski: Fix failing tests for Spark 4.0 and make the Delta integration tests pass with Delta 4.0 on Spark 4.
#3914 @pawel-big-lebowski: Extend DebugFacet with additional information on Spark's driver memory configuration and current memory usage.
#3921 @pawel-big-lebowski: Add support for AlterTableCommand dataset building in Spark 4.0.
#3890 @jroachgolf84: Add query ID tracking to the dbt integration.
#3918 @mobuchowski: Capture query IDs from dbt structured logs for better traceability.
#3816 @ddebowczyk92: Formalize dataset naming conventions in the Python client implementation.
Changed
#3907 @pawel-big-lebowski: Bump tested Spark versions.
#3851 @dolfinus: Ensure proper cleanup of OpenLineageClient when the Spark application ends.
#3895 @dolfinus: Replace f-string usage in logging calls with proper logging formatting.
#3899 @Shadi: Update the protobuf dependency to maintain compatibility with newer library versions.
#3869 @mobuchowski: Add documentation explaining compatibility testing processes.
Fixed
#3902 @pawel-big-lebowski: Merge the SqlExecutionRDDVisitor and LogicalRDDVisitor classes to avoid a memory leak.
#3909 @pawel-big-lebowski: Refactor the Iceberg handler implementation for better maintainability.
#3908 @pawel-big-lebowski: Add retry logic for handling empty row exceptions.
#3911 @pawel-big-lebowski: Fix the Spark version configuration in the Databricks test environment.
#3915 @fetta: Fix the kafka-upsert connector to properly identify Kafka topics.
#3905 @kacpermuda: Improve test performance by implementing fail-fast behavior and reduced timeouts.
#3916 @mobuchowski: Improve telemetry collection and fix performance issues with file reading.
#3894 @mobuchowski: Fix dbt version compatibility issues.
#3889 @pawel-big-lebowski: Fix filename handling to work correctly on Windows systems.
#3897 @dolfinus: Adjust the logging level for transport aliasing messages.
#3901 @kacpermuda: Improve code documentation and add additional test coverage.
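#3895 replaces f-strings in logging calls with the logging module's own %-style arguments. The practical difference, shown with the standard library only: an f-string is formatted before the call even when the level is disabled, whereas deferred arguments are never rendered if the record is dropped. The logger name and the Expensive class are made up for the demonstration.

```python
import logging

logger = logging.getLogger("openlineage.example")  # hypothetical logger name
logger.setLevel(logging.INFO)                      # DEBUG records are dropped


class Expensive:
    """Counts how often it is rendered into a string."""
    renders = 0

    def __str__(self) -> str:
        Expensive.renders += 1
        return "expensive"


payload = Expensive()
logger.debug(f"event payload: {payload}")   # f-string: rendered before the call
logger.debug("event payload: %s", payload)  # deferred: never rendered here
print(Expensive.renders)  # 1
```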
v1.35.0
Added
#3848 @dolfinus: Add the spark_applicationDetails facet to all OpenLineage events emitted by the Spark integration.
#3850 @ddebowczyk92: Add support for additional facets in the Spark integration.
#3880 @pawel-big-lebowski: Add a spark.openlineage.disabled entry to disable the OpenLineage integration through Spark config parameters.
#3779 @pawel-big-lebowski: Add extra timeout options to emit incomplete OpenLineage events when building facets times out. See buildDatasetsTimePercentage and facetsBuildingTimePercentage in the docs for more details.
#3812 @mobuchowski: Add a high-performance asynchronous HTTP transport with event ordering guarantees, configurable concurrency, and comprehensive error handling. Features include START-before-completion event ordering, bounded queues, and real-time statistics.
#3764 @dolfinus: Add a DbtRun facet for tracking dbt run information.
#3829 @kacpermuda: Add configuration options for CompositeTransport to control behavior and ordering.
#3789 @dolfinus: Add a jobType facet to the Hive integration.
#3863 @dolfinus: Add a dialect field to SqlJobFacet for the Hive integration.
#3819 @mobuchowski: Add a dialect field to the SqlJobFacet specification.
#3826 @ddebowczyk92: Formalize job naming conventions in the specification.
#3775 @ddebowczyk92: Formalize dataset naming conventions in the specification.
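#3812 lists START-before-completion ordering as a guarantee of the asynchronous HTTP transport. The ordering rule alone can be sketched single-threaded in Python: terminal events for a run are held until that run's START has been sent. The class and method names are hypothetical, and the sketch omits the concurrency, bounded queues, retries and statistics the real transport provides; only the eventType and run.runId fields come from the OpenLineage event format.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Set

Event = Dict[str, Any]


class StartFirstDispatcher:
    """Hypothetical sketch: never send a run's terminal event before its START."""

    TERMINAL = {"COMPLETE", "FAIL", "ABORT"}

    def __init__(self, send: Callable[[Event], None]) -> None:
        self._send = send
        self._started: Set[str] = set()
        self._held: Dict[str, List[Event]] = defaultdict(list)

    def submit(self, event: Event) -> None:
        run_id = event["run"]["runId"]
        event_type = event["eventType"]
        if event_type == "START":
            self._send(event)
            self._started.add(run_id)
            # Release any terminal events that arrived before this START.
            for held in self._held.pop(run_id, []):
                self._send(held)
        elif event_type in self.TERMINAL and run_id not in self._started:
            self._held[run_id].append(event)  # wait for the run's START
        else:
            self._send(event)


sent: List[Event] = []
dispatcher = StartFirstDispatcher(sent.append)
dispatcher.submit({"eventType": "COMPLETE", "run": {"runId": "r1"}})
dispatcher.submit({"eventType": "START", "run": {"runId": "r1"}})
print([e["eventType"] for e in sent])  # ['START', 'COMPLETE']
```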
Changed
Configuration
📅 Schedule: Branch creation - "every 3 months on the first day of the month" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.