feat: Arrow-native geospatial serialization (geospatialAsArrow) #328
Open
jatorre wants to merge 1 commit into databricks:main
Add geospatialAsArrow field (field 6) to TSparkArrowTypes Thrift struct and expose it as WithArrowNativeGeospatial() ConnOption.

When enabled, geometry/geography columns are returned as Struct<srid: Int32, wkb: Binary> instead of EWKT strings, enabling efficient binary geometry transfer over Arrow without string parsing.

This is opt-in (defaults to false) and requires Databricks Runtime with SPARK-54232 support.

Changes:
- cli_service.go: Add GeospatialAsArrow field 6 to TSparkArrowTypes
- config.go: Add UseArrowNativeGeospatial to ArrowConfig
- connection.go: Wire through to TExecuteStatementReq
- connector.go: Add WithArrowNativeGeospatial() ConnOption
jatorre added a commit to jatorre/databricks that referenced this pull request on Mar 14, 2026
Expose geospatialAsArrow support (SPARK-54232) as an opt-in ADBC connection option. When set to "true", geometry/geography columns arrive as Struct<srid: Int32, wkb: Binary> instead of EWKT strings.

This depends on databricks/databricks-sql-go#328, which adds the WithArrowNativeGeospatial() ConnOption to the underlying Go SQL driver.

Usage via adbc_connect (e.g. from DuckDB adbc_scanner):

    adbc_connect({
      'driver': 'libadbc_driver_databricks.dylib',
      'databricks.server_hostname': '...',
      'databricks.arrow.native_geospatial': 'true'
    })
Author: As suggested by @avax94
Summary
Adds support for Arrow-native geospatial serialization via the geospatialAsArrow field (field 6) on TSparkArrowTypes in TExecuteStatementReq.

When enabled, geometry/geography columns are returned as Struct<srid: Int32, wkb: Binary> instead of EWKT strings, enabling efficient binary geometry transfer over Arrow without string parsing overhead.

This follows the pattern established by the existing arrow type options (timestampAsArrow, decimalAsArrow, etc.) and is based on the Databricks implementation of SPARK-54232.

Changes
- internal/cli_service/cli_service.go: Add GeospatialAsArrow as field 6 on TSparkArrowTypes (struct field, getter, IsSet, Read/Write methods)
- internal/config/config.go: Add UseArrowNativeGeospatial bool to ArrowConfig with DeepCopy support
- connection.go: Wire GeospatialAsArrow into TExecuteStatementReq.UseArrowNativeTypes
- connector.go: Add WithArrowNativeGeospatial(enable bool) ConnOption

Usage
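A minimal sketch of the call shape, assuming the functional-option pattern described in the changes list. ArrowConfig and ConnOption here are local stand-ins that model only the new field, and newConfig is a hypothetical helper; none of this is the driver's actual code.

```go
package main

import "fmt"

// ArrowConfig is a local stand-in for the driver's ArrowConfig.
// Only the field added by this PR is modelled.
type ArrowConfig struct {
	UseArrowNativeGeospatial bool
}

// ConnOption is a stand-in for the driver's functional option type.
type ConnOption func(*ArrowConfig)

// WithArrowNativeGeospatial mirrors the signature listed for connector.go.
func WithArrowNativeGeospatial(enable bool) ConnOption {
	return func(c *ArrowConfig) { c.UseArrowNativeGeospatial = enable }
}

// newConfig is a hypothetical helper that applies options over the defaults.
func newConfig(opts ...ConnOption) ArrowConfig {
	var cfg ArrowConfig // zero value: option off, existing behavior kept
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	// Without the option the flag stays false.
	fmt.Println(newConfig().UseArrowNativeGeospatial)
	// Opting in sets the flag that would be wired into the request.
	fmt.Println(newConfig(WithArrowNativeGeospatial(true)).UseArrowNativeGeospatial)
}
```

With the real driver, the option would presumably be passed to dbsql.NewConnector alongside the existing options such as WithServerHostname.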

Behavior
- Default: false (no change to existing behavior)
- When enabled: geometry/geography columns are returned as Struct<srid: Int32, wkb: Binary> with WKB binary payload

Testing
Verified against a Databricks SQL Warehouse:
- Option disabled: VARCHAR (POINT(1 2))
- Option enabled: Struct<srid: Int32, wkb: Binary> with correct WKB bytes

Context
This enables downstream consumers (ADBC drivers, DuckDB adbc_scanner, etc.) to receive geometry data in binary WKB format directly over Arrow, avoiding the overhead of parsing EWKT strings. The Databricks server already supports this via TSparkArrowTypes field 6; this PR simply exposes the option to Go clients.
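
To illustrate what a consumer gains from the binary form, here is a rough sketch of decoding a 2D WKB point using only the standard library. The Geometry and Point types, the DecodePoint helper, and the SRID value 4326 are assumptions made for the example; the PR itself does not add a decoder, and a real consumer would likely use a full WKB library.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// Geometry mirrors the Struct<srid: Int32, wkb: Binary> shape.
// The Go type and field names are illustrative, not part of the driver.
type Geometry struct {
	SRID int32
	WKB  []byte
}

// Point is a 2D coordinate pair.
type Point struct{ X, Y float64 }

// DecodePoint parses a plain 2D WKB Point (geometry type 1):
// 1 byte order flag, 4 byte type, then two 8 byte floats.
func DecodePoint(wkb []byte) (Point, error) {
	if len(wkb) != 21 {
		return Point{}, errors.New("wkb: a 2D point is exactly 21 bytes")
	}
	var order binary.ByteOrder
	switch wkb[0] {
	case 0:
		order = binary.BigEndian
	case 1:
		order = binary.LittleEndian
	default:
		return Point{}, fmt.Errorf("wkb: unknown byte order %d", wkb[0])
	}
	if t := order.Uint32(wkb[1:5]); t != 1 {
		return Point{}, fmt.Errorf("wkb: geometry type %d is not a point", t)
	}
	x := math.Float64frombits(order.Uint64(wkb[5:13]))
	y := math.Float64frombits(order.Uint64(wkb[13:21]))
	return Point{X: x, Y: y}, nil
}

func main() {
	// Little-endian WKB for POINT(1 2); the SRID is an assumed value.
	g := Geometry{SRID: 4326, WKB: []byte{
		0x01,                   // byte order: little-endian
		0x01, 0x00, 0x00, 0x00, // geometry type: Point
		0, 0, 0, 0, 0, 0, 0xf0, 0x3f, // x = 1.0
		0, 0, 0, 0, 0, 0, 0x00, 0x40, // y = 2.0
	}}
	p, err := DecodePoint(g.WKB)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("srid=%d x=%g y=%g\n", g.SRID, p.X, p.Y)
}
```

The coordinates are read as fixed-width floats at known offsets, with no tokenizing or number parsing, which is the overhead the EWKT path incurs.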