feat: Arrow-native geospatial serialization (geospatialAsArrow) #328

Open
jatorre wants to merge 1 commit into databricks:main from jatorre:feat/arrow-native-geospatial

jatorre commented Mar 14, 2026

Summary

Adds support for Arrow-native geospatial serialization via the geospatialAsArrow field (field 6) on TSparkArrowTypes in TExecuteStatementReq.

When enabled, geometry/geography columns are returned as Struct<srid: Int32, wkb: Binary> instead of EWKT strings, enabling efficient binary geometry transfer over Arrow without string parsing overhead.

This follows the pattern established by the existing arrow type options (timestampAsArrow, decimalAsArrow, etc.) and is based on the Databricks implementation of SPARK-54232.
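To make the Struct<srid: Int32, wkb: Binary> shape concrete, here is a minimal sketch of how a consumer might decode the wkb field for the simplest case, a 2D point. It assumes the payload is standard WKB (byte-order marker, uint32 geometry type, then float64 coordinates); the Geometry type and decodePoint helper are hypothetical names for illustration and are not part of the driver.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// Geometry mirrors the Struct<srid: Int32, wkb: Binary> shape.
// Hypothetical type for illustration; not part of the driver.
type Geometry struct {
	SRID int32
	WKB  []byte
}

// decodePoint decodes a 2D WKB Point (geometry type 1).
// Assumes standard WKB layout: 1 byte order marker, 4 byte type, 2 x float64.
func decodePoint(wkb []byte) (x, y float64, err error) {
	if len(wkb) < 21 {
		return 0, 0, errors.New("wkb too short for a 2D point")
	}
	var order binary.ByteOrder
	switch wkb[0] {
	case 0:
		order = binary.BigEndian
	case 1:
		order = binary.LittleEndian
	default:
		return 0, 0, fmt.Errorf("unknown byte order marker %d", wkb[0])
	}
	if t := order.Uint32(wkb[1:5]); t != 1 {
		return 0, 0, fmt.Errorf("not a 2D point: geometry type %d", t)
	}
	x = math.Float64frombits(order.Uint64(wkb[5:13]))
	y = math.Float64frombits(order.Uint64(wkb[13:21]))
	return x, y, nil
}

func main() {
	// Little-endian WKB for POINT(1 2).
	wkb := []byte{
		0x01,                   // byte order: little endian
		0x01, 0x00, 0x00, 0x00, // geometry type 1 = Point
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xF0, 0x3F, // x = 1.0
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, // y = 2.0
	}
	g := Geometry{SRID: 4326, WKB: wkb} // SRID value chosen for the example
	x, y, err := decodePoint(g.WKB)
	if err != nil {
		panic(err)
	}
	fmt.Printf("srid=%d x=%g y=%g\n", g.SRID, x, y)
}
```

Decoding here is a fixed-offset read with no tokenizing, which is the kind of saving the binary representation is meant to provide. Real consumers would typically hand the bytes to a geometry library rather than decode them by hand.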

Changes

  • internal/cli_service/cli_service.go — Add GeospatialAsArrow as field 6 on TSparkArrowTypes (struct field, getter, IsSet, Read/Write methods)
  • internal/config/config.go — Add UseArrowNativeGeospatial bool to ArrowConfig with DeepCopy support
  • connection.go — Wire GeospatialAsArrow into TExecuteStatementReq.UseArrowNativeTypes
  • connector.go — Add WithArrowNativeGeospatial(enable bool) ConnOption
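The four changes above follow a common functional-option pattern: an option sets a config flag, and the flag is copied into the per-request Arrow type options. The sketch below illustrates that flow with simplified stand-in types. Every type and function in it is hypothetical and lowercase to avoid confusion with the real driver; the generated Thrift code and driver internals in the PR may differ.

```go
package main

import "fmt"

// arrowConfig is a hypothetical stand-in for the driver's ArrowConfig.
type arrowConfig struct {
	UseArrowNativeGeospatial bool
}

// config is a hypothetical stand-in for the driver's connection config.
type config struct {
	Arrow arrowConfig
}

// connOption mimics the functional-option shape of a ConnOption.
type connOption func(*config)

// withArrowNativeGeospatial mimics WithArrowNativeGeospatial(enable bool).
func withArrowNativeGeospatial(enable bool) connOption {
	return func(c *config) { c.Arrow.UseArrowNativeGeospatial = enable }
}

// sparkArrowTypes is a simplified stand-in for TSparkArrowTypes. The optional
// field is modeled as a pointer so that "unset" is distinct from "false".
type sparkArrowTypes struct {
	GeospatialAsArrow *bool
}

// isSetGeospatialAsArrow reports whether the optional field was populated.
func (t *sparkArrowTypes) isSetGeospatialAsArrow() bool {
	return t.GeospatialAsArrow != nil
}

// newConfig applies options over defaults; the flag defaults to false.
func newConfig(opts ...connOption) *config {
	c := &config{}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

// arrowTypesFor copies the config flag into the per-request options.
func arrowTypesFor(c *config) *sparkArrowTypes {
	v := c.Arrow.UseArrowNativeGeospatial
	return &sparkArrowTypes{GeospatialAsArrow: &v}
}

func main() {
	off := arrowTypesFor(newConfig())
	on := arrowTypesFor(newConfig(withArrowNativeGeospatial(true)))
	fmt.Println(*off.GeospatialAsArrow, *on.GeospatialAsArrow)
}
```

Keeping the flag on the config and resolving it at request-build time means existing callers that pass no option see no change in what is sent to the server beyond an explicit false.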

Usage

connector, err := dbsql.NewConnector(
    dbsql.WithServerHostname(host),
    dbsql.WithHTTPPath(httpPath),
    dbsql.WithAccessToken(token),
    dbsql.WithArrowNativeGeospatial(true), // opt-in
)

Behavior

  • Default: false (no change to existing behavior)
  • When enabled: Geometry columns arrive as Struct<srid: Int32, wkb: Binary> with WKB binary payload
  • When disabled: Geometry columns arrive as VARCHAR containing EWKT strings (current behavior)
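For contrast with the struct form, the sketch below shows the extra step a consumer of the string form may need just to recover the SRID. It assumes the PostGIS-style EWKT convention of an optional SRID=<n>; prefix followed by WKT; the exact string format returned by the server is not specified here, and splitEWKT is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitEWKT separates an optional "SRID=<n>;" prefix from the WKT body.
// Assumes PostGIS-style EWKT; returns SRID 0 when no prefix is present.
func splitEWKT(s string) (int32, string, error) {
	const prefix = "SRID="
	if !strings.HasPrefix(s, prefix) {
		return 0, s, nil
	}
	head, wkt, found := strings.Cut(s[len(prefix):], ";")
	if !found {
		return 0, "", fmt.Errorf("missing ';' after SRID in %q", s)
	}
	n, err := strconv.ParseInt(head, 10, 32)
	if err != nil {
		return 0, "", fmt.Errorf("bad SRID in %q: %w", s, err)
	}
	return int32(n), wkt, nil
}

func main() {
	for _, in := range []string{"SRID=4326;POINT(1 2)", "POINT(1 2)"} {
		srid, wkt, err := splitEWKT(in)
		if err != nil {
			panic(err)
		}
		fmt.Printf("srid=%d wkt=%s\n", srid, wkt)
	}
}
```

With the flag enabled, the SRID arrives as a typed Int32 field and this prefix handling, along with parsing of the WKT body itself, is no longer needed on the client.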

Testing

Verified against a Databricks SQL Warehouse:

  • Without flag: geometry returned as VARCHAR, e.g. POINT(1 2)
  • With flag: geometry returned as Struct<srid: Int32, wkb: Binary> with correct WKB bytes

Context

This enables downstream consumers (ADBC drivers, DuckDB adbc_scanner, etc.) to receive geometry data in binary WKB format directly over Arrow, avoiding the overhead of parsing EWKT strings. The Databricks server already supports this via TSparkArrowTypes field 6 — this PR simply exposes the option to Go clients.

Add geospatialAsArrow field (field 6) to TSparkArrowTypes Thrift struct
and expose it as WithArrowNativeGeospatial() ConnOption.

When enabled, geometry/geography columns are returned as
Struct<srid: Int32, wkb: Binary> instead of EWKT strings, enabling
efficient binary geometry transfer over Arrow without string parsing.

This is opt-in (defaults to false) and requires a Databricks Runtime
with SPARK-54232 support.

Changes:
- cli_service.go: Add GeospatialAsArrow field 6 to TSparkArrowTypes
- config.go: Add UseArrowNativeGeospatial to ArrowConfig
- connection.go: Wire through to TExecuteStatementReq
- connector.go: Add WithArrowNativeGeospatial() ConnOption
jatorre added a commit to jatorre/databricks that referenced this pull request Mar 14, 2026
Expose geospatialAsArrow support (SPARK-54232) as an opt-in ADBC
connection option. When set to "true", geometry/geography columns
arrive as Struct<srid: Int32, wkb: Binary> instead of EWKT strings.

This depends on databricks/databricks-sql-go#328 which adds the
WithArrowNativeGeospatial() ConnOption to the underlying Go SQL driver.

Usage via adbc_connect (e.g. from DuckDB adbc_scanner):

  adbc_connect({
    'driver': 'libadbc_driver_databricks.dylib',
    'databricks.server_hostname': '...',
    'databricks.arrow.native_geospatial': 'true'
  })

jatorre commented Mar 14, 2026

As suggested by @avax94
