Skip to content

Spec: Clarify non-default CRS conventions#15834

Open
milastdbx wants to merge 9 commits intoapache:mainfrom
milastdbx:updateGeoSpec
Open

Spec: Clarify non-default CRS conventions#15834
milastdbx wants to merge 9 commits intoapache:mainfrom
milastdbx:updateGeoSpec

Conversation

@milastdbx
Copy link
Copy Markdown

@milastdbx milastdbx commented Mar 31, 2026

Rationale for this change

Current geospatial spec has wording that different parties interpret differently. This PR aims to resolve the ambiguity.
his is inline with recent parquet format spec changes - apache/parquet-format#560

What changes are included in this PR?

In this PR spec is updated Geospatial types such that wording is more precise in what is allowed and what is suggested.

@github-actions github-actions Bot added the Specification Issues that may introduce spec changes. label Mar 31, 2026
Comment thread format/spec.md Outdated
Comment thread format/spec.md Outdated
Comment thread format/spec.md Outdated
Comment thread format/spec.md Outdated
Comment thread format/spec.md Outdated
Comment thread format/spec.md Outdated
Comment thread format/spec.md Outdated
Comment thread format/spec.md Outdated
Comment thread format/spec.md Outdated
@milastdbx milastdbx changed the title updated geo spec Update wording in Geospatial spec Apr 22, 2026
@milastdbx milastdbx marked this pull request as ready for review April 22, 2026 14:47
Copy link
Copy Markdown

@mkaravel mkaravel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
@milastdbx Thank you for following up on this.

@mkaravel
Copy link
Copy Markdown

@rdblue @szehon-ho Could you please take a look?
This is the natural follow up of apache/parquet-format#560

Copy link
Copy Markdown
Member

@szehon-ho szehon-ho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, i have some minor suggestion to make it more readable. I realize that parquet one is committed already

Comment thread format/spec.md Outdated
Comment thread format/spec.md Outdated
* `projjson`: [PROJJSON](https://proj.org/en/stable/specifications/projjson.html), `identifier` is the name of a table property where the projjson string is stored.
Non-default CRS values are specified by any string that uniquely identifies a coordinate reference system associated with this type.
To maximize interoperability, suggested (but not limited to) formats for CRS are:
* `<projjson>` - A complete CRS definition embedded directly using the [PROJJSON](https://proj.org/en/stable/specifications/projjson.html) specification. Example for `OGC:CRS83`: `{"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json","type": "GeographicCRS","name": "NAD83 (CRS83)","datum": {"type": "GeodeticReferenceFrame"...`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not put the two 'projjson' together

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean as the same entry? They are not same if this is what you have in mind.
The plain PROJJSON is expected to be more common than the other one, and the second one is expected to be less common than the authority:code and srid:identifier options. The order in my opinion is good.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean, one is on top, one is on bottom. Its much easier imho if they are next to each other, then one can read the link about projjson and then see the next one. now there's two places, two links.

(popularity may be a bit harder to measure for now, but you may know more than me).

Anyway just my 2c

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From parquet discussion, we realized that none of the implementations actually use that, and that its difficult to implement it in practice for parquet readers, hence we left it only for backward compatability purposes.

However, i'd like it to be "discouraged".

Although it is most likely true, that while it makes it hard for parquet writers to implement this, its probably not hard for iceberg writer.

with this in mind - wdyt ?

Comment thread format/spec.md Outdated
* `projjson`: [PROJJSON](https://proj.org/en/stable/specifications/projjson.html), `identifier` is the name of a table property where the projjson string is stored.
Non-default CRS values are specified by any string that uniquely identifies a coordinate reference system associated with this type.
To maximize interoperability, suggested formats for CRS include, but are not limited to:
* `<projjson>` - A complete CRS definition embedded directly using the [PROJJSON](https://proj.org/en/stable/specifications/projjson.html) specification. Example for `OGC:CRS83`: `{"$schema": "https://proj.org/schemas/v0.7/projjson.schema.json","type": "GeographicCRS","name": "NAD83 (CRS83)","datum": {"type": "GeodeticReferenceFrame"...`
Copy link
Copy Markdown
Contributor

@rdblue rdblue Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 to having this in Iceberg.

Embedding a raw JSON string in the type was allowed in Parquet because some implementations can't access key-value metadata properties first. But that isn't the case in Iceberg so custom JSON can and should be stored in a table property and referenced here.

Since this is not clear, I think that we should probably have a note specifically stating that PROJJSON is not allowed here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion for this clarification from our Slack discussion:

CRS identifiers must not contain PROJJSON definitions. PROJJSON may be stored in a table property and referenced. Implementations must not parse the CRS identifier as PROJJSON.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the spec to reflect this.

However something that bothers me:

  • CRS values are specified by any string that uniquely identifies a coordinate reference system
    Here we are being very explicit that "ANY STRING" is accepted
  • suggested formats for CRS include, but are not limited to
    One more place where we emphasize that these are suggestions and not limitations

Then at the end we do say that PROJJSON is forbidden (contradicts "ANY STRING" above).

While it can be read that this is suggestions + blocklist, which is fine, i still want to make sure we are on same page.

I guess the main question is - do we want for the above suggestions to actually be "limitations" according to spec ?

Wdyt about latest wording ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too bothered by ANY STRING. If we're being overly strict about wording, then it says the string should identify a CRS, not encode a CRS. If you feel like adding a statement to qualify this, you can: "any string that uniquely identifies a CRS (but is not a PROJJSON definition as stated below)".

My guideline for these cases is to go for clarity and I don't think it is unclear. If someone later argues "hey, it said ANY STRING" despite a clear prohibition below, I would have no problem being firm that it was written incorrectly.

@rdblue rdblue changed the title Update wording in Geospatial spec Spec: Clarify non-default CRS conventions Apr 24, 2026
Comment thread format/spec.md Outdated
Non-default CRS values are specified by any string that uniquely identifies a coordinate reference system associated with this type.
To maximize interoperability, suggested formats for CRS include, but are not limited to:
* `<authority>:<code>` - where `<authority>` represents some well known authorities, and `code` is the code used by the authority to identify the CRS. Examples are - `OGC:CRS84`, `OGC:CRS83`, `OGC:CRS27`, `EPSG:4326`, `EPSG:3857`, `IGNF:ATI`. See [https://spatialreference.org/](https://spatialreference.org/) for definitions of coordinate reference systems provided by some well known authorities.
* `srid:<identifier>` - A reference using a [Spatial reference identifier (SRID)](https://en.wikipedia.org/wiki/Spatial_reference_system#Identifier), where <identifier> is the numeric SRID value. For example: `SRID:0`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd roll this into the last bullet so that we don't have a special case for one we don't think will be used.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this was not addressed.

Comment thread format/spec.md Outdated
Comment thread format/spec.md Outdated
Comment thread format/spec.md Outdated

* `srid`: [Spatial reference identifier](https://en.wikipedia.org/wiki/Spatial_reference_system#Identifier), `identifier` is the SRID itself.
* `projjson`: [PROJJSON](https://proj.org/en/stable/specifications/projjson.html), `identifier` is the name of a table property where the projjson string is stored.
CRS value must not contain inlined PROJJSON definitions and implementations must not parse the contents of the CRS as PROJJSON. If intention is for PROJJSON definition to be part of the table metadata, then it must be stored in a table property and referenced from the CRS field using the `projjson:<table_property_name>` form described above.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a good addition to explain that because PROJJSON is incredibly verbose (~4kb minimum), inlining it here may cause non-trivial performance degredation due to metadata size and make it difficult for implementations to produce reasonable error messages without spatial dependencies. (Or a different justification if this isn't the case!)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to me, updated wording

Copy link
Copy Markdown
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I approved this before I realized that one of my previous comments was closed but not addressed:

I'd roll this [SRID format] into the last bullet [about authority:code] so that we don't have a special case for one we don't think will be used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Specification Issues that may introduce spec changes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants