Skip to content

[CALCITE-7448] Add support for : as a field/item access operator#4844

Open
tmater wants to merge 10 commits intoapache:mainfrom
tmater:colon-syntax-mode
Open

[CALCITE-7448] Add support for : as a field/item access operator#4844
tmater wants to merge 10 commits intoapache:mainfrom
tmater:colon-syntax-mode

Conversation

@tmater
Copy link
Copy Markdown

@tmater tmater commented Mar 24, 2026

Changes Proposed

This PR adds opt-in support for : as a field/item access operator behind a new SqlConformance#isColonFieldAccessAllowed() hook. No built-in conformance enables it yet, so the default parser behavior is unchanged.

The implementation treats a single : access as parser sugar for existing field/item access operators. In practice, v:field is lowered to the same DOT(v, field) shape as v.field, and parsing then continues through the usual dot/bracket postfixes (v:field.nested works, while v:field:leaf does not).

It also refactors postfix parsing so colon field access composes with existing dot and bracket access. That covers cases such as v:field, v:['field name'], arr[1]:field, obj['x']:nested['y'], and colon access followed by bracket or dot postfixes.

To avoid ambiguity when colon field access is enabled, JSON_OBJECT and JSON_OBJECTAGG must use VALUE syntax instead of : or comma-separated key/value shorthand.

Reproduction

select v:field, v:['field name'], arr[1]:field, obj['x']:nested from t;

select json_object(key v:field value arr[1]:field);

select (v::variant)[1] from t;

Testing

Added parser coverage in SqlParserTest and BabelParserTest for valid and invalid colon field access, JSON constructor disambiguation, and :: cast interactions.

Jira Link

CALCITE-7448

@caicancai
Copy link
Copy Markdown
Member

I might wait until the CI issue is resolved before reviewing this PR.

@caicancai
Copy link
Copy Markdown
Member

I don't understand Calcite JavaCC code very well, I might need to learn it. If no one reviews it this week, I'll start working on it next week.

@tmater
Copy link
Copy Markdown
Author

tmater commented Mar 25, 2026

I don't understand Calcite JavaCC code very well, I might need to learn it. If no one reviews it this week, I'll start working on it next week.

Thank you for taking a look, @caicancai. I agree this is still a fairly complex PR, and I wouldn’t claim to fully understand all of its implications yet.

That’s also why I tried to include a wide range of test variations and keep the changes gated behind a conformance config.

I’ve organized it into five commits (prior to the CI failures) to make the review easier. If it would help further, I’m happy to split it into 4–5 smaller PRs.

@mihaibudiu
Copy link
Copy Markdown
Contributor

Is the behavior with :: needing parenthesis a new feature, or was this already present previously?

…acket refactor

The branch moved bracket ([...]) and dot (.) postfix handling out of
Expression2()'s loop and into AddRegularPostfixes (called from
AddExpression2b). Babel's InfixCast production was implicitly relying
on Expression2()'s loop to consume postfixes after the DataType() call,
so after the refactor, expressions like v::varchar[1] would fail to
parse because no grammar rule picked up the trailing [1].

Fix: call AddRegularPostfixes(list) after DataType() in InfixCast,
making the postfix consumption explicit rather than relying on the
caller's loop.

Also consolidate InfixCast test coverage: remove the now-redundant
testInfixCastBracketAccessNeedsParentheses and extend
testColonFieldAccessWithInfixCast with comprehensive cases covering
bracket access, dot access, parenthesized forms, and array types—both
with and without isColonFieldAccessAllowed.
@tmater
Copy link
Copy Markdown
Author

tmater commented Apr 1, 2026

Is the behavior with :: needing parenthesis a new feature, or was this already present previously?

@mihaibudiu , good catch! The parenthesis requirement itself is not new, on main, v::varchar[1] already means "cast to VARCHAR[1]" (bracket binds to the type), so parentheses were always needed to subscript the cast result.

However, you're right that something was off: the bracket refactor broke parsing v::varchar[1] entirely, because InfixCast was implicitly relying on Expression2()'s loop to consume postfixes. This commit fixes that by adding an explicit AddRegularPostfixes(list) call after DataType() in InfixCast.

[
LOOKAHEAD(2, <COLON> SimpleIdentifier(),
{ this.conformance.isColonFieldAccessAllowed() })
<COLON>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is chained colon access not supported?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept : intentionally non-chainable here. After one : we are already back in the existing postfix world, so the follow-up access can be expressed with the normal operators:

  • a:b.c instead of a:b:c
  • a:['x'].y instead of a:['x']:y
  • (a:b)['x'] / a:b['x'] instead of a:b:['x']

So a second : would mostly be extra surface syntax, not extra expressiveness. Keeping it to a single colon also keeps the grammar narrower and avoids widening the ambiguous space around other colon uses, especially :: in Babel and JSON constructor : handling.

If we decide later that repeated : is a real dialect requirement, we can extend the current [...] to a loop, but I wanted to start with the minimal syntax that covers the targeted cases.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cover the Databricks and Snowflake capabilities?

s.pos()));
list.add(dt);
}
AddRegularPostfixes(list)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it will affect existing semantics.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the refactor, InfixCast could stop after DataType() and rely on the outer Expression2() loop to notice any following . or [...]. After the refactor, that outer postfix branch was removed and the shared postfix handling moved earlier into AddExpression2b(). But Babel InfixCast is not part of that earlier path; it is an extraBinaryExpressions hook that runs later.

So if InfixCast does not invoke AddRegularPostfixes() itself, nobody else will. That is why it has to “own” it now: not because the semantics changed, but because the control flow changed. The shared postfix parser still exists, but this is now the only place on the Babel :: path where it can be called.

Comment thread babel/src/test/java/org/apache/calcite/test/BabelParserTest.java
@caicancai
Copy link
Copy Markdown
Member

@tmater Thank you for following up. I left a question on Jira. Could you answer it when you have time?

@caicancai caicancai self-assigned this Apr 7, 2026
@mihaibudiu
Copy link
Copy Markdown
Contributor

I was on vacation, I plan to review this PR again.


// Multiple brackets bind to the type
sql("select v::varchar[1][2] from t")
.ok("SELECT `V` :: VARCHAR[1][2]\nFROM `T`");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But VARCHAR[1] is not a type. What does this mean?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is parser coverage rather than a meaningful type assertion. The test is checking that after ::, the parser still accepts postfix syntax on the RHS and groups it there, rather than attaching the postfix to the cast result. I agree the old example made that intent unclear, so I updated the test to make the associativity point clearer.

@sonarqubecloud
Copy link
Copy Markdown

// Bracket after :: binds to the type, not as subscript on the cast result
sql("select v::varchar array[1] from t")
.ok("SELECT `V` :: VARCHAR ARRAY[1]\nFROM `T`");
f.sql("select v::varchar array[1] from t")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand what the parse tree of this expression is.
I think the parser should actually reject this expression.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, this is quite ambiguous. The reason I added these cases was mainly to show that we did not introduce regressions while touching the INFIX_CAST grammar.

For v::varchar array[1], the tree is something like:

  INFIX_CAST(
    v,
    ITEM(SqlDataTypeSpec(VARCHAR ARRAY), 1)
  )

So the [1] is attaching on the type side, not as a subscript on the cast result.

Also, I backported these tests onto the base of this PR as proof: #4885. The same behavior is already present there, so this is not something introduced by the colon-field-access change.

Would you prefer fixing this ambiguity in this PR, or filing a Jira to clean it up separately? I am leaning towards a separate PR and polishing these tests a bit, maybe leaving only one ambiguous expression test just in case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, if the behavior does not change, it's fine. If we think there's a bug in the parser, where it produces a nonsensical parse tree, we should probably file an issue which can be solved separately.

]
}

void AddBracketAccess(List<Object> list) :
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment describing what this is expected to parse; I know this code has been moved, but since the context of the caller is missing, the comment will help maintainers.

[
LOOKAHEAD(2, <COLON> SimpleIdentifier(),
{ this.conformance.isColonFieldAccessAllowed() })
<COLON>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this cover the Databricks and Snowflake capabilities?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants