
feat: [ANSI] Ansi sql error messages #3580

Open
parthchandra wants to merge 4 commits into apache:main from parthchandra:sql-query-errors

Conversation

@parthchandra
Contributor

Which issue does this PR close?

Closes parts of #551
Closes #2215
Closes #3375

Rationale for this change

With Spark 4.0 (and Spark 3.5 with ANSI mode), Spark produces ANSI-compliant error messages that have an error code and in many cases include the original SQL query. When we encounter errors in native code, Comet throws a SparkException or CometNativeException that does not conform to the expected error-reporting standard.

What changes are included in this PR?

This PR introduces a framework for reporting ANSI-compliant error messages from native code.

Summary of error propagation:

  1. Spark-side query context serialization: For every serialized expression and aggregate expression, a unique expr_id is generated. If the expression's origin carries a QueryContext (SQL text, line, column, object name), it is extracted and attached to the protobuf. This is done for both Expr and AggExpr.
  2. Native planner (planner.rs): The PhysicalPlanner now holds a QueryContextMap. When planning Expr and AggExpr nodes, if expr_id and query_context are present, the context is registered in the map. When creating physical expressions for Cast, CheckOverflow, ListExtract, SumDecimal, AvgDecimal, and arithmetic binary expressions, the relevant QueryContext is looked up and passed to the constructor.
  3. Native errors: The SparkError enum is extended with new variants for all the Spark ANSI errors (from https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala). A new SparkErrorWithContext type wraps a SparkError with a QueryContext. All affected expression implementations look up the context and produce a SparkErrorWithContext when available.
     The SparkError implementation also gains new to_json() and exception_class() methods for JNI serialization.
  4. JNI boundary (errors.rs -> CometQueryExecutionException): The throw_exception function now checks for SparkErrorWithContext or SparkError and throws CometQueryExecutionException, which carries the entire SparkErrorWithContext as a JSON message. On the Scala side, CometExecIterator catches this exception and calls SparkErrorConverter.convertToSparkException() to convert it to the appropriate Spark exception. If the JSON message contains the QueryContext, the exception will contain the query; otherwise it will not.
  5. There are two version-specific implementations: one for Spark 3.x (falls back to a generic SparkException) and one for Spark 4.0 (calls the exact QueryExecutionErrors.* methods).
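Steps 1 and 2 of the flow above can be sketched as follows; the struct and method names here are illustrative stand-ins, not the exact Comet definitions:

```rust
use std::collections::HashMap;

// Illustrative stand-in for the query context carried in the protobuf
// (SQL text plus the location of the offending fragment).
#[derive(Clone, Debug, PartialEq)]
pub struct QueryContext {
    pub sql_text: String,
    pub line: u32,
    pub start_index: usize,
    pub stop_index: usize,
}

// Map from the serialized expr_id to its registered query context,
// held by the planner while building physical expressions.
#[derive(Default)]
pub struct QueryContextMap {
    contexts: HashMap<u64, QueryContext>,
}

impl QueryContextMap {
    // Called while planning an Expr/AggExpr that carries a context.
    pub fn register(&mut self, expr_id: u64, ctx: QueryContext) {
        self.contexts.insert(expr_id, ctx);
    }

    // Called when constructing a physical expression (e.g. Cast) so the
    // context can be attached to any error the expression later raises.
    pub fn lookup(&self, expr_id: u64) -> Option<&QueryContext> {
        self.contexts.get(&expr_id)
    }
}
```

An expression that fails at runtime can then wrap its error together with the looked-up context before it reaches the JNI boundary.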

Notes: Not all expressions have been updated. All the expressions that failed unit tests as a result of incorrect error messages have been updated (Cast, CheckOverflow, ListExtract, SumDecimal, AvgDecimal, and binary arithmetic expressions). Binary arithmetic expressions are now represented as CheckedBinaryExpr, which also includes the query context.
Most errors in QueryExecutionErrors are reproduced as-is on the native side. However, some errors, such as INTERVAL_ARITHMETIC_OVERFLOW, have one version with a user suggestion and one without. In such cases there are two variants on the native side.
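The two-variant pattern described in the note above might look like this (hypothetical variant names and JSON shape, sketched from the description; not the actual Comet enum):

```rust
// Hypothetical subset of the native error enum: errors that Spark defines
// both with and without a user suggestion get two variants.
#[derive(Debug)]
pub enum SparkError {
    IntervalArithmeticOverflow,
    IntervalArithmeticOverflowWithSuggestion { suggestion: String },
}

impl SparkError {
    // Hypothetical analogue of exception_class(): the JVM exception class
    // to instantiate when this error crosses the JNI boundary.
    pub fn exception_class(&self) -> &'static str {
        match self {
            SparkError::IntervalArithmeticOverflow
            | SparkError::IntervalArithmeticOverflowWithSuggestion { .. } => {
                "org.apache.spark.SparkArithmeticException"
            }
        }
    }

    // Hypothetical analogue of to_json(): serialize enough information for
    // the Scala side to rebuild the proper Spark exception.
    pub fn to_json(&self) -> String {
        match self {
            SparkError::IntervalArithmeticOverflow => {
                r#"{"errorClass":"INTERVAL_ARITHMETIC_OVERFLOW"}"#.to_string()
            }
            SparkError::IntervalArithmeticOverflowWithSuggestion { suggestion } => {
                format!(
                    r#"{{"errorClass":"INTERVAL_ARITHMETIC_OVERFLOW","suggestion":"{}"}}"#,
                    suggestion
                )
            }
        }
    }
}
```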

How are these changes tested?

New unit tests, plus the previously failing tests listed in #551, #2215, and #3375.

This PR was produced with the generous assistance of Claude Code

@parthchandra
Contributor Author

@coderfender, fyi

@coderfender
Contributor

Thank you @parthchandra. This is awesome.


// Extract the problematic fragment
let fragment = if start_idx < self.sql_text.len() && stop_idx <= self.sql_text.len() {
&self.sql_text[start_idx..stop_idx]
Member

The earlier docs say that start_idx is a character index, but I think it is being used as a byte index here. Perhaps you could add tests for non-ASCII cases, if that makes sense?

Contributor Author

Good catch. Fixed to use char indices, and added a unit test.
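The byte-vs-char distinction matters because Rust's `&str[a..b]` slices by byte offset and panics on non-character boundaries. A char-index-safe extraction along the lines of the fix described might look like this (illustrative helper, not the exact Comet code):

```rust
// Extract the fragment between two *character* indices. Safe for non-ASCII
// SQL text, where byte offsets and char indices diverge; a byte slice like
// &sql_text[start..stop] could panic or split a multi-byte character.
fn fragment_by_chars(sql_text: &str, start_idx: usize, stop_idx: usize) -> Option<String> {
    let char_count = sql_text.chars().count();
    if start_idx < char_count && stop_idx <= char_count && start_idx < stop_idx {
        Some(sql_text.chars().skip(start_idx).take(stop_idx - start_idx).collect())
    } else {
        None
    }
}
```

For example, in `"héllo + 1"` the char range 0..5 yields `héllo`, whereas the byte slice `&s[0..5]` would yield only `héll` because `é` occupies two bytes.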

@parthchandra parthchandra marked this pull request as draft February 24, 2026 01:02
@parthchandra
Contributor Author

Changed to draft to figure out backward compatibility

@parthchandra parthchandra marked this pull request as ready for review February 26, 2026 16:32
@parthchandra
Contributor Author

@andygrove @coderfender This is ready for review. I'll rebase after the review.

* This does not include exceptions thrown during the eager execution of commands, which are
* grouped into [[QueryCompilationErrors]].
*/
private[sql] object QueryExecutionErrors extends QueryErrorsBase with ExecutionErrors {
Member

This Scala class is in the native module. Should this be here?

Contributor Author

No idea how this got there. This is a Spark class! Removed.
Thanks for catching this.

@parthchandra
Contributor Author

Also fixed another failing test.

*/
public boolean isJsonMessage() {
String msg = getMessage();
return msg != null && msg.trim().startsWith("{") && msg.trim().endsWith("}");
Member

nit: trim is called twice

Contributor Author

fixed

if overflowed {
// Set to None if overflow happens

if overflowed || !is_valid_decimal_precision(result, self.sum_precision) {
Member

This looks like a change in functionality? Why is the is_valid_decimal_precision check now needed?

Contributor Author

This is a tricky one. overflowing_add checks for integer overflow, but a decimal can also overflow when its precision is exceeded, so we need to check for that as well. This is caught by the DataFrameAggregateSuite.checkAggResultsForDecimalOverflow test.
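The distinction is that i128 arithmetic can succeed while the result still has more digits than the declared decimal precision allows. A minimal sketch of the combined check (assumed helper signatures; the real functions live in Comet's native code):

```rust
// A decimal with precision p holds at most p significant digits.
fn is_valid_decimal_precision(value: i128, precision: u32) -> bool {
    let max = 10i128.pow(precision) - 1;
    (-max..=max).contains(&value)
}

// Sum update step: integer overflow OR precision overflow marks the
// accumulator as overflowed (None), matching Spark's decimal semantics.
fn checked_decimal_add(a: i128, b: i128, sum_precision: u32) -> Option<i128> {
    let (result, overflowed) = a.overflowing_add(b);
    if overflowed || !is_valid_decimal_precision(result, sum_precision) {
        None
    } else {
        Some(result)
    }
}
```

With precision 2, adding 1 to 99 does not overflow i128 but does exceed the two-digit bound, so only the second check catches it.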



Development

Successfully merging this pull request may close these issues.

ANSI mode array access error messages don't match Spark format
[ANSI] Include original SQL in error messages
