[SPARK-56883][SQL] DESCRIBE FUNCTION for SQL UDFs (#55915)
srielau wants to merge 1 commit into
Renders structured DESCRIBE output for SQL user-defined functions instead of the generic Class/Usage dump: Function/Type/Input/Returns, and in EXTENDED mode Comment/Collation/Deterministic/Data Access/Configs/Owner/Create Time/Body/SQL Path. Ports the formatter from the Databricks runtime.

- SQLFunction: add SCALAR/TABLE constants and fromExpressionInfo for reconstructing the function from its ExpressionInfo usage blob (covers both temp and persistent SQL UDFs).
- DescribeFunctionCommand: dispatch to describeSQLFunction when the className matches SQLFunction.isSQLFunction; inline the SQL PATH display via SqlPathFormat (replaces DescribeFunctionCommandUtils).
- SQLFunctionSuite: port describe tests for scalar/table SQL UDFs and derived routine characteristics.
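As a rough illustration of the `fromExpressionInfo` round-trip mentioned above — a sketch only; the field names and helper names below are invented for this example and are not Spark's actual serialization schema:

```python
import json

# Hypothetical stand-ins for toExpressionInfo / fromExpressionInfo:
# a SQL UDF definition is serialized into a JSON "usage" blob, and the
# describe path must be able to parse that blob back.

def to_usage_blob(fn: dict) -> str:
    """Serialize a function definition into a JSON usage string."""
    return json.dumps(fn)

def from_usage_blob(usage):
    """Reconstruct the definition; a null usage blob cannot be parsed back
    (the failure mode behind the registerFunction caching fix)."""
    if usage is None:
        raise ValueError("cannot reconstruct SQL UDF from a null usage blob")
    return json.loads(usage)

fn = {
    "name": "default.area",
    "type": "SCALAR",
    "input": "width DOUBLE, height DOUBLE",
    "returns": "DOUBLE",
    "body": "width * height",
}
assert from_usage_blob(to_usage_blob(fn)) == fn  # round-trips cleanly
```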
What changes were proposed in this pull request?
Renders a structured `DESCRIBE FUNCTION [EXTENDED]` output for SQL user-defined functions (temporary and persistent) in place of the generic `Function / Class / Usage: <json blob>` dump that `DescribeFunctionCommand` produces today for any function whose `ExpressionInfo.className != null`. For SQL UDFs the output becomes:

- `Function:` qualified name
- `Type:` `SCALAR` or `TABLE`
- `Input:` parameter list (name + SQL type, column-aligned; `DEFAULT <expr>` and `'comment'` annotations are added in EXTENDED mode)
- `Returns:` scalar return type, or the table return columns (column comments and defaults are added in EXTENDED mode)
- `Comment`, `Collation`, `Deterministic`, `Data Access` (`CONTAINS SQL` / `READS SQL DATA`), `Configs`, `Owner`, `Create Time`, `Body`, and `SQL Path`.

`SQL Path:` is emitted only when both `spark.sql.path.enabled = true` and a frozen path was persisted on the function at `CREATE FUNCTION` time (SPARK-56639 / SPARK-56520). The path is read from the function's `function.resolutionPath` property and rendered through `SqlPathFormat.formatForDisplay`, producing the same `` `catalog`.`namespace` `` format used elsewhere in DESCRIBE output. This shows the resolution path that the function will use during analysis — the creator's PATH frozen at CREATE time, not the invoker's current `PATH`. Behavior for builtin functions and non-SQL UDFs is unchanged.
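A minimal sketch of the display format described above — column-aligned key/value rows plus the backtick-quoted SQL Path rendering. The helper names (`render_rows`, `format_path`) are invented for this illustration; they are not Spark's actual `describeSQLFunction` / `SqlPathFormat` code.

```python
def format_path(path: list[list[str]]) -> str:
    """Render a resolution path as `catalog`.`namespace`, comma-separated.
    An assumption for illustration, not the real formatForDisplay code."""
    def quote(part: str) -> str:
        # Backtick-quote one identifier, doubling any embedded backticks.
        return "`" + part.replace("`", "``") + "`"
    return ", ".join(".".join(quote(p) for p in entry) for entry in path)

def render_rows(fields: dict[str, str]) -> str:
    """Column-align values so every row's value starts in the same column."""
    width = max(len(k) for k in fields) + 1  # +1 for the trailing colon
    return "\n".join(f"{k + ':':<{width}} {v}" for k, v in fields.items())

print(render_rows({
    "Function": "spark_catalog.default.area",
    "Type": "SCALAR",
    "Input": "width DOUBLE, height DOUBLE",
    "Returns": "DOUBLE",
    "SQL Path": format_path([["spark_catalog", "path_func_db_a"],
                             ["system", "builtin"]]),
}))
```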
Class hierarchy / dispatch:

- `SQLFunction` (catalyst): adds the `SCALAR`/`TABLE` constants and a new `fromExpressionInfo(info, parser)` constructor that reconstructs a `SQLFunction` from the JSON usage blob produced by `toExpressionInfo`. This is the same path used by both temp UDFs (which are not in the catalog) and persistent UDFs.
- `DescribeFunctionCommand` (sql/core): when `SQLFunction.isSQLFunction(info.getClassName)` is true, dispatches to a new `describeSQLFunction(info, parser)` helper that emits the column-aligned key/value rows shown above. The frozen SQL PATH is rendered inline through `SqlPathFormat`; the temporary `DescribeFunctionCommandUtils` helper introduced for that purpose by SPARK-56639 is removed (its single responsibility is now absorbed by `describeSQLFunction`).
- `SessionCatalog.registerFunction`: when a persistent SQL UDF is invoked for the first time, the function registry caches it. Previously the cached `ExpressionInfo` was always built via `makeExprInfoForHiveFunction`, which sets `usage = null`. That worked for the pre-existing `DESCRIBE FUNCTION` codepath (which doesn't read `usage`), but breaks the new `describeSQLFunction` path: after a SQL UDF has been invoked once, `DESCRIBE FUNCTION` reads back the cached info and `SQLFunction.fromExpressionInfo` cannot parse `null`. `registerFunction` now branches on `funcDefinition.isUserDefinedFunction` and builds the structured `ExpressionInfo` via `UserDefinedFunction.fromCatalogFunction(funcDefinition, parser).toExpressionInfo` for SQL UDFs (matching the lookup-side build in `lookupPersistentFunction`), so the cached info has the right usage blob for DESCRIBE.

Why are the changes needed?
`DESCRIBE FUNCTION` is intended to give users a human-readable description of a routine, analogous to `DESCRIBE TABLE` for tables. For SQL UDFs the current output instead exposes the internal serialization format. That JSON blob is not part of any public surface, and the literal string `sqlFunction.forClass:` is meaningless to users. All of the structured metadata we need — signature, return type, body, characteristics, frozen SQL PATH — is already serialized in `ExpressionInfo`; this PR just formats it.

Does this PR introduce any user-facing change?
Yes — the rows returned by `DESCRIBE FUNCTION [EXTENDED] <sql_udf>` change.

Before:

After (simple case):

After (function created under `spark.sql.path.enabled = true` with a non-default PATH at CREATE time): `SQL Path` reflects the creator's frozen PATH, not the session's current `PATH` at describe time. Output for builtin functions, Hive UDFs, and other non-SQL UDFs is unchanged.

How was this patch tested?
Added four unit tests to `SQLFunctionSuite` (sql/core):

- `describe SQL scalar functions` — temporary and persistent scalar UDFs with comments, defaults, and `EXTENDED` mode. Asserts `Function`, `Type`, `Input` (column-aligned, with `DEFAULT` and `'comment'` in extended mode), `Returns`, `Deterministic`, `Data Access`, `Comment`, `Create Time`, `Body`.
- `describe SQL table functions` — table UDFs with explicit return columns; asserts `Type: TABLE`, the `Returns` columns, and the EXTENDED-only fields.
- `describe SQL functions with derived routine characteristics` — checks that `Deterministic` and `Data Access` reflect derived values for functions that read tables / call non-deterministic builtins, and that user-supplied characteristics are preserved.
- The `SPARK-56639: SQL function uses frozen SQL path` test is extended: after switching `PATH` to a different namespace it invokes `default.frozen_fn` (populating the function-registry cache) and then runs `DESCRIBE FUNCTION EXTENDED default.frozen_fn`, asserting the `SQL Path:` row shows the creator's frozen path (`` `spark_catalog`.`path_func_db_a` ``, `` `system`.`builtin` ``) and does not mention the invoker's current path namespace. This extension also exercises the `SessionCatalog.registerFunction` fix above: prior to the fix, the DESCRIBE after the invocation hit `CORRUPTED_CATALOG_FUNCTION` because the cached `ExpressionInfo` had `usage = null`.

Each describe test uses `checkKeywordsExist` against `DESCRIBE FUNCTION [EXTENDED] <name>` output.

Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude (claude-opus-4-7)