HiveQL, CTEs and FROM first SELECT statements #2236

@Viicos

#235 introduced support for HiveQL, and modified how CTEs are parsed to parse an additional FROM keyword.

A user raised some questions on the PR. Judging by the documentation links provided, HiveQL seems to allow FROM directly after a CTE, but it is unclear what for.
This link shows an example of inserting from a CTE, and this one shows a SELECT statement using the FROM-first variant (although the Hive dialect doesn't seem to have supports_from_first_select()?).

The issue is that with the generic dialect (or any dialect supporting FROM-first SELECT), parsing of the FROM keyword breaks, e.g.:

WITH test AS (FROM t SELECT a) FROM test SELECT a

The AST looks like (reduced for visibility):

Query(
    Query {
        with: Some(
            With {
                with_token: TokenWithSpan {
                    token: Word(
                        Word {
                            value: "WITH",
                            quote_style: None,
                            keyword: WITH,
                        },
                    ),
                    span: Span(Location(1,1)..Location(1,5)),
                },
                recursive: false,
                cte_tables: [
                    Cte {
                        alias: TableAlias {
                            name: Ident {
                                value: "test",
                                quote_style: None,
                                span: Span(Location(1,6)..Location(1,10)),
                            },
                            columns: [],
                        },
                        query: Query {
                            with: None,
                            body: Select(
                                Select {
                                    select_token: Some(
                                        TokenWithSpan {
                                            token: Word(
                                                Word {
                                                    value: "SELECT",
                                                    quote_style: None,
                                                    keyword: SELECT,
                                                },
                                            ),
                                            span: Span(Location(1,22)..Location(1,28)),
                                        },
                                    ),
                                    projection: [
                                        UnnamedExpr(
                                            Identifier(
                                                Ident {
                                                    value: "a",
                                                    quote_style: None,
                                                    span: Span(Location(1,29)..Location(1,30)),
                                                },
                                            ),
                                        ),
                                    ],
                                    from: [
                                        TableWithJoins {
                                            relation: Table {
                                                name: ObjectName(
                                                    [
                                                        Identifier(
                                                            Ident {
                                                                value: "t",
                                                                quote_style: None,
                                                                span: Span(Location(1,20)..Location(1,21)),
                                                            },
                                                        ),
                                                    ],
                                                ),
                                            },
                                        },
                                    ],
                                    flavor: FromFirst,
                                },
                            ),
                        },
                        from: Some(  // CTE parsed the FROM
                            Ident {
                                value: "test",
                                quote_style: None,
                                span: Span(Location(1,37)..Location(1,41)),
                            },
                        ),
                        closing_paren_token: TokenWithSpan {
                            token: RParen,
                            span: Span(Location(1,30)..Location(1,31)),
                        },
                    },
                ],
            },
        ),
        body: Select(
            Select {
                select_token: Some(
                    TokenWithSpan {
                        token: Word(
                            Word {
                                value: "SELECT",
                                quote_style: None,
                                keyword: SELECT,
                            },
                        ),
                        span: Span(Location(1,42)..Location(1,48)),
                    },
                ),
                from_token: None,  // The actual SELECT query doesn't have the FROM
                projection: [
                    UnnamedExpr(
                        Identifier(
                            Ident {
                                value: "a",
                                quote_style: None,
                                span: Span(Location(1,49)..Location(1,50)),
                            },
                        ),
                    ),
                ],
                from: [],  // and no FROM available
                flavor: Standard,
            },
        ),
    },
)

I think the simplest fix (although not ideal according to #1430) would be to parse the FROM keyword in CTEs only if the current dialect is Hive.
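
To make the proposed gating concrete, here is a minimal, self-contained sketch. It is a toy model and not the sqlparser code: the `Dialect` enum, the `CteTail` struct and `parse_cte_tail` are hypothetical names, and tokens are plain string slices. It only illustrates the idea that the CTE-level FROM is consumed when the dialect is Hive and is otherwise left for the query body.

```rust
/// Hypothetical stand-in for a dialect check; not the sqlparser `Dialect` trait.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Dialect {
    Generic,
    Hive,
}

/// Toy result of parsing what follows a CTE's closing parenthesis.
#[derive(Debug, PartialEq)]
struct CteTail<'a> {
    /// Identifier captured by the CTE-level `FROM`, if it was consumed.
    cte_from: Option<&'a str>,
    /// Tokens left over for the query body.
    rest: &'a [&'a str],
}

/// Consume `FROM <ident>` after a CTE only for Hive (the gating assumed here);
/// for every other dialect the tokens are left untouched so that a FROM-first
/// SELECT body can still see its own `FROM`.
fn parse_cte_tail<'a>(dialect: Dialect, tokens: &'a [&'a str]) -> CteTail<'a> {
    match tokens {
        [kw, ident, rest @ ..]
            if dialect == Dialect::Hive && kw.eq_ignore_ascii_case("FROM") =>
        {
            CteTail { cte_from: Some(*ident), rest }
        }
        _ => CteTail { cte_from: None, rest: tokens },
    }
}
```

In this model, calling `parse_cte_tail(Dialect::Generic, &["FROM", "test", "SELECT", "a"])` leaves all four tokens for the body, while `Dialect::Hive` captures `test` and leaves only `SELECT a`. Whether Hive itself should keep consuming the FROM there is the open question raised above; the sketch does not settle it.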
