HIVE-29516 Fix NPE in StatsUtils.updateStats when column statistics a… by shubhluck · Pull Request #6382 · apache/hive

shubhluck · 2026-03-19T21:50:15Z

…re unavailable

Added null checks before iterating over column statistics in:

StatsUtils.updateStats()
StatsUtils.getColStatisticsUpdatingTableAlias()
StatsRulesProcFactory (JOIN statistics)

This prevents query compilation failures during semijoin optimization when column-level statistics are incomplete, commonly seen with large TPC-DS datasets (100GB+).

What changes were proposed in this pull request?

This PR adds null checks before iterating over column statistics in three locations to prevent NullPointerException:

StatsUtils.updateStats() - Added null check for stats.getColumnStats() before the for-each loop, defaulting to empty list when null
StatsUtils.getColStatisticsUpdatingTableAlias() - Added null check with early return of empty list when parent column stats are null
StatsRulesProcFactory (JOIN statistics computation) - Added null check before iterating over column stats during join statistics calculation

The root cause is that Statistics.getColumnStats() returns null (not an empty list) when no column statistics are available:

public List<ColStatistics> getColumnStats() {
    if (columnStats != null) {
        return Lists.newArrayList(columnStats.values());
    }
    return null;  // Returns null, causing NPE in for-each loops
}
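
The two guard patterns described above (defaulting to an empty list before a for-each loop, and returning early with an empty list) can be sketched as follows. This is a minimal illustration, not the code from this PR: `ColStats`, `Stats`, `columnStatsOrEmpty`, and `columnNamesWithAlias` are hypothetical stand-ins that only mimic the null-returning getter quoted above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for Hive's ColStatistics and Statistics; not the real classes.
final class NullSafeColumnStatsSketch {

    static final class ColStats {
        final String columnName;
        ColStats(String columnName) { this.columnName = columnName; }
    }

    static final class Stats {
        private Map<String, ColStats> columnStats; // stays null until a column stat is added

        void addColumnStat(ColStats cs) {
            if (columnStats == null) {
                columnStats = new LinkedHashMap<>();
            }
            columnStats.put(cs.columnName, cs);
        }

        // Mirrors the getter quoted above: null, not an empty list, when nothing is recorded.
        List<ColStats> getColumnStats() {
            if (columnStats != null) {
                return new ArrayList<>(columnStats.values());
            }
            return null;
        }
    }

    // Pattern 1 (assumed helper): normalize null to an empty list so for-each loops are safe.
    static List<ColStats> columnStatsOrEmpty(Stats stats) {
        List<ColStats> cols = stats.getColumnStats();
        return cols != null ? cols : Collections.<ColStats>emptyList();
    }

    // Pattern 2 (assumed helper): return an empty result early when the parent has no column stats.
    static List<String> columnNamesWithAlias(Stats parent, String alias) {
        List<ColStats> cols = parent.getColumnStats();
        if (cols == null) {
            return new ArrayList<>();
        }
        List<String> names = new ArrayList<>();
        for (ColStats cs : cols) {
            names.add(alias + "." + cs.columnName);
        }
        return names;
    }

    public static void main(String[] args) {
        Stats none = new Stats();
        System.out.println(columnStatsOrEmpty(none).size());     // prints 0
        System.out.println(columnNamesWithAlias(none, "t"));      // prints []

        Stats one = new Stats();
        one.addColumnStat(new ColStats("ss_item_sk"));
        System.out.println(columnNamesWithAlias(one, "t"));       // prints [t.ss_item_sk]
    }
}
```

An alternative design would be to have the getter itself return an empty list, which removes the need for guards at each call site; that is not what this PR describes, and it could affect callers that treat null as "statistics unavailable".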

Why are the changes needed?

Query compilation fails with NullPointerException during semijoin optimization when column statistics are unavailable:

java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.stats.StatsUtils.updateStats(StatsUtils.java:2067)
    at org.apache.hadoop.hive.ql.parse.TezCompiler.removeSemijoinOptimizationByBenefit(TezCompiler.java:1982)
    at org.apache.hadoop.hive.ql.parse.TezCompiler.semijoinRemovalBasedTransformations(TezCompiler.java:539)
    at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:238)
    ...

This issue is particularly prevalent with:

Large TPC-DS datasets (100GB+) where statistics collection may be incomplete
Tables where column-level statistics have not been computed
Complex queries where intermediate operators lack column statistics

The fix ensures graceful handling when column statistics are unavailable, allowing the optimizer to continue using row-based statistics instead of failing.
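
To illustrate what falling back to row-based statistics can look like, here is a small sketch under stated assumptions. It is not Hive's estimator: `estimateDataSize`, its parameters, and the default row size are hypothetical, chosen only to show an estimate that degrades gracefully when per-column sizes are missing.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical estimator; the names and the formula are illustrative, not taken from Hive.
final class RowBasedFallbackSketch {

    // Uses per-column average sizes when present, otherwise a coarse per-row default.
    static long estimateDataSize(long numRows, List<Long> avgColumnSizes, long defaultRowSize) {
        if (avgColumnSizes == null || avgColumnSizes.isEmpty()) {
            // No column statistics: fall back to row count times an assumed row size.
            return Math.multiplyExact(numRows, defaultRowSize);
        }
        long rowSize = 0;
        for (long size : avgColumnSizes) {
            rowSize += size;
        }
        return Math.multiplyExact(numRows, rowSize);
    }

    public static void main(String[] args) {
        System.out.println(estimateDataSize(1000L, null, 100L));                  // prints 100000
        System.out.println(estimateDataSize(1000L, Arrays.asList(4L, 8L), 100L)); // prints 12000
    }
}
```

The point of the sketch is only that a missing column-level input selects a coarser estimate rather than raising an exception.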

Does this PR introduce any user-facing change?

No. This is a bug fix that prevents query compilation failures. Previously failing queries will now compile and execute successfully. There is no change to query results or behavior for queries that were already working.

How was this patch tested?

Reproduced the issue with TPC-DS queries at 100GB scale where column statistics were incomplete
Verified that queries failing with NPE now compile and execute successfully after the fix
Verified that queries with complete column statistics continue to work correctly and produce the same results
Existing unit tests pass without modification

To reproduce the original issue:

Generate TPC-DS dataset at 100GB+ scale
Do not compute column statistics (or ensure they are incomplete)
Run queries involving semijoin optimizations
Observe NPE during compilation

sonarqubecloud · 2026-03-19T23:18:38Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

asf-ci-hive added the tests pending label Mar 19, 2026

asf-ci-hive added tests passed and removed tests pending labels Mar 20, 2026

shubhluck wants to merge 1 commit into apache:master from shubhluck:HIVE-29516
