[GLUTEN-11862][VL] Work around GMT session timezone validation failure on macOS #11869

Merged
zhouyuan merged 8 commits into apache:main from malinjawi:fix/vl-session-timezone-gmt
Apr 17, 2026

@malinjawi malinjawi commented Apr 2, 2026

What changes are proposed in this pull request?

Fixes #11862.

This PR works around a macOS-specific native validation failure in the Velox backend when the effective Spark session timezone is GMT.

Based on the discussion in #11862, the underlying behavior is likely related to upstream Folly/Velox timezone handling on macOS rather than a general Spark semantics problem. This patch keeps the workaround on the Gluten side so Spark session timezone values do not block native validation and execution in the Velox path.

The changes in this PR are:

  • normalize GMT and GMT+/-offset session timezone values to UTC-equivalent forms before passing them into native Velox query config
  • use the runtime session timezone in SubstraitToVeloxPlanValidator instead of hardcoding GMT
  • apply the same normalization when setting the Velox writer timezone config
  • add a regression test covering execution with spark.sql.session.timeZone=GMT
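
The normalization described in the first bullet could look roughly like the sketch below. This is not the code merged in this PR: the function name `normalizeSessionTimezoneSketch` is hypothetical, and the choice of `UTC` for bare `GMT` and of `+HH:MM` / `-HH:MM` for `GMT` offsets is an assumption about what "UTC-equivalent forms" means. Matching is case-sensitive and offsets are not range-checked, to keep the example short.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only; not the function added by the PR.
// Maps "GMT" to "UTC" and "GMT+H", "GMT+HH", "GMT+HH:MM" (or '-') to "+HH:MM".
// Any other input, including malformed GMT offsets, is returned unchanged.
std::string normalizeSessionTimezoneSketch(std::string_view tz) {
  constexpr std::string_view kGmt = "GMT";
  if (tz.substr(0, kGmt.size()) != kGmt) {
    return std::string(tz); // region IDs such as "America/New_York", "UTC", ...
  }
  std::string_view rest = tz.substr(kGmt.size());
  if (rest.empty()) {
    return "UTC";
  }
  if (rest[0] != '+' && rest[0] != '-') {
    return std::string(tz);
  }
  const char sign = rest[0];
  rest.remove_prefix(1);

  // Split into hours and optional minutes at the first ':'.
  const auto colon = rest.find(':');
  const std::string_view hours = rest.substr(0, colon);
  const std::string_view minutes =
      colon == std::string_view::npos ? std::string_view("00") : rest.substr(colon + 1);

  auto allDigits = [](std::string_view s) {
    if (s.empty()) {
      return false;
    }
    for (unsigned char c : s) {
      if (!std::isdigit(c)) {
        return false;
      }
    }
    return true;
  };
  if (hours.size() > 2 || minutes.size() != 2 || !allDigits(hours) || !allDigits(minutes)) {
    return std::string(tz); // leave anything unexpected untouched
  }

  std::string out(1, sign);
  if (hours.size() == 1) {
    out += '0'; // pad a single-digit hour
  }
  out += hours;
  out += ':';
  out += minutes;
  assert(out.size() == 6); // always "+HH:MM" or "-HH:MM"
  return out;
}
```

Under these assumptions, `GMT` becomes `UTC`, `GMT+8` becomes `+08:00`, and a region ID such as `America/New_York` passes through unchanged.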

How was this patch tested?

  • reproduced the failure on a clean unpatched checkout on macOS after a fresh native rebuild
  • verified the failure is:
    • session 'session_timezone' set with invalid value 'GMT'
  • rebuilt native Gluten/Velox with this patch
  • ran the new targeted regression test in MiscOperatorSuite:
    • cast date to timestamp with GMT session timezone
  • the targeted regression test passes with this patch

Was this patch authored or co-authored using generative AI tooling?

Generated-by: IBM BOB

@github-actions github-actions bot added the VELOX label Apr 2, 2026
@malinjawi malinjawi changed the title from "[GLUTEN-14721][VL] Work around GMT session timezone validation failure on macOS" to "[GLUTEN-11862][VL] Work around GMT session timezone validation failure on macOS" Apr 2, 2026

void WholeStageResultIterator::addIteratorSplits(const std::vector<std::shared_ptr<ResultIterator>>& inputIterators) {
GLUTEN_CHECK(!allSplitsAdded_, "Method addIteratorSplits should not be called since all splits has been added to the Velox task.");
GLUTEN_CHECK(
Contributor

Please remove unrelated changes

Contributor Author

Hey @jinchengchenghh, I removed the dist change.

const auto sessionTimezone =
normalizeSessionTimezone(it == confMap.end() ? std::string_view("UTC") : std::string_view(it->second));
std::unordered_map<std::string, std::string> configs{
{velox::core::QueryConfig::kSparkPartitionId, "0"}, {velox::core::QueryConfig::kSessionTimezone, "GMT"}};
Contributor

Updating the GMT to another value is enough.

Contributor Author

Thanks, addressed. I updated kSessionTimezone to UTC.
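
For readers following this thread, the lookup-with-fallback pattern in the snippet under review can be sketched in isolation as below. This is an illustration, not the validator code: `buildValidatorConfigs` and `normalizeForSketch` are hypothetical names, the partition-ID key is a placeholder, and plain strings stand in for the `velox::core::QueryConfig` constants so the example has no Velox dependency. The `session_timezone` key is taken from the error message quoted in the PR description. Note that the reply above indicates the value was ultimately set to UTC rather than looked up.

```cpp
#include <string>
#include <string_view>
#include <unordered_map>

// Stand-in key names for this sketch only.
constexpr const char* kSessionTimezoneKey = "session_timezone";
constexpr const char* kPartitionIdKey = "partition_id"; // placeholder name
constexpr const char* kSparkSessionTimezoneConf = "spark.sql.session.timeZone";

// Minimal stand-in for the normalization step: handles only the bare "GMT" case.
std::string normalizeForSketch(std::string_view tz) {
  return tz == "GMT" ? std::string("UTC") : std::string(tz);
}

// Builds the config map from the session conf, falling back to "UTC"
// when no session timezone is present.
std::unordered_map<std::string, std::string> buildValidatorConfigs(
    const std::unordered_map<std::string, std::string>& confMap) {
  const auto it = confMap.find(kSparkSessionTimezoneConf);
  const std::string sessionTimezone = normalizeForSketch(
      it == confMap.end() ? std::string_view("UTC") : std::string_view(it->second));
  return {{kPartitionIdKey, "0"}, {kSessionTimezoneKey, sessionTimezone}};
}
```

With an empty conf map, or with the session timezone set to `GMT`, the resulting `session_timezone` entry is `UTC`; any other value is passed through as given.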

@zhouyuan
Member

related: facebookincubator/velox#17010

@zhouyuan zhouyuan merged commit a7058c7 into apache:main Apr 17, 2026
59 checks passed
return sparkConfs;
}

std::string normalizeSessionTimezone(std::string_view sessionTimezone) {
Contributor

Is it more appropriate to move the normalization step to Scala before sending data to native, so that the timezone received by C++ is always correct and we don’t need to apply normalization in every caller of the session timezone?
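
The design question here is whether to normalize once at the boundary or in each consumer. The suggestion concerns the Scala side; the sketch below only illustrates the same "normalize once, read many times" idea in C++ and is not code from this PR. All names (`ConfMap`, `normalizeConfOnce`, `timezoneForQuery`, `timezoneForWriter`) are hypothetical, and only the bare `GMT` case is handled.

```cpp
#include <string>
#include <unordered_map>

using ConfMap = std::unordered_map<std::string, std::string>;

// Spark's session timezone conf key, used here as the single lookup key.
const std::string kTimezoneConf = "spark.sql.session.timeZone";

// Normalizes the timezone entry a single time, at the point where the conf
// enters the native side. Takes the map by value and returns the fixed copy.
ConfMap normalizeConfOnce(ConfMap conf) {
  auto it = conf.find(kTimezoneConf);
  if (it != conf.end() && it->second == "GMT") {
    it->second = "UTC";
  }
  return conf;
}

// Shared read path: returns the stored timezone, or "UTC" when unset.
std::string readTimezone(const ConfMap& conf) {
  const auto it = conf.find(kTimezoneConf);
  return it == conf.end() ? std::string("UTC") : it->second;
}

// Two independent consumers; neither carries normalization logic of its own.
std::string timezoneForQuery(const ConfMap& conf) {
  return readTimezone(conf);
}

std::string timezoneForWriter(const ConfMap& conf) {
  return readTimezone(conf);
}
```

The trade-off is the one raised in the comment: a single normalization point means new consumers cannot forget the step, at the cost of every entry path having to go through that point.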

Development

Successfully merging this pull request may close these issues.

[VL] Native validation fails when Spark session timezone is GMT

4 participants