Skip to content

[SPARK-56874][SQL] Fast-path DateTimeUtils.daysToMicros for ZoneOffset.UTC#55893

Draft
LuciferYang wants to merge 10 commits into
apache:masterfrom
LuciferYang:SPARK-daystomicros-utc-fastpath
Draft

[SPARK-56874][SQL] Fast-path DateTimeUtils.daysToMicros for ZoneOffset.UTC#55893
LuciferYang wants to merge 10 commits into
apache:masterfrom
LuciferYang:SPARK-daystomicros-utc-fastpath

Conversation

@LuciferYang
Copy link
Copy Markdown
Contributor

@LuciferYang LuciferYang commented May 15, 2026

What changes were proposed in this pull request?

Add a ZoneOffset.UTC fast path to DateTimeUtils.daysToMicros(days, zoneId):

if (zoneId eq ZoneOffset.UTC) {
  Math.multiplyExact(days.toLong, MICROS_PER_DAY)
} else {
  // existing LocalDate -> ZonedDateTime -> Instant path
}

For UTC the answer is simply days * MICROS_PER_DAY, so the slow path's three heap allocations (LocalDate, ZonedDateTime, Instant) are wasted.

Why are the changes needed?

daysToMicros(days, ZoneOffset.UTC) is on the per-row hot path of the vectorized parquet reader (DateToTimestampNTZUpdater and the rebase variants), the row-based parquet converter (ParquetRowConverter), the Avro reader (AvroDeserializer), and DATE -> TIMESTAMP Cast (interpreted + codegen). All of them pass the ZoneOffset.UTC singleton, so the reference-equality fast path triggers everywhere it matters.

ParquetVectorUpdaterBenchmark results will be regenerated via GitHub Actions.

Does this PR introduce any user-facing change?

No, pure optimization. Behavior is preserved for every input in Spark's valid DateType range. Both paths use Math.multiplyExact internally and overflow at the same |days| ~= 107M boundary with the same ArithmeticException, far outside any reachable input.

How was this patch tested?

New DateTimeUtilsSuite contract test pins down the UTC fast path:

  • Asserts it agrees with a fixed-offset zone (Etc/GMT) path for a representative set of days (zero, positive, negative, +/-maxSafeDays).
  • Asserts it equals days * MICROS_PER_DAY directly, so divergence in some future JDK is caught.
  • Asserts ArithmeticException on overflow.

Was this patch authored or co-authored using generative AI tooling?

No

…t.UTC

`daysToMicros(days, zoneId)` allocates `LocalDate`, `ZonedDateTime`, and
`Instant` per call. For `ZoneOffset.UTC` the answer is simply
`days * MICROS_PER_DAY`, so add a reference-equality fast path that returns
`Math.multiplyExact(days.toLong, MICROS_PER_DAY)`.

Every real Spark caller passing UTC -- the vectorized parquet
`DateToTimestampNTZUpdater` and its rebase variants, the row-based
`ParquetRowConverter`, the Avro reader, and `Cast` (interpreted + codegen)
-- uses the `ZoneOffset.UTC` singleton, so the fast path triggers
everywhere it matters.
@LuciferYang LuciferYang changed the title [SPARK-56874][SQL] Fast-path DateTimeUtils.daysToMicros for ZoneOffset.UTC [SPARK-56874][SQL] Fast-path DateTimeUtils.daysToMicros for ZoneOffset.UTC May 15, 2026
@LuciferYang
Copy link
Copy Markdown
Contributor Author

test first, will update benchmark results later

def daysToMicros(days: Int, zoneId: ZoneId): Long = {
val instant = daysToLocalDate(days).atStartOfDay(zoneId).toInstant
instantToMicros(instant)
if (zoneId eq ZoneOffset.UTC) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, is there a version of this that takes an arbitrary offset (e.g. in seconds) vs. a nominal zone ID? The distinction hardly ever matters, except when various jurisdictions change their timezone rules. For example, here in British Columbia, we have officially stopped doing DST changes! 😅

Copy link
Copy Markdown
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Could you update Java21/25 test result together, @LuciferYang ?

@dongjoon-hyun
Copy link
Copy Markdown
Member

cc @peter-toth

@LuciferYang
Copy link
Copy Markdown
Contributor Author

This PR hasn't updated the all benchmark results yet

@dongjoon-hyun
Copy link
Copy Markdown
Member

Ya, I noticed and commented it too~

@dongjoon-hyun dongjoon-hyun marked this pull request as draft May 16, 2026 17:47
@dongjoon-hyun
Copy link
Copy Markdown
Member

I just marked this as Draft. Please let us know when the PR is ready back~

@LuciferYang
Copy link
Copy Markdown
Contributor Author

I just marked this as Draft. Please let us know when the PR is ready back~

ok ~

LuciferYang and others added 7 commits May 17, 2026 02:04
…uet.ParquetVectorUpdaterBenchmark (JDK 21, Scala 2.13, split 1 of 1)
…uet.ParquetVectorUpdaterBenchmark (JDK 25, Scala 2.13, split 1 of 1)
…uet.ParquetVectorUpdaterBenchmark (JDK 17, Scala 2.13, split 1 of 1)
…uet.ParquetVectorUpdaterBenchmark (JDK 17, Scala 2.13, split 1 of 1)
…uet.ParquetVectorUpdaterBenchmark (JDK 21, Scala 2.13, split 1 of 1)
…uet.ParquetVectorUpdaterBenchmark (JDK 25, Scala 2.13, split 1 of 1)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants