[SPARK-56874][SQL] Fast-path DateTimeUtils.daysToMicros for ZoneOffset.UTC#55893
[SPARK-56874][SQL] Fast-path DateTimeUtils.daysToMicros for ZoneOffset.UTC#55893LuciferYang wants to merge 10 commits into
DateTimeUtils.daysToMicros for ZoneOffset.UTC#55893Conversation
…t.UTC `daysToMicros(days, zoneId)` allocates `LocalDate`, `ZonedDateTime`, and `Instant` per call. For `ZoneOffset.UTC` the answer is simply `days * MICROS_PER_DAY`, so add a reference-equality fast path that returns `Math.multiplyExact(days.toLong, MICROS_PER_DAY)`. Every real Spark caller passing UTC -- the vectorized parquet `DateToTimestampNTZUpdater` and its rebase variants, the row-based `ParquetRowConverter`, the Avro reader, and `Cast` (interpreted + codegen) -- uses the `ZoneOffset.UTC` singleton, so the fast path triggers everywhere it matters.
DateTimeUtils.daysToMicros for ZoneOffset.UTC
|
test first, will update benchmark results later |
…uet.ParquetVectorUpdaterBenchmark (JDK 17, Scala 2.13, split 1 of 1)
| def daysToMicros(days: Int, zoneId: ZoneId): Long = { | ||
| val instant = daysToLocalDate(days).atStartOfDay(zoneId).toInstant | ||
| instantToMicros(instant) | ||
| if (zoneId eq ZoneOffset.UTC) { |
There was a problem hiding this comment.
Just out of curiosity, is there a version of this that takes an arbitrary offset (e.g. in seconds) vs. a nominal zone ID? The distinction hardly ever matters, except when various jurisdictions change their timezone rules. For example, here in British Columbia, we have officially stopped doing DST changes! 😅
There was a problem hiding this comment.
+1, LGTM. Could you update Java21/25 test result together, @LuciferYang ?
|
cc @peter-toth |
|
This PR hasn't updated the all benchmark results yet |
|
Ya, I noticed and commented it too~ |
|
I just marked this as |
ok ~ |
…uet.ParquetVectorUpdaterBenchmark (JDK 21, Scala 2.13, split 1 of 1)
…uet.ParquetVectorUpdaterBenchmark (JDK 25, Scala 2.13, split 1 of 1)
…uet.ParquetVectorUpdaterBenchmark (JDK 17, Scala 2.13, split 1 of 1)
…uet.ParquetVectorUpdaterBenchmark (JDK 17, Scala 2.13, split 1 of 1)
…uet.ParquetVectorUpdaterBenchmark (JDK 21, Scala 2.13, split 1 of 1)
…uet.ParquetVectorUpdaterBenchmark (JDK 25, Scala 2.13, split 1 of 1)
What changes were proposed in this pull request?
Add a
ZoneOffset.UTCfast path toDateTimeUtils.daysToMicros(days, zoneId):For UTC the answer is simply
days * MICROS_PER_DAY, so the slow path's three heap allocations (LocalDate,ZonedDateTime,Instant) are wasted.Why are the changes needed?
daysToMicros(days, ZoneOffset.UTC)is on the per-row hot path of the vectorized parquet reader (DateToTimestampNTZUpdaterand the rebase variants), the row-based parquet converter (ParquetRowConverter), the Avro reader (AvroDeserializer), and DATE -> TIMESTAMPCast(interpreted + codegen). All of them pass theZoneOffset.UTCsingleton, so the reference-equality fast path triggers everywhere it matters.ParquetVectorUpdaterBenchmarkresults will be regenerated via GitHub Actions.Does this PR introduce any user-facing change?
No, pure optimization. Behavior is preserved for every input in Spark's valid
DateTyperange. Both paths useMath.multiplyExactinternally and overflow at the same|days| ~= 107Mboundary with the sameArithmeticException, far outside any reachable input.How was this patch tested?
New
DateTimeUtilsSuitecontract test pins down the UTC fast path:Etc/GMT) path for a representative set ofdays(zero, positive, negative,+/-maxSafeDays).days * MICROS_PER_DAYdirectly, so divergence in some future JDK is caught.ArithmeticExceptionon overflow.Was this patch authored or co-authored using generative AI tooling?
No