test: stabilize nightly stream and TLS specs #2994
Closed
He-Pin wants to merge 3 commits into
…eTest for JDK 25 virtualized nightly

Motivation:
JDK 25 nightly runs abort the stream TCK with `Failed to stop [InputStreamSourceTest] within [40000 milliseconds]` after the CoordinatedShutdown `actor-system-terminate` phase times out at its default 10 seconds. The dump shows two `flow-X-0-take` ActorGraphInterpreter children stuck mid-termination under the StreamSupervisor.

The test feeds a CPU-busy `InputStream` whose `read()` always returns a fresh byte without blocking or yielding, so each `onPull` runs up to `chunkSize` synchronous `read()` calls. The nightly JDK 25 build forces `pekko.test.stream-dispatcher.fork-join-executor.virtualize=on`, which is the very dispatcher the test pins via `ActorAttributes.dispatcher(...)`. On a virtualized dispatcher this combination slows cancellation propagation through `take(elements)` enough that the 10 second phase timeout fires before the lingering flow actors finish terminating, even though the outer `ActorSystemLifecycle.shutdownTimeout` is already scaled to 40 seconds by `pekko.test.timefactor`.

Modification:
Override `additionalConfig` in `InputStreamSourceTest` to extend `pekko.coordinated-shutdown.phases.actor-system-terminate.timeout` to 30 seconds, mirroring the pattern already used in `MixedProtocolClusterSpec` for the same JDK 25 virtualized failure mode. The override layers on top of `PekkoPublisherVerification.additionalConfig` via `withFallback` so existing buffer-size settings are preserved.

Result:
The phase has enough headroom to drain in-flight cancellation traffic on virtualized dispatchers before the outer shutdown await fires. Verified locally on JDK 25 (Oracle OpenJDK 25.0.2) with the same virtualize/timefactor flags as `nightly-builds.yml`: `sbt "project stream-tests-tck" "testOnly org.apache.pekko.stream.tck.InputStreamSourceTest"` reports 26 passing / 0 failing / 12 canceled (TCK optional multi-subscriber specs).

References:
nightly-builds.yml `jdk-nightly-build` matrix entry javaVersion=25
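The `withFallback` layering described in the Modification can be sketched roughly as follows. This is a minimal illustration, not the actual diff: `baseConfig` stands in for `PekkoPublisherVerification.additionalConfig`, whose real contents are not shown in this PR, and the `example.buffer-size` key is a hypothetical placeholder.

```scala
import com.typesafe.config.{ Config, ConfigFactory }

object ShutdownTimeoutConfigSketch {
  // Hypothetical stand-in for PekkoPublisherVerification.additionalConfig;
  // the real keys and values are not part of this PR text.
  val baseConfig: Config =
    ConfigFactory.parseString("example.buffer-size = 16")

  // The override is parsed first, so its value wins; every key it does not
  // mention is still resolved from baseConfig through withFallback.
  val overridden: Config =
    ConfigFactory
      .parseString("pekko.coordinated-shutdown.phases.actor-system-terminate.timeout = 30s")
      .withFallback(baseConfig)
}
```

Because the fallback only fills in missing keys, the longer phase timeout applies to this one test without touching settings inherited from the base class.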
Motivation:
Recent nightly builds fail repeatedly on JDK 21/25 in stream TCK and TLS rotating-key tests.

Modification:
Make InputStreamSourceTest model the TCK element count directly by emitting one byte per ByteString without relying on take(elements) cancellation. Allow RotatingKeysSSLEngineProviderSpec.contact to ignore retry ActorIdentity messages while waiting for the echo response.

Result:
The affected specs no longer fail when delayed Identify responses or JDK 25 virtualized test-stream-dispatcher scheduling occur.

Tests:
- scalafmt --mode diff-ref=origin/main
- scalafmt --list --mode diff-ref=origin/main
- git diff --check
- sbt with JDK 25 nightly-style virtualized dispatcher flags: stream-tests-tck / Test / testOnly org.apache.pekko.stream.tck.InputStreamSourceTest; remote / Test / testOnly org.apache.pekko.remote.artery.tcp.ssl.RotatingProviderWithChangingKeysSpec
- sbt with JDK 21 nightly-style virtualized dispatcher flags: remote / Test / testOnly org.apache.pekko.remote.artery.tcp.ssl.RotatingProviderWithChangingKeysSpec
- sbt with JDK 25 nightly-style virtualized test-stream-dispatcher flags: stream-tests-tck / Test / testOnly org.apache.pekko.stream.tck.InputStreamSourceTest

References:
None - nightly-builds.yml failure analysis
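One way to read "emitting one byte per ByteString without relying on take(elements)" is a source whose `InputStream` ends by itself after the requested number of bytes. The sketch below assumes that reading; the actual change to `InputStreamSourceTest` is not shown here, and `boundedSource` and the constant byte value are illustrative names and choices.

```scala
import java.io.InputStream

import scala.concurrent.Future

import org.apache.pekko.stream.IOResult
import org.apache.pekko.stream.scaladsl.{ Source, StreamConverters }
import org.apache.pekko.util.ByteString

object BoundedInputStreamSketch {
  // Hypothetical helper: the stream reports end-of-input after `elements`
  // bytes, so the source completes on its own instead of being cancelled
  // from downstream by take(elements).
  def boundedSource(elements: Long): Source[ByteString, Future[IOResult]] =
    StreamConverters.fromInputStream(
      () =>
        new InputStream {
          private var remaining = elements
          override def read(): Int =
            if (remaining <= 0) -1 // end of stream
            else {
              remaining -= 1
              42 // arbitrary byte value
            }
        },
      chunkSize = 1) // one byte per emitted ByteString
}
```

With `chunkSize = 1` each pull performs a single `read()`, so the number of emitted elements equals the number of bytes the stream produces.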
Motivation:
Recent nightly builds repeatedly time out in TlsGraphStageEdgeCasesSpec under JDK 25 while running early-cancellation TLS edge cases.

Modification:
Have collectExactly materialize a KillSwitch and watch stream termination, then shut the stream down after collecting the expected bytes so repeated early-cancellation tests do not leave previous TLS materializations draining in the same actor system.

Result:
The TLS edge case suite no longer accumulates lingering TlsGraphStage/headOptionSink actors during repeated early-cancellation checks.

Tests:
- scalafmt --mode diff-ref=origin/main
- scalafmt --list --mode diff-ref=origin/main
- git diff --check
- sbt with JDK 25 nightly-style virtualized test-stream-dispatcher flags: stream-tests / Test / testOnly org.apache.pekko.stream.io.TlsGraphStageEdgeCasesSpec

References:
None - nightly-builds.yml failure analysis
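The KillSwitch-plus-termination-watch pattern from the Modification might look roughly like this. It is a sketch under assumptions: the real `collectExactly` signature and its TLS wiring are not shown in this PR, so the parameters and the `scan`/`dropWhile` collection strategy below are illustrative only.

```scala
import scala.concurrent.Await
import scala.concurrent.duration.FiniteDuration

import org.apache.pekko.actor.ActorSystem
import org.apache.pekko.stream.KillSwitches
import org.apache.pekko.stream.scaladsl.{ Keep, Sink, Source }
import org.apache.pekko.util.ByteString

object CollectExactlySketch {
  // Hypothetical signature; the spec's own helper may differ.
  def collectExactly(source: Source[ByteString, Any], expected: Int, timeout: FiniteDuration)(
      implicit system: ActorSystem): ByteString = {
    val ((killSwitch, terminated), collected) =
      source
        .viaMat(KillSwitches.single)(Keep.right) // handle for stopping the stream from outside
        .watchTermination()(Keep.both)           // future completed once this point terminates
        .scan(ByteString.empty)(_ ++ _)          // running concatenation of received bytes
        .dropWhile(_.size < expected)
        .toMat(Sink.head)(Keep.both)
        .run()

    try Await.result(collected, timeout).take(expected)
    finally {
      // Stop the materialization explicitly and wait for it, so a later test
      // does not share the actor system with a stream that is still draining.
      killSwitch.shutdown()
      Await.ready(terminated, timeout)
    }
  }
}
```

Doing the shutdown in `finally` means the stream is also torn down when the expected bytes never arrive and the await times out.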
This was referenced May 28, 2026
Superseded by smaller single-commit PRs for easier review:
Keeping the fixes split so each failure domain can be reviewed independently.
Closing as superseded by the split single-commit PRs listed above.
He-Pin added a commit that referenced this pull request on May 28, 2026
Motivation:
JDK 25 nightly builds time out in repeated TlsGraphStageEdgeCasesSpec early-cancellation scenarios because earlier materializations can keep draining after the expected bytes have been collected.

Modification:
Materialize collectExactly with a KillSwitch and watch stream termination, then shut down and await the stream in finally after the expected bytes are collected.

Result:
Repeated TLS edge-case checks do not leave prior materializations running in the same actor system.

Tests:
- JDK 25 nightly-style virtualized stream-dispatcher flags: stream-tests / Test / testOnly org.apache.pekko.stream.io.TlsGraphStageEdgeCasesSpec
- scalafmt --mode diff-ref=origin/main --quiet
- scalafmt --list --mode diff-ref=origin/main
- git diff --check

References:
Refs #2994
He-Pin added a commit that referenced this pull request on May 29, 2026
Motivation:
RotatingKeysSSLEngineProviderSpec can receive a delayed ActorIdentity from an earlier Identify attempt after the target actor ref has already been resolved. Nightly retry policy still fails builds when that stale identity arrives before the ping reply.

Modification:
Wait for the expected ping with fishForMessage and ignore late ActorIdentity messages while keeping the same per-attempt timeout budget.

Result:
The test accepts the intended ping reply without being failed by harmless delayed Identify responses.

Tests:
- JDK 21: remote / Test / testOnly org.apache.pekko.remote.artery.tcp.ssl.RotatingProviderWithChangingKeysSpec
- JDK 25: remote / Test / testOnly org.apache.pekko.remote.artery.tcp.ssl.RotatingProviderWithChangingKeysSpec (fails on existing JDK 25 EKU certificate validation behavior)
- scalafmt --mode diff-ref=origin/main --quiet
- scalafmt --list --mode diff-ref=origin/main
- git diff --check

References:
Refs #2994
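A minimal sketch of the `fishForMessage` approach, assuming a `TestProbe` receives both the stale `ActorIdentity` and the ping reply. The helper name `awaitPing` and its parameters are hypothetical; the spec's real `contact` method is not shown in this PR.

```scala
import scala.concurrent.duration.FiniteDuration

import org.apache.pekko.actor.ActorIdentity
import org.apache.pekko.testkit.TestProbe

object FishForPingSketch {
  // Hypothetical helper: waits up to `perAttempt` for `expected`, skipping
  // ActorIdentity replies left over from earlier Identify attempts.
  def awaitPing(probe: TestProbe, expected: Any, perAttempt: FiniteDuration): Any =
    probe.fishForMessage(perAttempt, hint = s"waiting for $expected") {
      case _: ActorIdentity => false // stale identity: ignore and keep fishing
      case `expected`       => true  // the reply the test is waiting for
    }
}
```

Any other message type is not matched by the partial function, so `fishForMessage` still fails the test on genuinely unexpected traffic.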