[CELEBORN-2275][CIP-14] Add C++ merge-write and Java-read hybrid integration test#3619
[CELEBORN-2275][CIP-14] Add C++ merge-write and Java-read hybrid integration test#3619afterincomparableyum wants to merge 2 commits intoapache:mainfrom
Conversation
|
Will open this PR once #3611 gets merged. |
|
This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
…gration test Add a new C++ test client that exercises the mergeData/pushMergedData write path and validates data integrity by reading back from the Java ShuffleClient. This complements the existing pushData-based hybrid test by covering the merge write path. - Add DataSumWithMergeWriterClient.cpp and its CMake build target - Add CppMergeWriteJavaReadTest entry points for NONE, LZ4, and ZSTD compression codecs - Add runCppMergeWriteJavaRead to JavaCppHybridReadWriteTestBase - Update cpp_integration CI workflow to run the new tests
f768b68 to
72ae412
Compare
|
Update PR to rebase off of main. |
|
Ping @SteNicholas @RexXiong for review |
There was a problem hiding this comment.
Pull request overview
This PR adds a new C++ merge-write + Java-read hybrid integration test to validate the C++ mergeData/pushMergedData write path by reading back data via the Java ShuffleClient, and wires it into CI for multiple compression codecs.
Changes:
- Add a new C++ integration test client (
cppDataSumWithMergeWriterClient) that writes via merge-write and emits per-partition checksums. - Add Scala entry points to run the new merge-write hybrid test for NONE/LZ4/ZSTD.
- Update the
cpp_integrationGitHub Actions workflow to execute the new merge-write hybrid tests.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| worker/src/test/scala/org/apache/celeborn/service/deploy/cluster/JavaCppHybridReadWriteTestBase.scala | Adds the runCppMergeWriteJavaRead test driver and data verification logic. |
| worker/src/test/scala/org/apache/celeborn/service/deploy/cluster/CppMergeWriteJavaReadTestWithNONE.scala | Adds NONE codec entry point for the new hybrid test. |
| worker/src/test/scala/org/apache/celeborn/service/deploy/cluster/CppMergeWriteJavaReadTestWithLZ4.scala | Adds LZ4 codec entry point for the new hybrid test. |
| worker/src/test/scala/org/apache/celeborn/service/deploy/cluster/CppMergeWriteJavaReadTestWithZSTD.scala | Adds ZSTD codec entry point for the new hybrid test. |
| cpp/celeborn/tests/DataSumWithMergeWriterClient.cpp | Implements the C++ merge-write test client used by the hybrid test. |
| cpp/celeborn/tests/CMakeLists.txt | Adds CMake targets for building the new C++ merge-writer test client. |
| .github/workflows/cpp_integration.yml | Runs the new merge-write hybrid tests in CI. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
⏺ Review: [CELEBORN-2275][CIP-14] Add C++ merge-write and Java-read hybrid integration test PR: #3619 | Author: (CIP-14 contributor) | Change: +268 / -1 across 8 files What it does Adds integration tests for the C++ mergeData/pushMergedData write path, validated by reading data back from the Java ShuffleClient. Covers three compression codecs: NONE, LZ4, ZSTD. Mirrors the existing pushData-based hybrid test pattern. Files changed:
Detailed Analysis C++ writer (DataSumWithMergeWriterClient.cpp)
Java reader (JavaCppHybridReadWriteTestBase.scala :: runCppMergeWriteJavaRead)
Issues
Verdict Approve with minor nits. This is a well-structured PR that closely follows existing patterns. The C++ write format and Java read parser are compatible — I verified the checksum logic end-to-end and found no correctness bugs. All issues are minor (style, resource handling, EOF newlines) and non-blocking. |
What changes were proposed in this pull request?
Add a new C++ test client that for the mergeData/pushMergedData write path and validates data integrity by reading back from the Java ShuffleClient. This complements the existing pushData based hybrid test by covering the merge write path.
Why are the changes needed?
This is to add integration tests for #3611.
Does this PR resolve a correctness bug?
No.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tested through running unit tests and compiling locally.