Android: Closeable lifecycle, error reporting, thread safety, and test coverage (#19229)#19229
Android: Closeable lifecycle, error reporting, thread safety, and test coverage (#19229)#19229psiddh wants to merge 1 commit into
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19229
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ❌ 2 New Failures, 4 Unrelated FailuresAs of commit 85cc5e4 with merge base fe98297 ( NEW FAILURES - The following jobs have failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but was present on the merge base:👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
This PR needs a
|
There was a problem hiding this comment.
Pull request overview
Consolidates several Android API improvements across the ExecuTorch Java/Kotlin wrappers and JNI layer, focusing on exception-based error reporting, explicit lifecycle management (Closeable), and basic thread-safety guarantees, plus expanded instrumentation test coverage.
Changes:
- Standardize synchronous error handling to throw exceptions (vs returning status codes / silent logging) across
Module,LlmModule, training/ASR wrappers, and JNI glue. - Add/solidify lifecycle management via
Closeable(close()/idempotent destroy patterns) and serialize access to non-thread-safe native state inLlmModuleviaReentrantLock. - Update JNI registrations and Java native method naming (
*Native), and expand Android instrumentation tests to cover lifecycle and API behavior.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| extension/benchmark/android/benchmark/app/src/main/java/org/pytorch/minibench/ModelRunner.java | Handle Module.loadMethod() throwing by capturing an error code and early-returning safely; ensure destroy() in finally. |
| extension/android/jni/jni_layer_llama.cpp | Add constructor exception-to-Java conversion; improve load() error reporting; rename registered natives (generateNative, loadNative, resetContextNative). |
| extension/android/jni/jni_layer.cpp | Wrap native Module construction in exception mapping; throw on unsupported input EValue type codes; rename registered natives (etdumpNative, getMethodsNative). |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/training/TrainingModule.java | Implement Closeable; replace silent failures with IllegalStateException on use-after-destroy. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/training/SGD.java | Throw IllegalStateException (vs generic RuntimeException) when optimizer is destroyed. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmModule.java | Implement Closeable; add locking + destroyed checks; convert many status-returning APIs to void + exception; rename native methods to *Native. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/asr/AsrModule.kt | Throw ExecutorchRuntimeException with error code on creation/transcription failures. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/Module.java | Implement Closeable; make loadMethod() throw; add locked wrappers around additional APIs; rename native methods to *Native. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/ExecutorchRuntimeException.java | Add error-code documentation, ALREADY_LOADED, improved message prefix, and cause-chaining constructor. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/ExecuTorchRuntime.java | Tighten file-path validation and switch to IllegalArgumentException. |
| extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/ModuleInstrumentationTest.kt | Un-ignore and fix tests by providing required dummy input; add new API/lifecycle coverage. |
| extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmModuleInstrumentationTest.kt | Update tests for exception-based load; add close() lifecycle tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
a96bd6b to
6ffdd62
Compare
787239e to
9f8afe1
Compare
There was a problem hiding this comment.
Pull request overview
This PR consolidates several previously reverted Android improvements into one atomic update, standardizing ExecuTorch Android APIs around exception-based error reporting, explicit Closeable lifecycles, and improved thread-safety—along with significant new/updated instrumentation test coverage.
Changes:
- Shift Android APIs away from status-code returns/silent failures to consistent exception throwing (Java/Kotlin + JNI).
- Add/align
Closeablelifecycle semantics (try-with-resources) and introduce locking around non-thread-safe native state (notably forLlmModule). - Update JNI registrations to match renamed
*Nativemethods and expand Android instrumentation tests (including new lifecycle tests).
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| extension/benchmark/android/benchmark/app/src/main/java/org/pytorch/minibench/ModelRunner.java | Adapts benchmark load-path to exception-based Module.loadMethod() and ensures destroy on success path. |
| extension/benchmark/android/benchmark/app/src/main/java/org/pytorch/minibench/LlmModelRunner.java | Adapts load-path to exception-based LlmModule.load() (but generate path still needs updates). |
| extension/android/jni/jni_layer_llama.cpp | Adds constructor error surfacing, improves load error reporting, and updates native method registrations (*Native). |
| extension/android/jni/jni_layer.cpp | Adds constructor error surfacing, throws on unsupported EValue input type, and updates native registrations (etdumpNative, getMethodsNative). |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/training/TrainingModule.java | Implements Closeable and replaces silent failures with exceptions. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/training/SGD.java | Uses IllegalStateException for destroyed optimizer usage. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/llm/LlmModule.java | Implements Closeable, adds locking, converts sync APIs to throw-on-error wrappers, and renames JNI methods to *Native. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/extension/asr/AsrModule.kt | Switches to ExecutorchRuntimeException for native creation/transcription failures. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/Module.java | Implements Closeable, adds throw-on-error wrappers and lock-guarded public methods, and renames JNI methods to *Native. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/ExecutorchRuntimeException.java | Adds richer docs, new error code constant, improved message format, and a cause-chaining constructor. |
| extension/android/executorch_android/src/main/java/org/pytorch/executorch/ExecuTorchRuntime.java | Makes file-path validation stricter and consistently IllegalArgumentException-based. |
| extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/ModuleInstrumentationTest.kt | Un-ignores/fixes Module tests and adds broader API/lifecycle coverage (load modes, log buffer, etdump, destroyed-state). |
| extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmModuleInstrumentationTest.kt | Updates to new throwing APIs and adds lifecycle tests (use-after-close, idempotent close). |
Comments suppressed due to low confidence (1)
extension/benchmark/android/benchmark/app/src/main/java/org/pytorch/minibench/LlmModelRunner.java:102
LlmModule.generate(...)now throws on failure, but the MESSAGE_GENERATE branch doesn't catch exceptions. An exception here will crash the HandlerThread and can preventonGenerationStopped()from running / results from being recorded. Wrap the generate call in try/catch (ideally extracting an error code fromExecutorchRuntimeException) and route the failure to the callback (e.g., callonGenerationStopped()and/or add an explicit error callback).
mLlmModelRunner.mCallback.onModelLoaded(status);
} else if (msg.what == MESSAGE_GENERATE) {
mLlmModelRunner.mModule.generate((String) msg.obj, mLlmModelRunner);
mLlmModelRunner.mCallback.onGenerationStopped();
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@psiddh has imported this pull request. If you are a Meta employee, you can view this in D103233465. |
a44e428 to
04727f0
Compare
…t coverage (pytorch#19229) Summary: Combines all previously reverted Android improvement PRs (pytorch#18669, pytorch#19012, pytorch#19028, pytorch#19092, pytorch#19099, pytorch#19124) plus new Module lifecycle tests into a single atomic change. Error reporting: all sync errors throw exceptions instead of returning status codes or logging silently. Module.loadMethod() throws ExecutorchRuntimeException on failure. LlmModule.generate() and load() throw on error. Native methods renamed with "Native" suffix; public wrappers check status and throw. Lifecycle: Module, LlmModule, and TrainingModule implement Closeable for try-with-resources. LlmModule adds ReentrantLock to serialize access to non-thread-safe native state. stop() remains lock-free (C++ atomic flag). TrainingModule replaces Log.e silent failures with IllegalStateException. Error consistency: ExecuTorchRuntime.validateFilePath throws IllegalArgumentException. SGD throws IllegalStateException. AsrModule throws ExecutorchRuntimeException. Cause-chaining constructor added to ExecutorchRuntimeException. JNI safety: Module and LlmModule constructors wrapped in try-catch to surface native initialization failures. LlmModule.load() uses throwExecutorchException with diagnostic detail. All DoNotStrip annotations preserved on JNI-called methods; new DoNotStrip added to renamed private native methods (generateNative, resetContextNative, loadNative). Test Plan: fix 4 Ignored Module tests by providing required input tensor. Add 13 new lifecycle/API coverage tests. Add LlmModule use-after-close and idempotent close tests. This diff updates callers of Module.loadMethod() to use try/catch instead of checking the return code, calling destroy() on failure. Reviewed By: kirklandsign Differential Revision: D103233465 Pulled By: psiddh
9adb92f to
274552b
Compare
|
@psiddh has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103233465. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
extension/benchmark/android/benchmark/app/src/main/java/org/pytorch/minibench/LlmModelRunner.java:103
LlmModule.generate(...)now throws on failure (it no longer returns a status code). In the MESSAGE_GENERATE path, any exception will escapehandleMessage, potentially killing the handler thread and skippingonGenerationStopped(). Wrapgenerate()in try/catch and ensure the callback is notified (and/or propagate an error status) instead of letting the background thread crash.
mLlmModelRunner.mCallback.onModelLoaded(status);
} else if (msg.what == MESSAGE_GENERATE) {
mLlmModelRunner.mModule.generate((String) msg.obj, mLlmModelRunner);
mLlmModelRunner.mCallback.onGenerationStopped();
}
extension/android/executorch_android/src/androidTest/java/org/pytorch/executorch/LlmModuleInstrumentationTest.kt:62
- These tests create a new
LlmModuleinsetUp()for every test case but never close it, so native resources can accumulate across the suite (and potentially cause flakiness/OOM). Add an@Afterthat callsllmModule.close()(or wrap each test body in try/finally) so the module is always released.
llmModule =
LlmModule(getTestFilePath(TEST_FILE_NAME), getTestFilePath(TOKENIZER_FILE_NAME), 0.0f)
}
@Test
@Throws(IOException::class, URISyntaxException::class)
fun testGenerate() {
llmModule.load()
llmModule.generate(TEST_PROMPT, SEQ_LEN, this@LlmModuleInstrumentationTest)
assertEquals(results.size.toLong(), SEQ_LEN.toLong())
assertTrue(tokensPerSecond[tokensPerSecond.size - 1] > 0)
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@claude review this PR and the comments |
|
Claude finished @psiddh's task in 2m 14s —— View job PR Review: Android Closeable lifecycle, error reporting, thread safety, and test coverage
Overall AssessmentThis is a well-structured consolidation of several improvements. The move from status-code returns to exception-based error reporting is the right direction for Android APIs, and the That said, there are a few issues worth addressing — some flagged by Copilot that I agree with, and some additional ones. Issues I'd Recommend Fixing1.
|
…t coverage (pytorch#19229) Summary: Combines all previously reverted Android improvement PRs (pytorch#18669, pytorch#19012, pytorch#19028, pytorch#19092, pytorch#19099, pytorch#19124) plus new Module lifecycle tests into a single atomic change. Error reporting: all sync errors throw exceptions instead of returning status codes or logging silently. Module.loadMethod() throws ExecutorchRuntimeException on failure. LlmModule.generate() and load() throw on error. Native methods renamed with "Native" suffix; public wrappers check status and throw. Lifecycle: Module, LlmModule, and TrainingModule implement Closeable for try-with-resources. LlmModule adds ReentrantLock to serialize access to non-thread-safe native state. stop() remains lock-free (C++ atomic flag). TrainingModule replaces Log.e silent failures with IllegalStateException. Error consistency: ExecuTorchRuntime.validateFilePath throws IllegalArgumentException. SGD throws IllegalStateException. AsrModule throws ExecutorchRuntimeException. Cause-chaining constructor added to ExecutorchRuntimeException. JNI safety: Module and LlmModule constructors wrapped in try-catch to surface native initialization failures. LlmModule.load() uses throwExecutorchException with diagnostic detail. All DoNotStrip annotations preserved on JNI-called methods; new DoNotStrip added to renamed private native methods (generateNative, resetContextNative, loadNative). Test Plan: fix 4 Ignored Module tests by providing required input tensor. Add 13 new lifecycle/API coverage tests. Add LlmModule use-after-close and idempotent close tests. This diff updates callers of Module.loadMethod() to use try/catch instead of checking the return code, calling destroy() on failure. Reviewed By: kirklandsign Differential Revision: D103233465 Pulled By: psiddh
4558d37 to
0415dd1
Compare
…t coverage (pytorch#19229) Summary: Combines all previously reverted Android improvement PRs (pytorch#18669, pytorch#19012, pytorch#19028, pytorch#19092, pytorch#19099, pytorch#19124) plus new Module lifecycle tests into a single atomic change. Error reporting: all sync errors throw exceptions instead of returning status codes or logging silently. Module.loadMethod() throws ExecutorchRuntimeException on failure. LlmModule.generate() and load() throw on error. Native methods renamed with "Native" suffix; public wrappers check status and throw. Lifecycle: Module, LlmModule, and TrainingModule implement Closeable for try-with-resources. LlmModule adds ReentrantLock to serialize access to non-thread-safe native state. stop() remains lock-free (C++ atomic flag) with volatile mDestroyed guard to prevent use-after-close. TrainingModule replaces Log.e silent failures with IllegalStateException. Thread safety: LlmModule adds checkNotReentrant() guard to all lock-acquiring public methods, preventing callbacks from re-entering module methods and corrupting native state. TrainingModule.mDestroyed made volatile for cross-thread visibility. Error consistency: ExecuTorchRuntime.validateFilePath throws IllegalArgumentException. SGD throws IllegalStateException. AsrModule throws ExecutorchRuntimeException. Cause-chaining constructor added to ExecutorchRuntimeException. JNI safety: Module and LlmModule constructors wrapped in try-catch to surface native initialization failures. LlmModule.load() uses throwExecutorchException with diagnostic detail. All DoNotStrip annotations preserved on JNI-called methods. std::move added for etdump_gen unique_ptr in profiling build path. LlmModelRunner generate path wrapped in try-catch to prevent HandlerThread death. Test Plan: fix 4 Ignored Module tests by providing required input tensor. Add 13 new lifecycle/API coverage tests. Add LlmModule use-after-close and idempotent close tests. Add @after tearDown to LlmModuleInstrumentationTest to prevent native resource leaks. Remove dead testMethodMetadata test. Fix testNonPteFile cleanup with assertThrows + finally. This diff updates callers of Module.loadMethod() to use try/catch instead of checking the return code, calling destroy() on failure. Co-authored-by: Claude <noreply@anthropic.com> Reviewed By: kirklandsign Differential Revision: D103233465 Pulled By: psiddh
0415dd1 to
e72eecd
Compare
…t coverage (pytorch#19229) Summary: Combines all previously reverted Android improvement PRs (pytorch#18669, pytorch#19012, pytorch#19028, pytorch#19092, pytorch#19099, pytorch#19124) plus new Module lifecycle tests into a single atomic change. Error reporting: all sync errors throw exceptions instead of returning status codes or logging silently. Module.loadMethod() throws ExecutorchRuntimeException on failure. LlmModule.generate() and load() throw on error. Native methods renamed with "Native" suffix; public wrappers check status and throw. Lifecycle: Module, LlmModule, and TrainingModule implement Closeable for try-with-resources. LlmModule adds ReentrantLock to serialize access to non-thread-safe native state. stop() remains lock-free (C++ atomic flag) with volatile mDestroyed guard to prevent use-after-close. TrainingModule replaces Log.e silent failures with IllegalStateException. Thread safety: LlmModule adds checkNotReentrant() guard to all lock-acquiring public methods, preventing callbacks from re-entering module methods and corrupting native state. TrainingModule.mDestroyed made volatile for cross-thread visibility. Error consistency: ExecuTorchRuntime.validateFilePath throws IllegalArgumentException. SGD throws IllegalStateException. AsrModule throws ExecutorchRuntimeException. Cause-chaining constructor added to ExecutorchRuntimeException. JNI safety: Module and LlmModule constructors wrapped in try-catch to surface native initialization failures. LlmModule.load() uses throwExecutorchException with diagnostic detail. All DoNotStrip annotations preserved on JNI-called methods. std::move added for etdump_gen unique_ptr in profiling build path. LlmModelRunner generate path wrapped in try-catch to prevent HandlerThread death. Test Plan: fix 4 Ignored Module tests by providing required input tensor. Add 13 new lifecycle/API coverage tests. Add LlmModule use-after-close and idempotent close tests. Add @after tearDown to LlmModuleInstrumentationTest to prevent native resource leaks. Remove dead testMethodMetadata test. Fix testNonPteFile cleanup with assertThrows + finally. This diff updates callers of Module.loadMethod() to use try/catch instead of checking the return code, calling destroy() on failure. Co-authored-by: Claude <noreply@anthropic.com> Reviewed By: kirklandsign Differential Revision: D103233465 Pulled By: psiddh
e72eecd to
881a8d5
Compare
…t coverage (pytorch#19229) Summary: Combines all previously reverted Android improvement PRs (pytorch#18669, pytorch#19012, pytorch#19028, pytorch#19092, pytorch#19099, pytorch#19124) plus new Module lifecycle tests into a single atomic change. Error reporting: all sync errors throw exceptions instead of returning status codes or logging silently. Module.loadMethod() throws ExecutorchRuntimeException on failure. LlmModule.generate() and load() throw on error. Native methods renamed with "Native" suffix; public wrappers check status and throw. Lifecycle: Module, LlmModule, and TrainingModule implement Closeable for try-with-resources. LlmModule adds ReentrantLock to serialize access to non-thread-safe native state. stop() remains lock-free (C++ atomic flag) with volatile mDestroyed guard to prevent use-after-close. TrainingModule replaces Log.e silent failures with IllegalStateException. Thread safety: LlmModule adds checkNotReentrant() guard to all lock-acquiring public methods, preventing callbacks from re-entering module methods and corrupting native state. TrainingModule.mDestroyed made volatile for cross-thread visibility. Error consistency: ExecuTorchRuntime.validateFilePath throws IllegalArgumentException. SGD throws IllegalStateException. AsrModule throws ExecutorchRuntimeException. Cause-chaining constructor added to ExecutorchRuntimeException. JNI safety: Module and LlmModule constructors wrapped in try-catch to surface native initialization failures. LlmModule.load() uses throwExecutorchException with diagnostic detail. All DoNotStrip annotations preserved on JNI-called methods. std::move added for etdump_gen unique_ptr in profiling build path. LlmModelRunner generate path wrapped in try-catch to prevent HandlerThread death. Test Plan: fix 4 Ignored Module tests by providing required input tensor. Add 13 new lifecycle/API coverage tests. Add LlmModule use-after-close and idempotent close tests. Add @after tearDown to LlmModuleInstrumentationTest to prevent native resource leaks. Remove dead testMethodMetadata test. Fix testNonPteFile cleanup with assertThrows + finally. This diff updates callers of Module.loadMethod() to use try/catch instead of checking the return code, calling destroy() on failure. Co-authored-by: Claude <noreply@anthropic.com> Reviewed By: kirklandsign Differential Revision: D103233465 Pulled By: psiddh
881a8d5 to
85cc5e4
Compare
Summary:
Combines all previously reverted Android improvement PRs (#18669, #19012, #19028, #19092, #19099, #19124) plus new Module lifecycle tests into a single atomic change.
Error reporting: all sync errors throw exceptions instead of returning status codes or logging silently. Module.loadMethod() throws ExecutorchRuntimeException on failure. LlmModule.generate() and load() throw on error. Native methods renamed with "Native" suffix; public wrappers check status and throw.
Lifecycle: Module, LlmModule, and TrainingModule implement Closeable for try-with-resources. LlmModule adds ReentrantLock to serialize access to non-thread-safe native state. stop() remains lock-free (C++ atomic flag). TrainingModule replaces Log.e silent failures with IllegalStateException.
Error consistency: ExecuTorchRuntime.validateFilePath throws IllegalArgumentException. SGD throws IllegalStateException. AsrModule throws ExecutorchRuntimeException. Cause-chaining constructor added to ExecutorchRuntimeException.
JNI safety: Module and LlmModule constructors wrapped in try-catch to surface native initialization failures. LlmModule.load() uses throwExecutorchException with diagnostic detail. All DoNotStrip annotations preserved on JNI-called methods; new DoNotStrip added to renamed private native methods (generateNative, resetContextNative, loadNative).
Test Plan:
fix 4 Ignored Module tests by providing required input tensor. Add 13 new lifecycle/API coverage tests. Add LlmModule use-after-close and idempotent close tests.
This diff updates callers of Module.loadMethod() to use try/catch instead of checking the return code, calling destroy() on failure.
Reviewed By: kirklandsign