fix: AE-2322: network volume size bounds, forbid extra args#290
fix: AE-2322: network volume size bounds, forbid extra args#290
Conversation
| id: Optional[str] = Field(default=None) | ||
| name: Optional[str] = None | ||
| size: Optional[int] = Field(default=100, gt=0) # Size in GB | ||
| size: Optional[int] = Field(default=100, gt=0, le=4096) # Size in GB |
There was a problem hiding this comment.
default should probably be lower, like 20GB? 10GB? Also I believe the minimum is 10GB.
There was a problem hiding this comment.
yeah it's 10gb, updated. 20gb seems sensible for a default
|
📝 Documentation updates detected! New suggestion: Document valid size range for network volumes Tip: Configure how Promptless handles changelogs in Agent Settings 📋 |
KAJdev
left a comment
There was a problem hiding this comment.
just fix the assert in the test looks good
There was a problem hiding this comment.
Pull request overview
Adds stricter validation for NetworkVolume configuration so invalid volume definitions fail fast at model-construction/config time.
Changes:
- Enforce
NetworkVolumename is not empty/whitespace-only and forbid unknown fields via Pydantic config. - Introduce size bounds (min 10GB, max 4096GB) and change the default size to 20GB.
- Update/unit-extend tests and manifest expectations to reflect the new validation and defaults.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
tests/unit/test_p2_remaining_gaps.py |
Updates a gap-test expectation to reflect new name validation behavior and size constraint wording. |
tests/unit/test_p2_gaps.py |
Updates VOL-* tests for new min-size constraint and new default size. |
tests/unit/resources/test_network_volume.py |
Adds focused unit tests for empty/whitespace names, size bounds, and unknown-field rejection. |
tests/unit/cli/commands/build_utils/test_manifest.py |
Updates manifest extraction test to assert the new default volume size. |
src/runpod_flash/core/resources/network_volume.py |
Implements extra="forbid", adds name validator, and applies min/max size constraints plus default size change. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
runpod-Henrik
left a comment
There was a problem hiding this comment.
PR #290 — fix: AE-2322: network volume size bounds, forbid extra args
Repo: runpod/flash
Severity: MEDIUM
Key findings: Volume creation will fail silently after this PR ships — the volume gets created in RunPod but the SDK throws an error, leaving users with orphaned paid resources.
1. Size bounds and default change
The 10 GB minimum, 4 TB ceiling, and 20 GB default are correct and well-motivated. The unit test coverage for the new bounds is thorough.
2. Bug: volume creation broken end-to-end (high confidence)
The unit tests all pass with mocked API responses, but live volume creation will fail. When RunPod actually creates a network volume, the API response includes fields the SDK doesn't declare (createdAt, podCount, isPublic, mountPath, etc.). This PR makes the SDK reject any unknown fields as invalid — so the SDK throws a validation error immediately after the volume is created.
The failure scenario: user runs flash deploy, the network volume gets created and charged in RunPod, the SDK crashes, and the deploy appears to have failed. The user retries, potentially creating multiple orphaned volumes. There is no E2E test covering live volume creation — the unit tests mock the API response and only return the four fields the SDK expects, which is why this isn't caught.
This needs a live volume creation test before shipping, or the model needs to be changed to tolerate extra fields from the API response.
3. Issue: orphaned volumes cannot be removed via the CLI
flash undeploy does not delete network volumes — the undeploy path for NetworkVolume raises NotImplementedError and tells users to delete via the RunPod UI or API. A user who hits the creation bug (section 2) and retries will accumulate orphaned paid volumes with no CLI path to clean them up. They have to find and delete them manually in the RunPod console. This makes the section 2 failure significantly worse than it appears.
4. No documentation on network volume lifecycle
There is no user-facing documentation covering: how to find and reuse an existing volume by ID, what happens to a volume across redeploys, how to delete a volume (the CLI cannot do it — users must use the RunPod UI or API, which is not mentioned anywhere), or what to do when a volume is left orphaned by a failed deploy. The SDK reference shows the constructor syntax but not the operational lifecycle. This PR adds size constraints and a new default — none of those appear in any doc either.
Given that volumes are persistent paid resources and this PR makes the failure mode worse (creation succeeds but SDK crashes), documenting the lifecycle — especially deletion and recovery — should ship alongside this fix.
Nit: users with existing endpoints will see unexpected redeployment
Any endpoint currently configured with the old 100 GB volume default will appear as "drifted" on next flash deploy and trigger a full redeploy. Not a bug introduced by this PR, but users will be surprised. Worth a note in the release.
Verdict
NEEDS WORK — the size validation is correct but the change breaks live volume creation, and orphaned volumes from failed deploys cannot be cleaned up from the CLI. Needs a live test or a code fix before merging.
🤖 Reviewed by Henrik's AI-Powered Bug Finder
runpod-Henrik
left a comment
There was a problem hiding this comment.
Correction to prior review
Section 2 (Bug: volume creation broken end-to-end) was wrong and should be disregarded.
I claimed the RunPod API returns extra fields (createdAt, podCount, etc.) that would cause extra="forbid" to reject the response. I verified this by calling GET /v1/networkvolumes directly — the API returns only {id, name, dataCenterId, size}, which matches the model exactly. extra="forbid" is safe. There is no breakage here.
The Nit about unexpected redeployment was also wrong — the default size stays 100 GB.
Sections 3 and 4 from the original review remain valid (CLI cannot delete volumes, lifecycle not documented). Updated verdict: PASS WITH NITS.
🤖 Reviewed by Henrik's AI-Powered Bug Finder
|
📝 Documentation updates detected! Updated existing suggestion: Document valid size range for network volumes Tip: Assign suggestions to team members in the Promptless dashboard to claim work 👥 |
validation for network volume config: