From 323a89539ea7ab93f01aae47f148359b2ec58e46 Mon Sep 17 00:00:00 2001 From: "promptless[bot]" <179508745+promptless[bot]@users.noreply.github.com> Date: Tue, 24 Mar 2026 17:00:07 +0000 Subject: [PATCH 1/2] Document min_cuda_version parameter for Flash GPU endpoints --- flash/configuration/parameters.mdx | 31 ++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) diff --git a/flash/configuration/parameters.mdx b/flash/configuration/parameters.mdx index b28d4133..ce42fd8c 100644 --- a/flash/configuration/parameters.mdx +++ b/flash/configuration/parameters.mdx @@ -29,6 +29,7 @@ This page provides a complete reference for all parameters available on the `End | `scaler_type` | `ServerlessScalerType` | Scaling strategy | auto | | `scaler_value` | `int` | Scaling threshold | `4` | | `template` | `PodTemplate` | Pod template overrides | `None` | +| `min_cuda_version` | `str` | Minimum CUDA version for GPU host selection | `"12.8"` (GPU) / `None` (CPU) | ## Parameter details @@ -537,6 +538,35 @@ template = PodTemplate( For simple environment variables, use the `env` parameter on `Endpoint` instead of `PodTemplate.env`. +### min_cuda_version + +**Type**: `str` +**Default**: `"12.8"` for GPU endpoints, `None` for CPU endpoints + +Specifies the minimum CUDA driver version required on the host machine. GPU endpoints default to `"12.8"` to ensure workers run on hosts with recent CUDA drivers. + +```python +from runpod_flash import Endpoint, GpuType + +# Use the default (12.8) +@Endpoint(name="ml-inference", gpu=GpuType.NVIDIA_A100_80GB_PCIe) +async def infer(data): ... + +# Override to allow older hosts +@Endpoint( + name="legacy-compatible", + gpu=GpuType.NVIDIA_A100_80GB_PCIe, + min_cuda_version="12.4" +) +async def infer_legacy(data): ... +``` + +This parameter has no effect on CPU endpoints. + + +Valid CUDA versions include: `"11.1"`, `"11.4"`, `"11.7"`, `"11.8"`, `"12.0"`, `"12.1"`, `"12.2"`, `"12.3"`, `"12.4"`, `"12.6"`, `"12.8"`. 
Invalid values raise a `ValueError`. + + ## EndpointJob When using `Endpoint(id=...)` or `Endpoint(image=...)`, the `.run()` method returns an `EndpointJob` object for async operations: @@ -576,6 +606,7 @@ These changes restart all workers: - Storage (`volume`) - Datacenter (`datacenter`) - Flashboot setting (`flashboot`) +- CUDA version requirement (`min_cuda_version`) Workers are temporarily unavailable during recreation (typically 30-90 seconds). From 8faec14b1178e6fcae2a9bb1a24b11564ff449aa Mon Sep 17 00:00:00 2001 From: "promptless[bot]" <179508745+promptless[bot]@users.noreply.github.com> Date: Wed, 25 Mar 2026 17:57:22 +0000 Subject: [PATCH 2/2] Sync documentation updates --- flash/configuration/parameters.mdx | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/flash/configuration/parameters.mdx b/flash/configuration/parameters.mdx index ce42fd8c..9e15767f 100644 --- a/flash/configuration/parameters.mdx +++ b/flash/configuration/parameters.mdx @@ -29,7 +29,7 @@ This page provides a complete reference for all parameters available on the `End | `scaler_type` | `ServerlessScalerType` | Scaling strategy | auto | | `scaler_value` | `int` | Scaling threshold | `4` | | `template` | `PodTemplate` | Pod template overrides | `None` | -| `min_cuda_version` | `str` | Minimum CUDA version for GPU host selection | `"12.8"` (GPU) / `None` (CPU) | +| `min_cuda_version` | `str` or `CudaVersion` | Minimum CUDA version for GPU host selection | `"12.8"` (GPU) / `None` (CPU) | ## Parameter details @@ -540,31 +540,39 @@ For simple environment variables, use the `env` parameter on `Endpoint` instead ### min_cuda_version -**Type**: `str` +**Type**: `str` or `CudaVersion` **Default**: `"12.8"` for GPU endpoints, `None` for CPU endpoints Specifies the minimum CUDA driver version required on the host machine. GPU endpoints default to `"12.8"` to ensure workers run on hosts with recent CUDA drivers. 
```python -from runpod_flash import Endpoint, GpuType +from runpod_flash import Endpoint, GpuType, CudaVersion # Use the default (12.8) @Endpoint(name="ml-inference", gpu=GpuType.NVIDIA_A100_80GB_PCIe) async def infer(data): ... -# Override to allow older hosts +# Override with string value @Endpoint( name="legacy-compatible", gpu=GpuType.NVIDIA_A100_80GB_PCIe, min_cuda_version="12.4" ) async def infer_legacy(data): ... + +# Override with CudaVersion enum +@Endpoint( + name="cuda-12", + gpu=GpuType.NVIDIA_A100_80GB_PCIe, + min_cuda_version=CudaVersion.V12_0 +) +async def infer_cuda12(data): ... ``` This parameter has no effect on CPU endpoints. -Valid CUDA versions include: `"11.1"`, `"11.4"`, `"11.7"`, `"11.8"`, `"12.0"`, `"12.1"`, `"12.2"`, `"12.3"`, `"12.4"`, `"12.6"`, `"12.8"`. Invalid values raise a `ValueError`. +Valid CUDA versions: `CudaVersion.V11_1`, `V11_4`, `V11_7`, `V11_8`, `V12_0`, `V12_1`, `V12_2`, `V12_3`, `V12_4`, `V12_6`, `V12_8` (or equivalent strings like `"12.4"`). Invalid values raise a `ValueError`. ## EndpointJob
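
The patch documents a validation contract for `min_cuda_version` — a fixed allowlist of versions, a `ValueError` on anything else, a `"12.8"` default for GPU endpoints and `None` for CPU endpoints — but does not show that contract in code. As an illustration only (the `CudaVersion` member layout and the `validate_min_cuda_version` helper below are assumptions for the sketch, not the actual `runpod_flash` implementation), the documented behavior could look like:

```python
from enum import Enum


class CudaVersion(str, Enum):
    """Assumed shape of the CudaVersion enum named in the docs:
    one member per version string listed as valid."""
    V11_1 = "11.1"
    V11_4 = "11.4"
    V11_7 = "11.7"
    V11_8 = "11.8"
    V12_0 = "12.0"
    V12_1 = "12.1"
    V12_2 = "12.2"
    V12_3 = "12.3"
    V12_4 = "12.4"
    V12_6 = "12.6"
    V12_8 = "12.8"


def validate_min_cuda_version(value, gpu: bool = True):
    """Hypothetical helper mirroring the documented contract.

    Accepts a version string or a CudaVersion member, applies the
    documented defaults, and raises ValueError for anything outside
    the allowlist.
    """
    if value is None:
        # GPU endpoints default to "12.8"; CPU endpoints skip the check.
        return "12.8" if gpu else None
    # Normalize enum members to their underlying version string.
    text = value.value if isinstance(value, CudaVersion) else str(value)
    valid = {v.value for v in CudaVersion}
    if text not in valid:
        raise ValueError(f"Invalid min_cuda_version: {text!r}")
    return text
```

Under these assumptions, `validate_min_cuda_version(None)` yields the GPU default `"12.8"`, `validate_min_cuda_version(CudaVersion.V12_0)` and `validate_min_cuda_version("12.0")` are equivalent, and an unlisted version such as `"10.2"` raises `ValueError` — matching the note that this parameter has no effect on CPU endpoints and that invalid values are rejected.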