# docs: Document multi-datacenter support for Flash endpoints #582
Changes from all commits: 5a64b94, c119c88, 357f5ab, d2506fa, 42bcc00
@@ -19,8 +19,8 @@ This page provides a complete reference for all parameters available on the `End

| Parameter | Type | Description | Default |
|---|---|---|---|
| `dependencies` | `list[str]` | Python packages to install | `None` |
| `system_dependencies` | `list[str]` | System packages to install (apt) | `None` |
| `accelerate_downloads` | `bool` | Enable download acceleration | `True` |
| `volume` | `NetworkVolume` or `list[NetworkVolume]` | Network volume(s) for persistent storage | `None` |
| `datacenter` | `DataCenter`, list, or `None` | Datacenter(s) for deployment | `None` (all DCs) |
| `env` | `dict[str, str]` | Environment variables | `None` |
| `gpu_count` | `int` | GPUs per worker | `1` |
| `execution_timeout_ms` | `int` | Max execution time in milliseconds | `0` (no limit) |
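To illustrate the shapes the `datacenter` parameter accepts per the table above, here is a hedged sketch of a normalization helper. The `DataCenter` enum stub and the `normalize_datacenters` function are illustrative assumptions for this page, not the Flash SDK's actual implementation.

```python
from enum import Enum

class DataCenter(str, Enum):
    # Illustrative subset of the datacenter IDs listed on this page
    US_GA_2 = "US-GA-2"
    EU_RO_1 = "EU-RO-1"

def normalize_datacenters(value):
    """Normalize a datacenter argument to a list of DC ID strings.

    Accepts a DataCenter member, a string ID, a list of either, or None.
    None means "all available datacenters" and is returned unchanged.
    """
    if value is None:
        return None
    if isinstance(value, (DataCenter, str)):
        value = [value]
    return [dc.value if isinstance(dc, DataCenter) else dc for dc in value]
```

Under this scheme, string IDs and enum members can be mixed freely in a list.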
@@ -208,19 +208,21 @@ async def process(data): ...

### volume

**Type**: `NetworkVolume` or `list[NetworkVolume]`
**Default**: `None`

Attaches network volume(s) for persistent storage. Volumes are mounted at `/runpod-volume/`. Flash uses the volume `name` to find an existing volume or create a new one. Each volume is tied to a specific datacenter.

```python
from runpod_flash import Endpoint, GpuGroup, DataCenter, NetworkVolume

# Single volume in a specific datacenter
vol = NetworkVolume(name="model-cache", size=100, datacenter=DataCenter.US_GA_2)

@Endpoint(
    name="model-server",
    gpu=GpuGroup.ANY,
    datacenter=DataCenter.US_GA_2,
    volume=vol
)
async def serve(data):
    ...
```

@@ -229,6 +231,30 @@ async def serve(data):
For multi-datacenter deployments, pass a list of volumes (one per datacenter):

```python
from runpod_flash import Endpoint, GpuGroup, DataCenter, NetworkVolume

volumes = [
    NetworkVolume(name="models-us", size=100, datacenter=DataCenter.US_GA_2),
    NetworkVolume(name="models-eu", size=100, datacenter=DataCenter.EU_RO_1),
]

@Endpoint(
    name="global-server",
    gpu=GpuGroup.ANY,
    datacenter=[DataCenter.US_GA_2, DataCenter.EU_RO_1],
    volume=volumes
)
async def serve(data):
    ...
```

<Warning>
Only one network volume is allowed per datacenter. If you specify multiple volumes in the same datacenter, deployment will fail.
</Warning>

**Use cases**:
- Share large models across workers
- Persist data between runs
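The one-volume-per-datacenter rule in the warning above can be pictured as a client-side check. Everything in this sketch (plain-dict volumes, the helper name) is hypothetical; Flash enforces the rule itself at deployment time.

```python
def check_one_volume_per_datacenter(volumes):
    """Raise if two volumes target the same datacenter, mirroring the
    documented deployment failure. Volumes are plain dicts here for
    illustration; the real SDK uses NetworkVolume objects."""
    seen = {}
    for vol in volumes:
        dc = vol["datacenter"]
        if dc in seen:
            raise ValueError(
                f"multiple volumes in {dc}: {seen[dc]!r} and {vol['name']!r}"
            )
        seen[dc] = vol["name"]
    return seen
```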
@@ -238,24 +264,65 @@ See [Storage](/flash/configuration/storage) for setup instructions.

### datacenter

**Type**: `DataCenter`, `list[DataCenter]`, `str`, `list[str]`, or `None`
**Default**: `None` (all available datacenters)

Specifies the datacenter(s) for worker deployment. When set to `None`, the endpoint is available in all datacenters.

```python
from runpod_flash import Endpoint, GpuGroup, DataCenter

# Deploy to all available datacenters (default)
@Endpoint(name="global", gpu=GpuGroup.ANY)
async def process(data): ...

# Deploy to a single datacenter
@Endpoint(
    name="us-workers",
    gpu=GpuGroup.ANY,
    datacenter=DataCenter.US_GA_2
)
async def process(data): ...

# Deploy to multiple datacenters
@Endpoint(
    name="multi-region",
    gpu=GpuGroup.ANY,
    datacenter=[DataCenter.US_GA_2, DataCenter.EU_RO_1]
)
async def process(data): ...

# String DC IDs also work
@Endpoint(
    name="us-workers",
    gpu=GpuGroup.ANY,
    datacenter="US-GA-2"
)
async def process(data): ...
```
**Available datacenters**:

> **Reviewer note:** Citation: The available datacenter list comes from the updated …

| Value | Location |
|-------|----------|
| `DataCenter.US_CA_2` | US - California |
| `DataCenter.US_GA_2` | US - Georgia |
| `DataCenter.US_IL_1` | US - Illinois |
| `DataCenter.US_KS_2` | US - Kansas |
| `DataCenter.US_MD_1` | US - Maryland |
| `DataCenter.US_MO_1` | US - Missouri |
| `DataCenter.US_MO_2` | US - Missouri |
| `DataCenter.US_NC_1` | US - North Carolina |
| `DataCenter.US_NC_2` | US - North Carolina |
| `DataCenter.US_NE_1` | US - Nebraska |
| `DataCenter.US_WA_1` | US - Washington |
| `DataCenter.EU_CZ_1` | Europe - Czech Republic |
| `DataCenter.EU_RO_1` | Europe - Romania |
| `DataCenter.EUR_IS_1` | Europe - Iceland |
| `DataCenter.EUR_NO_1` | Europe - Norway |

<Note>
CPU endpoints are restricted to `CPU_DATACENTERS`, which currently only includes `EU_RO_1`.
</Note>
### env
---

@@ -43,27 +43,61 @@ If you specify a custom size that exceeds the instance limit, deployment will fa

## Network volumes

Network volumes provide persistent storage that survives worker restarts. Each volume is tied to a specific datacenter. Use volumes to share data between endpoint functions or to persist data between runs.

### Attaching network volumes

Attach a network volume using the `volume` parameter. Flash uses the volume `name` to find an existing volume or create a new one. Specify the `datacenter` parameter to control where the volume is created:

```python
from runpod_flash import Endpoint, GpuType, DataCenter, NetworkVolume

vol = NetworkVolume(name="model-cache", size=100, datacenter=DataCenter.US_GA_2)

@Endpoint(
    name="persistent-storage",
    gpu=GpuType.NVIDIA_A100_80GB_PCIe,
    datacenter=DataCenter.US_GA_2,
    volume=vol
)
async def process(data: dict) -> dict:
    # Access files at /runpod-volume/
    ...
```
You can also reference an existing volume by ID:

```python
vol = NetworkVolume(id="vol_abc123")
```

### Multi-datacenter volumes

> **Reviewer note:** Citation: Multi-datacenter volume examples based on PR #266's …

For endpoints deployed across multiple datacenters, pass a list of volumes (one per datacenter):
```python
from runpod_flash import Endpoint, GpuType, DataCenter, NetworkVolume

volumes = [
    NetworkVolume(name="models-us", size=100, datacenter=DataCenter.US_GA_2),
    NetworkVolume(name="models-eu", size=100, datacenter=DataCenter.EU_RO_1),
]

@Endpoint(
    name="global-inference",
    gpu=GpuType.NVIDIA_A100_80GB_PCIe,
    datacenter=[DataCenter.US_GA_2, DataCenter.EU_RO_1],
    volume=volumes
)
async def process(data: dict) -> dict:
    # Workers in each region access their local volume at /runpod-volume/
    ...
```

<Warning>
Only one network volume is allowed per datacenter. If you specify multiple volumes in the same datacenter, deployment will fail.
</Warning>
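The pairing of volumes to regions described above can be pictured with a small selection helper. This is a sketch of the documented behavior, not SDK code; Flash performs this pairing automatically at deploy time, and the dict-based volumes are illustrative.

```python
def volume_for_datacenter(volumes, dc):
    """Return the single volume configured for a worker's datacenter.

    Hypothetical helper: mirrors the rule that a multi-datacenter
    deployment carries exactly one volume per datacenter.
    """
    matches = [v for v in volumes if v["datacenter"] == dc]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one volume for {dc}, found {len(matches)}")
    return matches[0]
```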
### Accessing network volume files

Network volumes mount at `/runpod-volume/` and can be accessed like a regular filesystem:
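A minimal sketch of typical access patterns follows. The `cache/` subdirectory and helper names are illustrative, and `root` is parameterized only so the snippet can run outside a Flash worker:

```python
from pathlib import Path

VOLUME_ROOT = Path("/runpod-volume")  # Flash mounts the network volume here

def save_text(name: str, text: str, root: Path = VOLUME_ROOT) -> Path:
    # Write a small text file under the volume, creating parent dirs as needed
    path = root / "cache" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return path

def load_text(name: str, root: Path = VOLUME_ROOT) -> str:
    # Read it back on any worker that has the same volume attached
    return (root / "cache" / name).read_text()
```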
---

@@ -104,7 +104,7 @@ flash --help

## Limitations

- Flash is currently only available for macOS and Linux. Windows support is in development.
- CPU endpoints are restricted to the `EU-RO-1` datacenter. GPU endpoints can deploy to [multiple datacenters](/flash/configuration/parameters#datacenter).
- Flash can rapidly scale workers across multiple endpoints, and you may hit your maximum worker threshold quickly. Contact [Runpod support](https://www.runpod.io/contact) to increase your account's capacity if needed.

> **Reviewer note:** Citation: The EU-RO-1 restriction removal and multi-datacenter support for GPU endpoints comes from PR #266. CPU endpoints remain restricted to …

## Tutorials
> **Reviewer note:** Citation: Multi-volume support documented in PR #266's `docs/Flash_Deploy_Guide.md` and `docs/Flash_SDK_Reference.md`, with implementation in `src/runpod_flash/core/resources/serverless.py` adding the `networkVolumes` field.