[tx] Tinker CI + bugfixes #1616

Open

hao-aaron wants to merge 7 commits into NovaSky-AI:main from hao-aaron:tinker-ci

Conversation

hao-aaron (Collaborator) commented May 4, 2026

This PR introduces a Tinker CI, closing #1553. It uses tinker_cookbook.recipes.math_rl.train. Currently REWARD_MIN_VALUE is set to 0, but it will be tightened once we have a couple of runs. Both colocated and fully async tests are added.

https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k_tinker_ci?nw=nwuserahao
https://wandb.ai/sky-posttraining-uc-berkeley/gsm8k_tinker_fully_async_ci?nw=nwuserahao
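For reference, a minimal sketch of how a reward gate like this can work; only the REWARD_MIN_VALUE variable comes from the CI script, while the metrics file name and metric key below are hypothetical placeholders:

import json
import os
import sys

# REWARD_MIN_VALUE is the threshold from the CI script; the metrics file path
# and key are hypothetical stand-ins for wherever the run writes its results.
threshold = float(os.environ.get("REWARD_MIN_VALUE", "0.0"))

with open("ci_metrics.json") as f:
    final_reward = json.load(f)["eval/reward_mean"]

if final_reward < threshold:
    sys.exit(f"final reward {final_reward:.3f} is below REWARD_MIN_VALUE={threshold}")
print(f"reward gate passed: {final_reward:.3f} >= {threshold}")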

While testing the Tinker CI, I found a port collision bug.

async def _run_server(self) -> None:
    """Internal method to run the HTTP server."""
    # Release the port reservation right before vLLM rebinds.
    if self._port_reservation is not None:
        self._port_reservation.close()                          # ← step A: the port is freed
        self._port_reservation = None
    ...
    sock_addr = (self._cli_args.host, self._cli_args.port)
    sock = create_server_socket(sock_addr)                      # ← step B: vLLM rebinds the port

Between step A and step B there is a window where the port is free so vLLM can take it, but any other actor can grab it too. Fixed this by allocating a specific, non-overlapping port range for each actor; a rough sketch of the idea is below.
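A minimal sketch (SERVER_PORT_STRIDE is the constant this PR adds; the stride value and the helper shape are illustrative, not the exact implementation):

import socket

SERVER_PORT_STRIDE = 100  # assumed width of each actor's private port range


def reserve_port_for_actor(base_port: int, actor_index: int) -> socket.socket:
    """Reserve a port from the range owned exclusively by this actor.

    Actor i only ever binds ports in [base_port + i*stride, base_port + (i+1)*stride),
    so releasing the reservation right before vLLM rebinds can no longer race
    with a different actor grabbing the same port.
    """
    start = base_port + actor_index * SERVER_PORT_STRIDE
    for port in range(start, start + SERVER_PORT_STRIDE):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind(("0.0.0.0", port))
            return sock  # caller closes this reservation right before vLLM rebinds
        except OSError:
            sock.close()
    raise RuntimeError(f"no free port in [{start}, {start + SERVER_PORT_STRIDE})")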

Additionally, I changed the disk-reloading workaround introduced in #1606 to more closely mirror the upstream change vllm-project/vllm#41482, adding a new custom endpoint with the load_inplace override.
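A rough sketch of the custom endpoint: the route path, openai_serving_models, and lora_resolver_lock come from the diff below, while the body (including how the in-place reload evicts the cached adapter) is an assumption mirroring vllm-project/vllm#41482, not the exact implementation.

from fastapi import APIRouter, Request
from vllm.entrypoints.openai.protocol import LoadLoRAAdapterRequest

router = APIRouter()


@router.post("/skyrl/v1/load_lora_adapter")
async def _skyrl_load_lora_adapter(body: LoadLoRAAdapterRequest, request: Request):
    """Reload a LoRA adapter from disk even if an adapter with the same name
    is already registered, so updated weights at the same path are picked up
    instead of serving the stale cached copy."""
    models = request.app.state.openai_serving_models
    lora_name = body.lora_name
    async with models.lora_resolver_lock[lora_name]:
        # Drop the cached registration (assumed dict-like here) so the
        # subsequent load re-reads the adapter weights from disk.
        models.lora_requests.pop(lora_name, None)
        result = await models.load_lora_adapter(body)
    return result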

hao-aaron added 6 commits May 4, 2026 17:12
x
Signed-off-by: hao-aaron <ahao@anyscale.com>
x
Signed-off-by: hao-aaron <ahao@anyscale.com>
x
Signed-off-by: hao-aaron <ahao@anyscale.com>
x
Signed-off-by: hao-aaron <ahao@anyscale.com>
x
Signed-off-by: hao-aaron <ahao@anyscale.com>
@hao-aaron hao-aaron changed the title [WIP] Tinker CI [WIP] Tinker CI + bugfixes May 4, 2026
@hao-aaron hao-aaron changed the title [WIP] Tinker CI + bugfixes [tx] Tinker CI + bugfixes May 4, 2026
@hao-aaron hao-aaron marked this pull request as ready for review May 4, 2026 23:36
gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces a custom LoRA loading endpoint /skyrl/v1/load_lora_adapter to bypass upstream vLLM bugs and implements a port striding mechanism (SERVER_PORT_STRIDE) to prevent port conflicts between inference servers. It also adds CI configuration and E2E test scripts for the Tinker backend. Feedback includes adding a check for LoRA support in the new endpoint to avoid potential attribute errors and ensuring consistent return formats in the inference client.

)

models = request.app.state.openai_serving_models
async with models.lora_resolver_lock[lora_name]:
gemini-code-assist (Contributor)

high

The _skyrl_load_lora_adapter endpoint assumes that LoRA is enabled on the server. If the server is started without LoRA support (e.g., --enable-lora is not passed to vLLM), models.lora_requests and models.lora_resolver_lock may not be initialized, leading to an AttributeError or TypeError. A check should be added to ensure LoRA is enabled before accessing these attributes.

            if not hasattr(models, "lora_requests") or models.lora_requests is None:
                raise HTTPException(
                    status_code=400,
                    detail="LoRA is not enabled on this server.",
                )

            async with models.lora_resolver_lock[lora_name]:

body = await resp.json()
raise_for_status(resp, body)
return server_url, {"status": resp.status, "body": await resp.text()}
return server_url, await resp.json()
gemini-code-assist (Contributor)

medium

The return value of _load_on_server is inconsistent with other control plane methods in this class (e.g., _call_server at line 857), which return a dictionary containing both the HTTP status and the response body. This inconsistency may cause issues for callers expecting the standard {url: {"status": ..., "body": ...}} format.

Suggested change
return server_url, await resp.json()
return server_url, {"status": resp.status, "body": await resp.json()}

# matching the convention in gsm8k_colocate.sh.
REWARD_MIN_VALUE=0.0

BACKEND_CONFIG='{"trainer.placement.colocate_all": true, "trainer.placement.policy_num_gpus_per_node": 4, "trainer.micro_forward_batch_size_per_gpu": 8, "trainer.micro_train_batch_size_per_gpu": 8, "generator.inference_engine.num_engines": 4, "generator.inference_engine.tensor_parallel_size": 1, "generator.inference_engine.backend": "vllm", "generator.inference_engine.run_engines_locally": true, "generator.inference_engine.weight_sync_backend": "nccl", "generator.inference_engine.async_engine": true, "generator.inference_engine.gpu_memory_utilization": 0.8, "generator.batched": true}'
Member

I would prefer if the default tinker test is non-colocated, 2 GPUs for trainer and 2 for inference.

SumanthRH (Member) commented May 5, 2026

Actually can you just add a separate non-colocated E2E test?

hao-aaron (Collaborator, Author)

added

x
Signed-off-by: hao-aaron <ahao@anyscale.com>