[bug] Lora bug fixes + port collision fix #1625

Open

hao-aaron wants to merge 2 commits into NovaSky-AI:main from hao-aaron:lora-bug-fixes

Conversation

@hao-aaron
Collaborator

Bug fixes found in #1616

While testing out the tinker CI, I found a port collision bug.

async def _run_server(self) -> None:
    """Internal method to run the HTTP server."""
    # Release the port reservation right before vLLM rebinds.
    if self._port_reservation is not None:
        self._port_reservation.close()  # ← step A
        self._port_reservation = None
    ...
    sock_addr = (self._cli_args.host, self._cli_args.port)
    sock = create_server_socket(sock_addr)
There is a window between releasing the port and vLLM rebinding it during which another actor can also take it. Fixed this by allocating a distinct, non-overlapping port range to each actor.
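
A minimal sketch of the striding idea, assuming a hypothetical SERVER_PORT_STRIDE constant and a per-actor index (names are illustrative, not the PR's exact code):

# Sketch only: SERVER_PORT_STRIDE, base_port, and actor_index are assumed names.
SERVER_PORT_STRIDE = 16

def port_range_for_actor(base_port: int, actor_index: int) -> tuple[int, int]:
    """Give each actor its own half-open port slice so no two actors
    ever scan the same ports while a reservation is briefly released."""
    start_port = base_port + actor_index * SERVER_PORT_STRIDE
    return start_port, start_port + SERVER_PORT_STRIDE

Actor 0 then scans [base_port, base_port + 16), actor 1 scans [base_port + 16, base_port + 32), and so on, so a reservation released in one slice can never be raced by another actor.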

Additionally, changed the disk-reloading workaround created in #1606 to more closely mirror the upstream change vllm-project/vllm#41482, adding a new custom endpoint with the load_inplace override.
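
Roughly, the client side of that endpoint might look like the following; the route name and payload keys here are assumptions for illustration, not the exact API added by this PR:

import aiohttp

async def load_lora_inplace(server_url: str, lora_name: str, lora_path: str) -> dict:
    # Hypothetical route and payload; the PR's custom endpoint mirrors
    # vllm-project/vllm#41482, but the exact names may differ.
    payload = {"lora_name": lora_name, "lora_path": lora_path, "load_inplace": True}
    async with aiohttp.ClientSession() as session:
        async with session.post(f"{server_url}/load_lora_adapter", json=payload) as resp:
            resp.raise_for_status()
            return await resp.json()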

hao-aaron added 2 commits May 6, 2026 00:29
x
Signed-off-by: hao-aaron <ahao@anyscale.com>
x
Signed-off-by: hao-aaron <ahao@anyscale.com>
Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a port striding mechanism to prevent port conflicts between inference servers and implements a custom vLLM endpoint for loading LoRA adapters to bypass specific upstream bugs. The review identifies a potential AttributeError when accessing the LoRA ID counter, suggests adding a boundary check for the maximum valid port number to avoid overflow errors, and recommends maintaining a consistent response format for control plane operations.

Comment on lines +384 to +388

lora_int_id = (
    models.lora_requests[lora_name].lora_int_id
    if lora_name in models.lora_requests
    else models.lora_id_counter.inc(1)
)
Contributor


high

The lora_id_counter in vLLM's OpenAIServingModels is typically an iterator (like itertools.count or a custom Counter class) and does not provide an inc() method. Attempting to call models.lora_id_counter.inc(1) will likely result in an AttributeError. In standard vLLM implementations, the next ID should be obtained using next(models.lora_id_counter) or models.lora_id_counter.generate() depending on the specific vLLM version being used.

            async with models.lora_resolver_lock[lora_name]:
                lora_int_id = (
                    models.lora_requests[lora_name].lora_int_id
                    if lora_name in models.lora_requests
                    else next(models.lora_id_counter)
                )

end_port = start_port + SERVER_PORT_STRIDE
sock: socket.socket | None = None
while True:
    while port < end_port:
Contributor


medium

The loop condition port < end_port does not account for the maximum valid port number (65535). If start_port + SERVER_PORT_STRIDE exceeds 65535, socket.bind will eventually raise an OverflowError when port reaches 65536. Since the except OSError block does not catch OverflowError, the function will crash with a traceback instead of raising the descriptive RuntimeError defined at the end of the function. It is safer to include a check for the valid port range in the loop condition.

Suggested change
while port < end_port:
while port < end_port and port <= 65535:
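
For context, a self-contained sketch of a scan loop with that guard applied (a plain socket.bind stands in for create_server_socket here):

import socket

def bind_first_free_port(host: str, start_port: int, end_port: int) -> socket.socket:
    port = start_port
    # Capping at 65535 keeps socket.bind from ever seeing an out-of-range
    # port and raising OverflowError, which the except OSError below
    # would not catch.
    while port < end_port and port <= 65535:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError:
            sock.close()
            port += 1
    raise RuntimeError(f"No free port found in range [{start_port}, {end_port}).")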

  body = await resp.json()
  raise_for_status(resp, body)
- return server_url, {"status": resp.status, "body": await resp.text()}
+ return server_url, await resp.json()
Contributor


medium

The return value of _load_on_server has been changed to return the JSON body directly, whereas other control plane methods in this class (e.g., pause, resume, sleep, reset_prefix_cache) return a standardized dictionary containing both the HTTP status and the body. This inconsistency in the response format across the RemoteInferenceClient API could lead to issues for callers or monitoring tools that expect a uniform structure for all control plane operations.

Suggested change
return server_url, await resp.json()
return server_url, {"status": resp.status, "body": await resp.json()}
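
Concretely, a small helper like the sketch below (illustrative only, not the repository's actual code) would keep _load_on_server aligned with its siblings:

import aiohttp

async def control_plane_result(resp: aiohttp.ClientResponse) -> dict:
    # Same {"status", "body"} shape that pause/resume/sleep/reset_prefix_cache return.
    return {"status": resp.status, "body": await resp.json()}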
