Skip to content

openvmm_entry: Replat ttrpc server on VmController#3259

Open
jstarks wants to merge 3 commits intomicrosoft:mainfrom
jstarks:ttrpc_controller
Open

openvmm_entry: Replat ttrpc server on VmController#3259
jstarks wants to merge 3 commits intomicrosoft:mainfrom
jstarks:ttrpc_controller

Conversation

@jstarks
Copy link
Copy Markdown
Member

@jstarks jstarks commented Apr 13, 2026

Refactor the ttrpc/gRPC server so that after CreateVm, VM lifecycle operations are routed through VmController instead of being managed inline by VmService. This eliminates duplicated worker lifecycle management between the ttrpc and REPL paths.

VmController fields diag_inspector and paravisor_diag are now Option<_> so the ttrpc path can construct a VmController without paravisor-specific resources. VmService no longer owns a WorkerHandle directly; instead it holds a Sender and receives VmControllerEvents for guest halt notifications.

Teardown/Quit send VmControllerRpc::Quit to the controller, which handles worker stop+join+mesh shutdown. Inspect routes through the controller via VmControllerRpc::Inspect. WaitVm stores a response sender that is completed when GuestHalt or WorkerStopped events arrive from the controller. Pause/Resume/SCSI/NIC operations remain as direct VmRpc calls, unchanged.

Refactor the ttrpc/gRPC server so that after CreateVm, VM lifecycle
operations are routed through VmController instead of being managed
inline by VmService. This eliminates duplicated worker lifecycle
management between the ttrpc and REPL paths.

VmController fields diag_inspector and paravisor_diag are now
Option<_> so the ttrpc path can construct a VmController without
paravisor-specific resources. VmService no longer owns a WorkerHandle
directly; instead it holds a Sender<VmControllerRpc> and receives
VmControllerEvents for guest halt notifications.

Teardown/Quit send VmControllerRpc::Quit to the controller, which
handles worker stop+join+mesh shutdown. Inspect routes through the
controller via VmControllerRpc::Inspect. WaitVm stores a response
sender that is completed when GuestHalt or WorkerStopped events
arrive from the controller. Pause/Resume/SCSI/NIC operations remain
as direct VmRpc calls, unchanged.
Copilot AI review requested due to automatic review settings April 13, 2026 07:20
@jstarks jstarks changed the title Replat ttrpc server on VmController openvmm_entry: Replat ttrpc server on VmController Apr 13, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the ttrpc/gRPC VM service to route VM lifecycle operations through VmController (aligning the ttrpc and REPL entry paths), so worker lifecycle + mesh shutdown are owned in one place instead of being duplicated in VmService.

Changes:

  • VmService now spawns a VmController task after CreateVm, sends VmControllerRpc (Quit/Inspect), and consumes VmControllerEvent (halt/worker stop).
  • VmController is made constructible for the ttrpc path by making diag_inspector and paravisor_diag optional.
  • Removes parking_lot usage from the ttrpc path by eliminating the prior Mutex<Option<Receiver<_>>> wait pattern.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
openvmm/openvmm_entry/src/vm_controller.rs Makes diag/paravisor diagnostics optional and updates inspect + servicing paths accordingly.
openvmm/openvmm_entry/src/ttrpc/mod.rs Replats ttrpc VM lifecycle management onto VmController, replacing inline worker-handle ownership and wait logic.
openvmm/openvmm_entry/src/lib.rs Updates REPL path VmController construction to wrap new optional fields in Some(...).
openvmm/openvmm_entry/Cargo.toml Drops parking_lot dependency now that it’s no longer needed by the ttrpc module.
Cargo.lock Removes parking_lot from the resolved dependency graph for openvmm_entry.

Comment on lines 376 to +381
vmservice::Vm::WaitVm((), response) => {
let r = self.wait_vm(ctx, vm);
self.start_rpc(response, r);
if self.wait_vm_response.is_some() {
response.send(Err(grpc_error(anyhow!("wait VM already in flight"))));
} else {
self.wait_vm_response = Some(response);
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WaitVm no longer ties the stored response sender to the request's CancelContext. If the client cancels/disconnects before the VM halts, wait_vm_response stays occupied until a controller event arrives, which blocks subsequent WaitVm calls and can leak state. Consider tracking the CancelContext alongside the sender and clearing/completing it on cancellation/deadline (or reintroduce the previous per-request wait future).

Copilot uses AI. Check for mistakes.
if let Some(event) = event {
self.handle_controller_event(event);
} else {
tracing::debug!("controller event channel closed");
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the controller event channel is closed, this branch only logs but leaves vm_controller_events intact. Since recv.next().await will then return None immediately, the main loop can spin and burn CPU logging "controller event channel closed" repeatedly. On None, drop/clear self.vm_controller_events (and possibly controller_task) to stop polling a closed stream.

Suggested change
tracing::debug!("controller event channel closed");
tracing::debug!("controller event channel closed");
self.vm_controller_events = None;

Copilot uses AI. Check for mistakes.
Ok(())
})
if let Some(response) = self.wait_vm_response.take() {
response.send(Err(grpc_error(anyhow!("VM worker stopped"))));
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On VmControllerEvent::WorkerStopped, the error string from the controller (error: Option<String>) is logged but not propagated to a pending WaitVm response, which makes debugging harder. Consider including the controller-provided error string in the returned Status message (and distinguish normal stop vs failure if applicable).

Suggested change
response.send(Err(grpc_error(anyhow!("VM worker stopped"))));
let status = if let Some(err) = error {
grpc_error(anyhow!("VM worker stopped: {}", err))
} else {
grpc_error(anyhow!("VM worker stopped"))
};
response.send(Err(status));

Copilot uses AI. Check for mistakes.
InspectTarget::Paravisor => {
self.diag_inspector.inspect_mut(req);
if let Some(inspector) = &mut self.diag_inspector {
inspector.inspect_mut(req);
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InspectTarget::Paravisor does nothing when diag_inspector is None, leaving the inspect::Request unevaluated. In the inspect framework, unevaluated nodes surface as "Unresolved" errors and can force callers to wait for timeouts. If paravisor inspection is unavailable, explicitly respond (e.g., req.ignore() or return a small node explaining diagnostics aren't configured).

Suggested change
inspector.inspect_mut(req);
inspector.inspect_mut(req);
} else {
req.ignore();

Copilot uses AI. Check for mistakes.
@jstarks jstarks marked this pull request as ready for review April 14, 2026 00:01
@jstarks jstarks requested a review from a team as a code owner April 14, 2026 00:01
Copilot AI review requested due to automatic review settings April 14, 2026 00:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

Comment on lines 399 to +404
vmservice::Vm::WaitVm((), response) => {
let r = self.wait_vm(ctx, vm);
self.start_rpc(response, r);
if self.wait_vm_response.is_some() {
response.send(Err(grpc_error(anyhow!("wait VM already in flight"))));
} else {
self.wait_vm_response = Some((ctx.clone(), response));
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WaitVm can miss a GuestHalt/WorkerStopped event if the controller event arrives before the client issues WaitVm. The main loop continuously drains vm_controller_events, and handle_controller_event drops GuestHalt when no wait_vm_response is set, so a later WaitVm may never complete (despite the VM already being halted). Consider buffering the last terminal controller state (halted/stopped/teardown) and completing WaitVm immediately when called, or only polling controller events while a WaitVm is in flight (while still recording terminal events for later).

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants