nvme_driver: don't lazily restore interrupts (#3220) #3235

Open
mattkur wants to merge 1 commit into microsoft:release/1.7.2511 from mattkur:cherrypick/release/1.7.2511/pr-3220

Conversation

Contributor

@mattkur mattkur commented Apr 9, 2026

Clean cherry pick of PR #3220

Works around a device-side issue: some devices cannot handle the case where interrupts are reprogrammed out of IV order.

(cherry picked from commit 958b8a4)
@mattkur mattkur requested review from a team as code owners April 9, 2026 18:40
Copilot AI review requested due to automatic review settings April 9, 2026 18:40
@github-actions github-actions bot added the release_1.7.2511 Targets the release/1.7.2511 branch. label Apr 9, 2026

Copilot AI left a comment

Pull request overview

Adjusts the NVMe driver's servicing-restore behavior to avoid device bugs triggered when MSI-X interrupts are reprogrammed out of interrupt-vector (IV) order: the default restore path now eagerly restores all I/O queues in IV-sorted order.

Changes:

  • Default to an eager restore strategy that restores all I/O queues, sorted by interrupt vector, to preserve ascending IV programming order.
  • Add allow_lazy_restore: Option<bool> to the saved state to optionally fall back to the prior eager-plus-lazy split restore behavior.
  • Update Underhill NVMe manager test helpers to populate the new saved-state field.
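The ordering requirement behind the first bullet can be sketched in a few lines. This is a minimal illustration with a hypothetical SavedIoQueue type, not the driver's real saved-state structs; the actual code sorts saved_state.worker_data.io by its iv field before restoring:

```rust
// Hypothetical stand-in for the driver's per-queue saved state.
#[derive(Debug, Clone)]
struct SavedIoQueue {
    qid: u16, // queue id
    cpu: u32, // target CPU
    iv: u32,  // MSI-X interrupt vector
}

// Restore order follows the interrupt vector, not the queue id, so that
// MSI-X vectors are reprogrammed in ascending IV order. This works around
// devices that mishandle out-of-order interrupt reprogramming.
fn iv_sorted(saved: &[SavedIoQueue]) -> Vec<SavedIoQueue> {
    let mut sorted: Vec<SavedIoQueue> = saved.to_vec();
    sorted.sort_by_key(|q| q.iv);
    sorted
}

fn main() {
    let saved = vec![
        SavedIoQueue { qid: 1, cpu: 0, iv: 3 },
        SavedIoQueue { qid: 2, cpu: 1, iv: 1 },
        SavedIoQueue { qid: 3, cpu: 2, iv: 2 },
    ];
    let order: Vec<u32> = iv_sorted(&saved).iter().map(|q| q.iv).collect();
    println!("{:?}", order);
}
```

Note that sorting by iv rather than qid is the whole point: queue ids and interrupt vectors need not be assigned in the same order across a save/restore cycle.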

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • vm/devices/storage/disk_nvme/nvme_driver/src/driver.rs: Implements IV-sorted eager restoration of all I/O queues; adds a saved-state flag to optionally enable the old lazy behavior.
  • openhcl/underhill_core/src/nvme_manager/save_restore_helpers.rs: Updates test-state construction to include the new allow_lazy_restore field.

Comment on lines +1072 to +1122
// Eager restore path: restore ALL queues sorted by interrupt
// vector for ordered VPci allocation (MSI-X ordering workaround).
//
// Devnote: Safety of inline new_self_drained(): This loop is fully
// synchronous (no .await). Although IoQueue::restore() spawns
// queue handler tasks, they don't poll until the async runtime
// yields — which happens only after .collect() completes. So all
// new_self_drained() and new_draining() calls finish before any
// handler can fire the drain-complete signal. If this loop is ever
// refactored to be async, the waiters for empty queues must be
// pre-created (as done in the lazy path above).
let mut sorted_io: Vec<_> = saved_state.worker_data.io.iter().collect();
sorted_io.sort_by_key(|q| q.iv);

worker.io = sorted_io
.into_iter()
.flat_map(|q| -> Result<IoQueue<D>, anyhow::Error> {
let qid = q.queue_data.qid;
let cpu = q.cpu;
tracing::info!(qid, cpu, iv = q.iv, ?pci_id, "restoring queue");
max_seen_qid = max_seen_qid.max(qid);
let interrupt =
worker.device.map_interrupt(q.iv, q.cpu).with_context(|| {
format!(
"failed to map interrupt for {}, cpu {}, iv {}",
pci_id, q.cpu, q.iv
)
})?;
tracing::info!(qid, cpu, ?pci_id, "restoring queue: search for mem block");
let mem_block = restored_memory
.iter()
.find(|mem| {
mem.len() == q.queue_data.mem_len
&& q.queue_data.base_pfn == mem.pfns()[0]
})
.expect("unable to find restored mem block")
.to_owned();
tracing::info!(qid, cpu, ?pci_id, "restoring queue: restore IoQueue");
let q = IoQueue::restore(
driver.clone(),
interrupt,
registers.clone(),
mem_block,
&pci_id,
q,
bounce_buffer,
if q.queue_data.handler_data.pending_cmds.commands.is_empty() {
drain_after_restore_template.new_self_drained()
} else {
drain_after_restore_template.new_draining()
},

Copilot AI Apr 9, 2026


In the eager-restore path, empty queues get DrainAfterRestoreBuilder::new_self_drained() created inline while IoQueue::restore() is spawning queue-handler tasks. The devnote assumes those tasks won’t run until the async runtime yields, but VmTaskDriver explicitly allows spawned tasks to run on other executors/threads immediately. That reintroduces the missed-signal race you previously avoided by pre-creating waiters: a non-empty queue could drain and signal before later empty queues register their waiters, leaving them stuck in SelfDrained forever. Consider pre-creating the DrainAfterRestore handles for all queues before restoring/spawning any of them, or use the builder’s new_for_new_queue() (double-check) logic for empty queues so late registration can’t miss a completed drain.
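The pattern the reviewer suggests, creating every drain waiter before spawning any handler task, can be illustrated with standard-library channels standing in for the driver's DrainAfterRestore machinery. All names here are illustrative, not the actual driver API:

```rust
use std::sync::mpsc;
use std::thread;

// Two-phase restore sketch: because every waiter (receiver) exists before
// any handler task is spawned, a handler that drains immediately cannot
// signal into a channel that has not been registered yet. This is the
// missed-signal race the reviewer warns about in the inline-creation path.
fn restore_all(queue_ivs: &[u32]) -> Vec<u32> {
    // Phase 1: pre-create one (waiter, signaler) pair per queue, up front.
    let waiters: Vec<(u32, mpsc::Receiver<u32>, mpsc::Sender<u32>)> = queue_ivs
        .iter()
        .map(|&iv| {
            let (tx, rx) = mpsc::channel();
            (iv, rx, tx)
        })
        .collect();

    // Phase 2: only now spawn the handler tasks; each signals its own
    // pre-registered channel, regardless of how quickly it completes.
    let handles: Vec<_> = waiters
        .iter()
        .map(|(iv, _rx, tx)| {
            let tx = tx.clone();
            let iv = *iv;
            thread::spawn(move || tx.send(iv).unwrap())
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    // Every waiter observes its signal, in queue order.
    waiters.iter().map(|(_, rx, _)| rx.recv().unwrap()).collect()
}

fn main() {
    let drained = restore_all(&[3, 1, 2]);
    println!("all waiters signaled: {:?}", drained);
}
```

The key property is purely ordering: waiter registration strictly precedes task spawning, so signal delivery cannot race registration even if spawned tasks run immediately on another thread, which is exactly the behavior the review attributes to VmTaskDriver.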

