Skip to content

[kubernetes] Fix CiliumNetworkPolicy endpointSelector for multi-node RWX volumes#2227

Merged
kvaps merged 1 commit intocozystack:mainfrom
mattia-eleuteri:fix/cnp-endpoint-selector-rwx
Mar 16, 2026
Merged

[kubernetes] Fix CiliumNetworkPolicy endpointSelector for multi-node RWX volumes#2227
kvaps merged 1 commit intocozystack:mainfrom
mattia-eleuteri:fix/cnp-endpoint-selector-rwx

Conversation

@mattia-eleuteri
Copy link
Contributor

@mattia-eleuteri mattia-eleuteri commented Mar 16, 2026

What this PR does

When an NFS-backed RWX volume is published to multiple VMs, the CiliumNetworkPolicy endpointSelector.matchLabels only included the first VM. Subsequent ControllerPublishVolume calls added ownerReferences but never broadened the selector, causing Cilium to block NFS egress — mounts hang on all nodes except the first.

This PR switches from matchLabels to matchExpressions (operator: In) so the selector can list multiple VM names, and rebuilds it whenever ownerReferences are added or removed.

Release note

[kubernetes] Fixed CiliumNetworkPolicy endpointSelector not being updated when NFS-backed RWX volumes are published to multiple VMs, which caused NFS mounts to hang on all nodes except the first.

Summary by CodeRabbit

  • New Features

    • KubeVirt CSI driver now supports selecting and targeting multiple virtual machines for volume publishing.
  • Improvements

    • Network policy targets are rebuilt automatically when VM ownership references change, improving correctness and lifecycle handling in multi-VM scenarios.

@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. bug Something isn't working labels Mar 16, 2026
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1f137db7-b177-4412-af9e-45bc3bd6fb15

📥 Commits

Reviewing files that changed from the base of the PR and between 9a78c81 and cc5ec0b.

📒 Files selected for processing (1)
  • packages/apps/kubernetes/images/kubevirt-csi-driver/controller.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/apps/kubernetes/images/kubevirt-csi-driver/controller.go

📝 Walkthrough

Walkthrough

ControllerPublishVolume now builds endpointSelectors dynamically from VM names extracted from ownerReferences. OwnerReference additions/removals trigger rebuilding the selector and updating the CiliumNetworkPolicy via new helper functions buildEndpointSelector and vmNamesFromOwnerRefs.

Changes

Cohort / File(s) Summary
Dynamic Endpoint Selector
packages/apps/kubernetes/images/kubevirt-csi-driver/controller.go
Adds buildEndpointSelector() and vmNamesFromOwnerRefs() helpers. Replaces static selector with dynamic builder in ControllerPublishVolume. Rebuilds and updates CiliumNetworkPolicy when ownerReferences are added or removed.

Sequence Diagram(s)

sequenceDiagram
    participant Controller as Controller
    participant Builder as buildEndpointSelector
    participant K8sAPI as Kubernetes API (CiliumNetworkPolicy)
    participant VM as VM ownerReferences

    Controller->>VM: read ownerReferences
    Controller->>Builder: vmNamesFromOwnerRefs(ownerReferences)
    Builder-->>Controller: endpointSelector (matchExpressions for VM names)
    Controller->>K8sAPI: create/update CiliumNetworkPolicy with selector
    note right of K8sAPI: On ownerReference change,\nController repeats flow to rebuild selector
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through ownerRefs, counted each name,
Built selectors that bounced and played the game,
When names were added or quietly gone,
I rebuilt the gates and carried on,
A rabbit that stitches policies all the same!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: fixing CiliumNetworkPolicy endpointSelector to support multi-node RWX volumes, which aligns with the PR's core objective of enabling multiple VMs in the endpoint selector.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can suggest fixes for GitHub Check annotations.

Configure the reviews.tools.github-checks setting to adjust the time to wait for GitHub Checks to complete.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a bug in the kubevirt-csi-driver that prevented NFS-backed RWX volumes from being correctly mounted on multiple VMs due to an improperly configured CiliumNetworkPolicy. The changes ensure that the endpoint selector is dynamically updated to include all relevant VMs, allowing for successful NFS egress and mounts.

Highlights

  • Bug Fix: Addresses an issue where NFS-backed RWX volumes published to multiple VMs only included the first VM in the CiliumNetworkPolicy endpoint selector, causing mount failures for subsequent VMs.
  • Solution: Replaces matchLabels with matchExpressions in the endpoint selector to allow listing multiple VM names. The selector is rebuilt whenever ownerReferences are added or removed.
  • Code Improvements: Introduces buildEndpointSelector() and vmNamesFromOwnerRefs() helper functions for consistent selector construction.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Activity
  • The pull request introduces helper functions for building endpoint selectors and extracting VM names from owner references.
  • The endpoint selector is now rebuilt whenever owner references are added or removed, ensuring it stays up-to-date.
  • The fix involves switching from matchLabels to matchExpressions for the endpoint selector.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist
Copy link
Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@mattia-eleuteri mattia-eleuteri force-pushed the fix/cnp-endpoint-selector-rwx branch from 9a78c81 to 81230e4 Compare March 16, 2026 13:43
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/apps/kubernetes/images/kubevirt-csi-driver/controller.go (1)

428-445: ⚠️ Potential issue | 🟠 Major

Always reconcile endpointSelector, even when ownerRef already exists.

At Line 431, the early return skips selector rebuild. This leaves previously broken CNPs (already containing multiple ownerRefs but stale single-VM selector) unreconciled on retries for the same VM.

🔧 Proposed fix
 func (w *WrappedControllerService) addCNPOwnerReference(ctx context.Context, namespace, cnpName string, ownerRef map[string]interface{}) error {
 	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
 		existing, err := w.dynamicClient.Resource(ciliumNetworkPolicyGVR).Namespace(namespace).Get(ctx, cnpName, metav1.GetOptions{})
 		if err != nil {
 			return status.Errorf(codes.Internal, "failed to get CiliumNetworkPolicy %s: %v", cnpName, err)
 		}

 		ownerRefs, _, _ := unstructured.NestedSlice(existing.Object, "metadata", "ownerReferences")
 		uid, _, _ := unstructured.NestedString(ownerRef, "uid")
+		alreadyPresent := false
 		for _, ref := range ownerRefs {
 			if refMap, ok := ref.(map[string]interface{}); ok {
 				if refMap["uid"] == uid {
-					return nil // already present
+					alreadyPresent = true
+					break
 				}
 			}
 		}

-		ownerRefs = append(ownerRefs, ownerRef)
-		if err := unstructured.SetNestedSlice(existing.Object, ownerRefs, "metadata", "ownerReferences"); err != nil {
-			return status.Errorf(codes.Internal, "failed to set ownerReferences: %v", err)
+		if !alreadyPresent {
+			ownerRefs = append(ownerRefs, ownerRef)
+			if err := unstructured.SetNestedSlice(existing.Object, ownerRefs, "metadata", "ownerReferences"); err != nil {
+				return status.Errorf(codes.Internal, "failed to set ownerReferences: %v", err)
+			}
 		}

 		// Rebuild endpointSelector to include all VMs
 		selector := buildEndpointSelector(vmNamesFromOwnerRefs(ownerRefs))
 		if err := unstructured.SetNestedField(existing.Object, selector, "spec", "endpointSelector"); err != nil {
 			return status.Errorf(codes.Internal, "failed to set endpointSelector: %v", err)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/apps/kubernetes/images/kubevirt-csi-driver/controller.go` around
lines 428 - 445, The early return when detecting an existing ownerRef prevents
rebuilding the endpointSelector; change the logic in the ownerRefs loop so that
you do not return immediately when refMap["uid"] == uid but instead skip
appending the duplicate ownerRef while continuing to rebuild the selector and
update the object: keep the ownerRefs slice deduping behavior (do not append if
already present), then always call
buildEndpointSelector(vmNamesFromOwnerRefs(ownerRefs)) and
unstructured.SetNestedField(existing.Object, selector, "spec",
"endpointSelector") (and the existing SetNestedSlice call only when ownerRefs
changed) so stale single-VM selectors are reconciled even when the ownerRef was
already present.
🧹 Nitpick comments (1)
packages/apps/kubernetes/images/kubevirt-csi-driver/controller.go (1)

525-536: Consider deduplicating VM names extracted from ownerReferences.

If owner refs ever contain duplicate VM names (e.g., stale refs with different UIDs), values can include duplicates. Deduping here keeps selectors minimal and deterministic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/apps/kubernetes/images/kubevirt-csi-driver/controller.go` around
lines 525 - 536, The vmNamesFromOwnerRefs function may return duplicate VM names
when ownerReferences contain repeated entries; change vmNamesFromOwnerRefs to
deduplicate before returning by using a seen set (e.g., map[string]struct{})
while iterating ownerRefs and only append names not yet seen, so the returned
slice from vmNamesFromOwnerRefs contains unique, deterministic VM names for
selector construction.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@packages/apps/kubernetes/images/kubevirt-csi-driver/controller.go`:
- Around line 428-445: The early return when detecting an existing ownerRef
prevents rebuilding the endpointSelector; change the logic in the ownerRefs loop
so that you do not return immediately when refMap["uid"] == uid but instead skip
appending the duplicate ownerRef while continuing to rebuild the selector and
update the object: keep the ownerRefs slice deduping behavior (do not append if
already present), then always call
buildEndpointSelector(vmNamesFromOwnerRefs(ownerRefs)) and
unstructured.SetNestedField(existing.Object, selector, "spec",
"endpointSelector") (and the existing SetNestedSlice call only when ownerRefs
changed) so stale single-VM selectors are reconciled even when the ownerRef was
already present.

---

Nitpick comments:
In `@packages/apps/kubernetes/images/kubevirt-csi-driver/controller.go`:
- Around line 525-536: The vmNamesFromOwnerRefs function may return duplicate VM
names when ownerReferences contain repeated entries; change vmNamesFromOwnerRefs
to deduplicate before returning by using a seen set (e.g., map[string]struct{})
while iterating ownerRefs and only append names not yet seen, so the returned
slice from vmNamesFromOwnerRefs contains unique, deterministic VM names for
selector construction.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d414e48a-54a4-446c-9941-6ec98ef269c3

📥 Commits

Reviewing files that changed from the base of the PR and between 38d58a7 and 9a78c81.

📒 Files selected for processing (1)
  • packages/apps/kubernetes/images/kubevirt-csi-driver/controller.go

@mattia-eleuteri mattia-eleuteri changed the title fix(kubernetes): update CNP endpointSelector for multi-node RWX volumes [kubernetes] Fix CiliumNetworkPolicy endpointSelector for multi-node RWX volumes Mar 16, 2026
…RWX volumes

When an NFS-backed RWX volume is published to multiple VMs, the
CiliumNetworkPolicy egress rule only allowed traffic from the first VM.
The endpointSelector.matchLabels was set once on creation and never
broadened, causing NFS mounts to hang on all nodes except the first.

Switch from matchLabels to matchExpressions (operator: In) so the
selector can list multiple VM names. Rebuild the selector whenever
ownerReferences are added or removed.

Signed-off-by: mattia-eleuteri <mattia@hidora.io>
@mattia-eleuteri mattia-eleuteri force-pushed the fix/cnp-endpoint-selector-rwx branch from 81230e4 to cc5ec0b Compare March 16, 2026 13:45
Copy link
Member

@kvaps kvaps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Mar 16, 2026
@kvaps kvaps added backport Should change be backported on previus release backport-previous labels Mar 16, 2026
@kvaps kvaps merged commit 9fb9354 into cozystack:main Mar 16, 2026
9 checks passed
@github-actions
Copy link

@github-actions
Copy link

kvaps added a commit that referenced this pull request Mar 17, 2026
…elector for multi-node RWX volumes (#2228)

# Description
Backport of #2227 to `release-1.0`.
kvaps added a commit that referenced this pull request Mar 17, 2026
…elector for multi-node RWX volumes (#2229)

# Description
Backport of #2227 to `release-1.1`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport Should change be backported on previus release backport-previous bug Something isn't working lgtm This PR has been approved by a maintainer size:M This PR changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants