CNTRLPLANE-2207: feat(install): self-manage webhook certs instead of relying on service-ca#8071
CNTRLPLANE-2207: feat(install): self-manage webhook certs instead of relying on service-ca#8071enxebre wants to merge 3 commits intoopenshift:mainfrom
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@enxebre: This pull request references CNTRLPLANE-1200 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured. (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe installer now generates initial webhook CA and serving TLS Secrets when any of EnableDefaultingWebhook, EnableConversionWebhook, EnableValidatingWebhook, or EnableAuditLogPersistence is enabled by calling GenerateInitialWebhookCerts and appending the produced Secrets and CA bundle to install manifests; CRD conversion webhook clientConfig.CABundle is set from that CA bundle. OpenShift-specific annotations for serving-cert/CA bundle injection were removed from install assets. A new WebhookCertReconciler controller was added and wired at runtime (when CertDir != ""), which ensures the self-signed CA and serving cert Secrets exist and patches hypershift.openshift.io CRDs and the Mutating/ValidatingWebhookConfiguration clientConfig.CABundle fields. Sequence DiagramsequenceDiagram
participant Installer as Installation
participant KubeAPI as Kubernetes API
participant CRDAPI as CRD API
participant Controller as WebhookCertReconciler
participant WebhookAPI as Webhook Config API
Installer->>Controller: GenerateInitialWebhookCerts(namespace, service)
Controller->>KubeAPI: Create/Update CA Secret & Serving TLS Secret
KubeAPI-->>Controller: Secrets persisted, return CA bundle
Controller-->>Installer: Return CA bundle bytes
Installer->>CRDAPI: setupCRDs(with webhookCABundle)
CRDAPI-->>Installer: CRDs created/updated
Note over Controller,KubeAPI: Runtime controller active when CertDir != ""
KubeAPI->>Controller: Secret change event (webhook-serving-ca / manager-serving-cert)
Controller->>KubeAPI: Reconcile CA and serving cert Secrets
KubeAPI-->>Controller: Secrets reconciled
Controller->>CRDAPI: List hypershift.openshift.io CRDs
CRDAPI-->>Controller: CRD list
Controller->>CRDAPI: Patch spec.conversion.webhook.clientConfig.caBundle
CRDAPI-->>Controller: Patched
Controller->>WebhookAPI: Get Mutating/ValidatingWebhookConfiguration hypershift.openshift.io
WebhookAPI-->>Controller: Webhook configs
Controller->>WebhookAPI: Patch webhook.clientConfig.caBundle
WebhookAPI-->>Controller: Patched
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: enxebre The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@enxebre: This pull request references CNTRLPLANE-1200 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
82b075e to
933c722
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmd/install/install.go (1)
856-879:⚠️ Potential issue | 🟠 MajorBug: annotation check is inverted, will clear existing annotations.
Line 861-862 has inverted logic:
if crd.Annotations != nil { crd.Annotations = map[string]string{} }clears annotations when they exist, instead of initializing when nil. This would remove any existing annotations on the CRD.🐛 Proposed fix
if crd.Spec.Group == "hypershift.openshift.io" { if !opts.EnableConversionWebhook { return } - if crd.Annotations != nil { + if crd.Annotations == nil { crd.Annotations = map[string]string{} }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/install/install.go` around lines 856 - 879, The current check in the CRD conversion setup clears existing annotations because it does "if crd.Annotations != nil { crd.Annotations = map[string]string{} }"; change this to initialize annotations only when missing by using "if crd.Annotations == nil { crd.Annotations = map[string]string{} }" within the block that handles crd.Spec.Group == "hypershift.openshift.io" (the anonymous func handling *apiextensionsv1.CustomResourceDefinition), so existing annotations are preserved and only a nil map is replaced.
🧹 Nitpick comments (1)
hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go (1)
226-266: Consider settingcaSecret.Typeexplicitly for consistency.In
Reconcile(line 94),caSecret.Typeis set tocorev1.SecretTypeOpaque, but inGenerateWebhookCertsit's not set. While Opaque is the default, setting it explicitly would improve consistency between install-time and runtime behavior.♻️ Optional fix
caSecret := &corev1.Secret{ ObjectMeta: metav1.ObjectMeta{ Name: CASecretName, Namespace: namespace, }, + Type: corev1.SecretTypeOpaque, Data: map[string][]byte{}, }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go` around lines 226 - 266, The caSecret created in GenerateWebhookCerts is missing an explicit Type assignment (caSecret.Type) while Reconcile sets caSecret.Type = corev1.SecretTypeOpaque; to make install-time generation consistent with runtime behavior, set caSecret.Type = corev1.SecretTypeOpaque on the caSecret constructed in GenerateWebhookCerts (the secret named by CASecretName) so the secret type matches what Reconcile expects.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go`:
- Around line 58-73: The goroutine in SetupWithManager uses an unbuffered
channel initialSync which can block if the controller isn't yet watching; change
initialSync to a buffered channel with capacity 1 (make(chan event.GenericEvent,
1)) and send the single GenericEvent for CASecretName into it, then close the
channel after sending (or rely on buffer) to ensure the goroutine never blocks;
update references in SetupWithManager and keep the existing
source.Channel(initialSync, ...) usage.
---
Outside diff comments:
In `@cmd/install/install.go`:
- Around line 856-879: The current check in the CRD conversion setup clears
existing annotations because it does "if crd.Annotations != nil {
crd.Annotations = map[string]string{} }"; change this to initialize annotations
only when missing by using "if crd.Annotations == nil { crd.Annotations =
map[string]string{} }" within the block that handles crd.Spec.Group ==
"hypershift.openshift.io" (the anonymous func handling
*apiextensionsv1.CustomResourceDefinition), so existing annotations are
preserved and only a nil map is replaced.
---
Nitpick comments:
In `@hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go`:
- Around line 226-266: The caSecret created in GenerateWebhookCerts is missing
an explicit Type assignment (caSecret.Type) while Reconcile sets caSecret.Type =
corev1.SecretTypeOpaque; to make install-time generation consistent with runtime
behavior, set caSecret.Type = corev1.SecretTypeOpaque on the caSecret
constructed in GenerateWebhookCerts (the secret named by CASecretName) so the
secret type matches what Reconcile expects.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 1adb3310-410a-4485-9b8b-29070f674264
📒 Files selected for processing (5)
cmd/install/assets/hypershift_operator.gocmd/install/install.gocmd/install/install_test.gohypershift-operator/controllers/webhookcerts/webhookcerts_controller.gohypershift-operator/main.go
💤 Files with no reviewable changes (1)
- cmd/install/assets/hypershift_operator.go
933c722 to
0f40585
Compare
|
@enxebre: This pull request references CNTRLPLANE-1200 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
0f40585 to
768b924
Compare
|
@enxebre: This pull request references CNTRLPLANE-1200 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmd/install/install.go (1)
861-862:⚠️ Potential issue | 🔴 CriticalLogic error: Annotations cleared instead of initialized.
The condition
if crd.Annotations != nilfollowed bycrd.Annotations = map[string]string{}clears all existing annotations when they exist. This will wipe annotations likerelease.openshift.io/feature-setthat were checked earlier in the filtering logic (lines 819-831).If the intent was to ensure an empty annotation map exists, the condition should be inverted:
🐛 Proposed fix
- if crd.Annotations != nil { + if crd.Annotations == nil { crd.Annotations = map[string]string{} }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/install/install.go` around lines 861 - 862, The code is clearing existing CRD annotations by doing "if crd.Annotations != nil { crd.Annotations = map[string]string{} }"; change this to only initialize annotations when nil so existing keys (e.g. release.openshift.io/feature-set checked earlier) are preserved: replace the condition to check for nil and assign a new map only when crd.Annotations == nil, leaving crd.Annotations intact otherwise.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@cmd/install/install.go`:
- Around line 861-862: The code is clearing existing CRD annotations by doing
"if crd.Annotations != nil { crd.Annotations = map[string]string{} }"; change
this to only initialize annotations when nil so existing keys (e.g.
release.openshift.io/feature-set checked earlier) are preserved: replace the
condition to check for nil and assign a new map only when crd.Annotations ==
nil, leaving crd.Annotations intact otherwise.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: a021b9ac-57d2-4b01-9652-82d97adb5c79
📒 Files selected for processing (6)
cmd/install/assets/hypershift_operator.gocmd/install/install.gocmd/install/install_test.gohypershift-operator/controllers/webhookcerts/webhookcerts_controller.gohypershift-operator/controllers/webhookcerts/webhookcerts_controller_test.gohypershift-operator/main.go
💤 Files with no reviewable changes (1)
- cmd/install/assets/hypershift_operator.go
✅ Files skipped from review due to trivial changes (1)
- cmd/install/install_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- hypershift-operator/main.go
|
/test e2e-azure-self-managed |
|
|
||
| // GenerateWebhookCerts generates the CA and serving cert secrets for use at install time. | ||
| // It also returns the CA bundle bytes for injection into CRDs and webhook configs. | ||
| func GenerateWebhookCerts(namespace, serviceName string) (*corev1.Secret, *corev1.Secret, []byte, error) { |
There was a problem hiding this comment.
nit: just to make it more clear that this is only called once at start
| func GenerateWebhookCerts(namespace, serviceName string) (*corev1.Secret, *corev1.Secret, []byte, error) { | |
| func GenerateInitialWebhookCerts(namespace, serviceName string) (*corev1.Secret, *corev1.Secret, []byte, error) { |
|
@enxebre CNTRLPLANE-1200 is for the The issue tracking the CAPI bump itself is CNTRLPLANE-2207. We should link this issue there or create a subtask for this as blocker to CNTRLPLANE-2207. Otherwise LGTM! |
|
@enxebre: This pull request references CNTRLPLANE-1200 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go`:
- Around line 222-259: GenerateInitialWebhookCerts currently always builds empty
Secrets which forces new CA and leaf certs on every run; change it to be
idempotent by reusing existing Secrets when present instead of always recreating
them. Specifically, in GenerateInitialWebhookCerts check for existing Secrets
named CASecretName and ServingCertSecretName in the target namespace (or accept
them as optional parameters), and if found validate and reuse their Data (return
existing caSecret.Data[certs.CASignerCertMapKey] as caBundle); only call
certs.ReconcileSelfSignedCA to create a CA when the CA Secret is missing or
invalid, and only call certs.ReconcileSignedCert when the serving Secret is
missing/expired or the CA has changed — keep using the same symbols
(CASecretName, ServingCertSecretName, certs.ReconcileSelfSignedCA,
certs.ReconcileSignedCert) so the install flow in cmd/install/install.go will no
longer overwrite a healthy trust chain on re-run.
- Around line 61-66: The controller setup currently only watches Secret events
(in the ctrl.NewControllerManagedBy(mgr).For(&corev1.Secret{},
builder.WithPredicates(...)) chain) so if both CASecretName and
ServingCertSecretName are missing at startup Reconcile never runs; modify the
controller builder to ensure an initial bootstrap reconcile — either add
builder.WithInitialReconcile() to the For(...) call or add a stable watch (e.g.,
Watch(&corev1.Namespace{}) or a singleton ConfigMap) so the controller will be
enqueued at manager start and the existing Reconcile method can create missing
secrets.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 7f4a21bb-c5c6-471a-a3e1-c419e699179d
📒 Files selected for processing (2)
cmd/install/install.gohypershift-operator/controllers/webhookcerts/webhookcerts_controller.go
🚧 Files skipped from review as they are similar to previous changes (1)
- cmd/install/install.go
| return ctrl.NewControllerManagedBy(mgr). | ||
| For(&corev1.Secret{}, builder.WithPredicates(predicate.NewPredicateFuncs(func(o client.Object) bool { | ||
| return o.GetNamespace() == r.Namespace && | ||
| (o.GetName() == CASecretName || o.GetName() == ServingCertSecretName) | ||
| }))). | ||
| Complete(r) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, examine the full SetupWithManager method
cat -n hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go | head -100Repository: openshift/hypershift
Length of output: 4126
🏁 Script executed:
#!/bin/bash
# Get the full file to understand context
wc -l hypershift-operator/controllers/webhookcerts/webhookcerts_controller.goRepository: openshift/hypershift
Length of output: 139
🏁 Script executed:
#!/bin/bash
# Look at the entire SetupWithManager and Reconcile methods
sed -n '50,80p' hypershift-operator/controllers/webhookcerts/webhookcerts_controller.goRepository: openshift/hypershift
Length of output: 1079
🏁 Script executed:
#!/bin/bash
# Check the Reconcile method
sed -n '40,150p' hypershift-operator/controllers/webhookcerts/webhookcerts_controller.goRepository: openshift/hypershift
Length of output: 4054
🏁 Script executed:
#!/antml:max_thinking>
I'll start by looking at the file structure to understand what we're dealing with and then examine the specific methods mentioned.
</thinking>
shell
# Check the size and overall structure
cd hypershift-operator/controllers/webhookcerts && wc -l webhookcerts_controller.go && head -30 webhookcerts_controller.goRepository: openshift/hypershift
Length of output: 273
🏁 Script executed:
#!/bin/bash
# Check the full Reconcile method and see how createOrUpdate works
sed -n '69,120p' hypershift-operator/controllers/webhookcerts/webhookcerts_controller.goRepository: openshift/hypershift
Length of output: 1650
🏁 Script executed:
#!/bin/bash
# Check if there's any initial reconcile or bootstrap logic
grep -n "Reconcile\|bootstrap\|initial\|enqueue" hypershift-operator/controllers/webhookcerts/webhookcerts_controller.goRepository: openshift/hypershift
Length of output: 998
🏁 Script executed:
#!/bin/bash
# Look at GenerateInitialWebhookCerts which was mentioned in the original review
grep -n "GenerateInitialWebhookCerts" hypershift-operator/controllers/webhookcerts/webhookcerts_controller.goRepository: openshift/hypershift
Length of output: 283
🏁 Script executed:
#!/bin/bash
# Find where this controller is instantiated/used
fd -t f "\.go$" -x grep -l "WebhookCertReconciler" {} \; | head -10Repository: openshift/hypershift
Length of output: 246
🏁 Script executed:
#!/bin/bash
# Check main.go to see how the controller is initialized
grep -B5 -A10 "WebhookCertReconciler" hypershift-operator/main.goRepository: openshift/hypershift
Length of output: 620
🏁 Script executed:
#!/bin/bash
# Check GenerateInitialWebhookCerts full implementation
sed -n '220,260p' hypershift-operator/controllers/webhookcerts/webhookcerts_controller.goRepository: openshift/hypershift
Length of output: 1392
🏁 Script executed:
#!/bin/bash
# Check the install.go mentioned in original review to understand the flow
grep -n "GenerateInitialWebhookCerts\|WebhookCertReconciler" cmd/install/install.go | head -20Repository: openshift/hypershift
Length of output: 198
🏁 Script executed:
#!/bin/bash
# Check where GenerateInitialWebhookCerts secrets are applied in install.go
sed -n '730,750p' cmd/install/install.goRepository: openshift/hypershift
Length of output: 895
🏁 Script executed:
#!/bin/bash
# Look at upsert.CreateOrUpdate to understand its behavior
fd -t f "upsert" hypershift-operator/ cmd/install/ -x grep -l "CreateOrUpdateFN\|CreateOrUpdate" {} \; | head -5Repository: openshift/hypershift
Length of output: 46
🏁 Script executed:
#!/bin/bash
# Check the implementation of createOrUpdate (upsert package)
find . -path "*/support/upsert*" -name "*.go" | head -5Repository: openshift/hypershift
Length of output: 212
🏁 Script executed:
#!/bin/bash
# Check the upsert implementation to understand CreateOrUpdate behavior
sed -n '1,100p' support/upsert/upsert.goRepository: openshift/hypershift
Length of output: 3674
🏁 Script executed:
#!/bin/bash
# Look at the CreateOrUpdateFN type definition
grep -n "type CreateOrUpdateFN\|CreateOrUpdate func" support/upsert/upsert.goRepository: openshift/hypershift
Length of output: 223
🏁 Script executed:
#!/antml:thinking>
Need to understand if createOrUpdate uses server-side apply (which would auto-create on first reconcile) or regular client operations. Also need to verify the bootstrap scenario.
</thinking>
shell
# Check if there's any initial enqueue or bootstrap mechanism in controller-runtime
grep -n "Owns\|Watches\|Initial\|Bootstrap" hypershift-operator/controllers/webhookcerts/webhookcerts_controller.goRepository: openshift/hypershift
Length of output: 329
🏁 Script executed:
#!/bin/bash
# Check for any Owns/Watches calls that might enqueue initial reconcile
grep -n "\.Owns\|\.Watches" hypershift-operator/controllers/webhookcerts/webhookcerts_controller.goRepository: openshift/hypershift
Length of output: 46
🏁 Script executed:
#!/bin/bash
# Check if there's any manual initial enqueue or bootstrap logic anywhere in SetupWithManager or nearby
sed -n '57,75p' hypershift-operator/controllers/webhookcerts/webhookcerts_controller.goRepository: openshift/hypershift
Length of output: 787
🏁 Script executed:
#!/antml:thinking>
The user's shell results need to be checked. Let me verify the actual behavior of the bootstrap scenario by checking whether secrets might not exist at startup.
</thinking>
shell
# Check the test file to understand bootstrap scenarios
grep -n "TestSetupWithManager\|TestReconcile" hypershift-operator/controllers/webhookcerts/webhookcerts_controller_test.goRepository: openshift/hypershift
Length of output: 132
🏁 Script executed:
#!/bin/bash
# Check if there's any logic that handles missing secrets during startup
sed -n '69,135p' hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go | grep -A5 "Reconcile\|missing\|absent"Repository: openshift/hypershift
Length of output: 1064
🏁 Script executed:
#!/an/bin/bash
# Check the test to understand if this bootstrap issue is tested or acknowledged
head -100 hypershift-operator/controllers/webhookcerts/webhookcerts_controller_test.goRepository: openshift/hypershift
Length of output: 3636
🏁 Script executed:
#!/bin/bash
# Verify the flow: check if install-time secrets are guaranteed to be pre-created
# Look at how the manager is started and if any initialization hooks exist
grep -n "SetupWithManager\|RunWithOptions\|Start" hypershift-operator/main.go | head -20Repository: openshift/hypershift
Length of output: 1382
🏁 Script executed:
#!/bin/bash
# Check if there's any manual trigger or enqueue mechanism in the main.go around SetupWithManager
sed -n '525,535p' hypershift-operator/main.goRepository: openshift/hypershift
Length of output: 452
🏁 Script executed:
#!/bin/bash
# Check if there's any other source watching Namespaces or other stable resources
grep -B5 -A15 "webhookCertReconciler.SetupWithManager" hypershift-operator/main.goRepository: openshift/hypershift
Length of output: 1118
🏁 Script executed:
#!/bin/bash
# Check if the secrets are guaranteed to exist by verifying the install flow more carefully
# Look for where the generated secrets are actually applied
sed -n '730,760p' cmd/install/install.goRepository: openshift/hypershift
Length of output: 1320
🏁 Script executed:
#!/bin/bash
# Verify the order: are secrets applied before the manager starts?
grep -n "GenerateInitialWebhookCerts\|mgr.Start\|manager.Start" cmd/install/install.go hypershift-operator/main.go | head -15Repository: openshift/hypershift
Length of output: 276
🏁 Script executed:
#!/antml:thinking>
I need to understand the order of operations:
1. Are the secrets generated and applied (created in the cluster) before the manager starts?
2. Or does the manager start first, and then the secrets are applied?
If the manager starts before secrets are applied, then the bootstrap issue is real. If secrets are applied first, then the controller would just be missing an optimization/resilience pattern.
</thinking>
shell
# Check what objects are actually returned from install and how they're used
grep -n "return nil, nil\|return.*objects" cmd/install/install.go | head -10Repository: openshift/hypershift
Length of output: 283
🏁 Script executed:
#!/bin/bash
# Check the broader context around install and deployment
# to understand if secrets are guaranteed before manager starts
grep -n "Install\|NewManager\|Start" cmd/install/install.go | head -20Repository: openshift/hypershift
Length of output: 1521
🏁 Script executed:
#!/bin/bash
# Check if controller-runtime has any bootstrap/initial reconcile mechanisms
# that might be automatically triggered
grep -n "Initial\|Bootstrap\|Enqueue" hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go hypershift-operator/main.goRepository: openshift/hypershift
Length of output: 427
🏁 Script executed:
#!/antml:thinking>
Let me think about what I've observed:
1. SetupWithManager only registers a `.For(&corev1.Secret{})` watch with a predicate
2. No `.Owns()`, no `.Watches()`, no manual initial enqueue
3. The Reconcile method CAN create the secrets if they're missing (via createOrUpdate)
4. But Reconcile will only run if a Secret event is triggered
5. The secrets are generated in install.go and appended to objects
6. The manager starts after install objects are supposed to be applied
The critical issue is:
- At manager startup, if both watched Secrets are already absent (or never created), there's NO initial request queued
- Without an event on either Secret, Reconcile never runs
- The controller cannot recover from an empty state
This is a REAL issue because:
1. Secrets could be deleted after initial install
2. Namespace could be recreated
3. Any scenario where secrets are missing at startup
The review is CORRECT. The controller needs an initial bootstrap enqueue or a watch on a stable resource (like the Namespace).
Let me confirm the fix would be simple and this is indeed the intended issue.
</thinking>
shell
# Check if there's a pattern in other controllers for initial enqueue
fd -t f "SetupWithManager" hypershift-operator/controllers/ -x grep -l "InitialReconcile\|EnqueueRequest" {} \;Repository: openshift/hypershift
Length of output: 237
🏁 Script executed:
#!/bin/bash
# Look for any controller that does watch a Namespace or similar stable resource
grep -n "\.Watches" hypershift-operator/controllers/*/\*.go | head -5Repository: openshift/hypershift
Length of output: 135
🏁 Script executed:
#!/antml:thinking>
Based on all the evidence I've gathered, the original review is correct. The issue is:
1. SetupWithManager only registers `.For(&corev1.Secret{})` with a specific predicate
2. There's NO initial enqueue or watch on a stable resource
3. The Reconcile method CAN handle missing secrets (creates them via createOrUpdate)
4. But Reconcile will NEVER run if both Secrets don't exist at startup
The fix would be to add one of:
- A manual initial reconcile request in SetupWithManager
- A watch on the Namespace (stable singleton)
- A watch on another stable resource that triggers reconcile
This is a valid major reliability issue. Let me finalize my verification.
</thinking>
shell
# Final verification: check if controller-runtime's manager provides any automatic initial enqueue
grep -n "Reconcile\|enqueue" hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go | head -15Repository: openshift/hypershift
Length of output: 279
🏁 Script executed:
#!/bin/bash
# Final check: verify that controller-runtime doesn't auto-enqueue on SetupWithManager completion
# by checking the builder pattern usage
sed -n '61,66p' hypershift-operator/controllers/webhookcerts/webhookcerts_controller.goRepository: openshift/hypershift
Length of output: 350
🏁 Script executed:
#!/antml:thinking>
I have enough evidence now. Let me summarize what I've verified:
1. SetupWithManager only uses `.For(&corev1.Secret{})` with a predicate
2. No `.Owns()`, `.Watches()`, or manual enqueue
3. Reconcile CAN create secrets if missing (uses createOrUpdate which will Create if not found)
4. But Reconcile only runs on Secret events
5. If both secrets are absent at startup, no event is triggered
6. Therefore, no reconciliation occurs and secrets can't be created
7. This is a bootstrap/startup reliability issue
The original review is CORRECT. This is a major issue that needs to be addressed.
</thinking>
shell
# Verify the complete absence of initial enqueue or bootstrap pattern
grep -n "EnqueueReconciler\|EnqueueRequest\|Initial" hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go hypershift-operator/main.goRepository: openshift/hypershift
Length of output: 295
Controller lacks bootstrap reconciliation for webhook cert Secrets.
This controller only enqueues on Secret events. If both watched Secrets are absent when the manager starts, no reconciliation is triggered and the certs cannot be created. The Reconcile method is capable of creating missing secrets from scratch, but has no way to be initially invoked. Add an explicit initial enqueue (e.g., via builder.WithInitialReconcile) or watch a stable singleton resource (like the Namespace) to ensure startup can self-heal missing cert material.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go`
around lines 61 - 66, The controller setup currently only watches Secret events
(in the ctrl.NewControllerManagedBy(mgr).For(&corev1.Secret{},
builder.WithPredicates(...)) chain) so if both CASecretName and
ServingCertSecretName are missing at startup Reconcile never runs; modify the
controller builder to ensure an initial bootstrap reconcile — either add
builder.WithInitialReconcile() to the For(...) call or add a stable watch (e.g.,
Watch(&corev1.Namespace{}) or a singleton ConfigMap) so the controller will be
enqueued at manager start and the existing Reconcile method can create missing
secrets.
| func GenerateInitialWebhookCerts(namespace, serviceName string) (*corev1.Secret, *corev1.Secret, []byte, error) { | ||
| caSecret := &corev1.Secret{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: CASecretName, | ||
| Namespace: namespace, | ||
| }, | ||
| Data: map[string][]byte{}, | ||
| } | ||
| if err := certs.ReconcileSelfSignedCA(caSecret, "hypershift-webhook-ca", "openshift"); err != nil { | ||
| return nil, nil, nil, fmt.Errorf("failed to generate webhook CA: %w", err) | ||
| } | ||
|
|
||
| dnsNames := webhookDNSNames(serviceName, namespace) | ||
| servingSecret := &corev1.Secret{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: ServingCertSecretName, | ||
| Namespace: namespace, | ||
| }, | ||
| Type: corev1.SecretTypeTLS, | ||
| Data: map[string][]byte{}, | ||
| } | ||
| if err := certs.ReconcileSignedCert( | ||
| servingSecret, | ||
| caSecret, | ||
| "hypershift-operator", | ||
| []string{"openshift"}, | ||
| []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth}, | ||
| corev1.TLSCertKey, | ||
| corev1.TLSPrivateKeyKey, | ||
| "", | ||
| dnsNames, | ||
| nil, | ||
| ); err != nil { | ||
| return nil, nil, nil, fmt.Errorf("failed to generate webhook serving cert: %w", err) | ||
| } | ||
|
|
||
| caBundle := caSecret.Data[certs.CASignerCertMapKey] | ||
| return caSecret, servingSecret, caBundle, nil |
There was a problem hiding this comment.
GenerateInitialWebhookCerts makes the install path non-idempotent.
This helper always starts from empty Secrets, so every call generates a new CA and leaf cert. cmd/install/install.go Lines 736-743 then force-apply those returned Secrets, which means re-running the install path overwrites a healthy trust chain even when nothing is expiring. The install flow should reuse existing Secrets or only create them on first bootstrap.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go`
around lines 222 - 259, GenerateInitialWebhookCerts currently always builds
empty Secrets which forces new CA and leaf certs on every run; change it to be
idempotent by reusing existing Secrets when present instead of always recreating
them. Specifically, in GenerateInitialWebhookCerts check for existing Secrets
named CASecretName and ServingCertSecretName in the target namespace (or accept
them as optional parameters), and if found validate and reuse their Data (return
existing caSecret.Data[certs.CASignerCertMapKey] as caBundle); only call
certs.ReconcileSelfSignedCA to create a CA when the CA Secret is missing or
invalid, and only call certs.ReconcileSignedCert when the serving Secret is
missing/expired or the CA has changed — keep using the same symbols
(CASecretName, ServingCertSecretName, certs.ReconcileSelfSignedCA,
certs.ReconcileSignedCert) so the install flow in cmd/install/install.go will no
longer overwrite a healthy trust chain on re-run.
34c6f48 to
33ee3cc
Compare
|
@enxebre: This pull request references CNTRLPLANE-1200 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
♻️ Duplicate comments (2)
hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go (2)
222-259:⚠️ Potential issue | 🟠 MajorMake initial cert generation idempotent for repeated install executions.
This helper always mints a new CA and serving cert from empty Secret objects. Re-running install rotates trust material even when existing certs are still valid, which can cause avoidable webhook trust churn. Reuse existing valid Secrets and only regenerate when missing/invalid.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go` around lines 222 - 259, GenerateInitialWebhookCerts currently always creates new empty Secrets (CASecretName and ServingCertSecretName) and calls certs.ReconcileSelfSignedCA / certs.ReconcileSignedCert which forces rotation; change it to first attempt to load existing Secrets and validate them (e.g., presence and validity of cert PEMs, CA signer cert key in caSecret.Data[certs.CASignerCertMapKey], and that serving cert is signed by the CA and not expired) and only call certs.ReconcileSelfSignedCA or certs.ReconcileSignedCert when the Secret is missing or the validation fails; keep the same return values (caSecret, servingSecret, caBundle) and reuse existing Secret objects when valid to make GenerateInitialWebhookCerts idempotent across repeated installs.
61-66:⚠️ Potential issue | 🟠 MajorEnsure a bootstrap reconcile path exists when Secrets are absent.
This setup only enqueues from Secret events. If both watched Secrets are missing at manager start, no event is emitted and reconciliation never runs, so cert material cannot self-heal. Add an explicit initial enqueue (or a stable secondary watch) to guarantee first reconcile.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go` around lines 61 - 66, The controller currently only watches Secret events (NewControllerManagedBy(...).For(&corev1.Secret{}...)) so if both CASecretName and ServingCertSecretName are absent at startup no reconcile is triggered; update the controller builder to add a stable secondary watch or an explicit initial enqueue to guarantee a first reconcile—for example, add a Watch for the operator Namespace (or a stable ConfigMap) using builder.Watches(...) with a predicate that filters on r.Namespace (or a Mapper that always enqueues the reconcile request for this reconciler), or add logic during Setup to enqueue the reconcile request once at manager start; change the controller setup around NewControllerManagedBy, For, and Complete(r) to include that extra Watch/initial enqueue so reconciliation runs even when the secrets are missing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@hypershift-operator/controllers/webhookcerts/webhookcerts_controller.go`:
- Around line 222-259: GenerateInitialWebhookCerts currently always creates new
empty Secrets (CASecretName and ServingCertSecretName) and calls
certs.ReconcileSelfSignedCA / certs.ReconcileSignedCert which forces rotation;
change it to first attempt to load existing Secrets and validate them (e.g.,
presence and validity of cert PEMs, CA signer cert key in
caSecret.Data[certs.CASignerCertMapKey], and that serving cert is signed by the
CA and not expired) and only call certs.ReconcileSelfSignedCA or
certs.ReconcileSignedCert when the Secret is missing or the validation fails;
keep the same return values (caSecret, servingSecret, caBundle) and reuse
existing Secret objects when valid to make GenerateInitialWebhookCerts
idempotent across repeated installs.
- Around line 61-66: The controller currently only watches Secret events
(NewControllerManagedBy(...).For(&corev1.Secret{}...)) so if both CASecretName
and ServingCertSecretName are absent at startup no reconcile is triggered;
update the controller builder to add a stable secondary watch or an explicit
initial enqueue to guarantee a first reconcile—for example, add a Watch for the
operator Namespace (or a stable ConfigMap) using builder.Watches(...) with a
predicate that filters on r.Namespace (or a Mapper that always enqueues the
reconcile request for this reconciler), or add logic during Setup to enqueue the
reconcile request once at manager start; change the controller setup around
NewControllerManagedBy, For, and Complete(r) to include that extra Watch/initial
enqueue so reconciliation runs even when the secrets are missing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: aebe52de-8fc0-41dc-a05a-b28cb252f7f0
📒 Files selected for processing (3)
cmd/install/install.gohypershift-operator/controllers/webhookcerts/webhookcerts_controller.gohypershift-operator/controllers/webhookcerts/webhookcerts_controller_test.go
✅ Files skipped from review due to trivial changes (1)
- hypershift-operator/controllers/webhookcerts/webhookcerts_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- cmd/install/install.go
| log := ctrl.LoggerFrom(ctx) | ||
|
|
||
| // Only process our own secrets. | ||
| if req.Namespace != r.Namespace { |
There was a problem hiding this comment.
Are these 2 conditions not already covered in the predicate func in the setup? Only requests matching the NS and secret name should be reconciled already IIUC.
| return fmt.Errorf("failed to list CRDs: %w", err) | ||
| } | ||
|
|
||
| for i := range crdList.Items { |
There was a problem hiding this comment.
I have 2 questions about this logic:
- Since we are now patching the CRDs, we no longer need the
service.beta.openshift.io/inject-cabundleannotation (e.g. here) right? - Is this only needed by hypershift CRDs? IIUC other CRDs with conversion enabled would also need this.
|
Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage. |
|
/pipeline required |
|
Scheduling tests matching the |
|
/verified by e2e |
|
@LiangquanLi930: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified cancel |
|
@LiangquanLi930: The DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified remove |
|
@LiangquanLi930: The DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/verified later @enxebre |
|
@enxebre: This PR has been marked to be verified later by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test e2e-kubevirt-aws-ovn-reduced |
|
@enxebre: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/hold Revision 260c92e was retested 3 times: holding |
|
superseded by #8174 |
|
/close An alternative PR merged #8174 |
|
PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@clebs: Closed this PR. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
The service-ca operator is not available on non-OpenShift clusters (e.g. AKS). Instead of relying on its annotations to generate serving certs and inject CA bundles, always self-manage them for the HO SVC/Webhook:
The service-ca annotations (inject-cabundle on CRDs/webhook configs, serving-cert-secret-name on the Service) are removed as they are no longer needed.
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes
Special notes for your reviewer:
Checklist:
Summary by CodeRabbit
New Features
Improvements
Tests