OCPBUGS-79539: fix(cpo-v2): preserve HCCO modifications to OCM Controllers field#8072
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@sjenning: This pull request references Jira Issue OCPBUGS-79539, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
The bug has been updated to refer to the pull request using the external bug tracker. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured. (1)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Organization UI (inherited) Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe pull request refactors image registry reconciliation in the control-plane operator by inlining the Sequence DiagramSequence diagram generation is not applicable for these changes as they primarily consist of refactoring and configuration adjustments without introducing significant new feature interactions that would benefit from visualization. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
@sjenning: This pull request references Jira Issue OCPBUGS-79539, which is valid. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
control-plane-operator/controllers/hostedcontrolplane/v2/ocm/config_test.go (1)
175-177: Strengthen precedence assertion in disabled-capability testLine 176 passes
nilforexistingControllers, so this test doesn’t prove that disabled capability overrides a conflicting existing value. Pass a non-empty conflicting slice and keep the same expected result to lock precedence behavior.Test hardening diff
- adaptConfig(config, hcp.Spec.Configuration, imageProvider, &v1.Build{}, hcp.Spec.Capabilities, []string{}, nil) + adaptConfig( + config, + hcp.Spec.Configuration, + imageProvider, + &v1.Build{}, + hcp.Spec.Capabilities, + []string{}, + []string{"*"}, // conflicting existing value; disabled capability should still force pull-secret controller off + )Also applies to: 179-184
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/controllers/hostedcontrolplane/v2/ocm/config_test.go` around lines 175 - 177, The test currently calls adaptConfig(config, hcp.Spec.Configuration, imageProvider, &v1.Build{}, hcp.Spec.Capabilities, []string{}, nil) passing nil for existingControllers so it doesn't verify that a disabled capability beats a conflicting existing controller; update the call(s) to pass a non-empty slice containing a conflicting controller name (e.g., []string{"image-registry"}) instead of nil and keep the same expected assertions so the test verifies that adaptConfig still disables the controller when the ImageRegistry capability is explicitly disabled; apply the same change to the other occurrences around lines 179-184 that call adaptConfig with nil existingControllers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@control-plane-operator/controllers/hostedcontrolplane/v2/ocm/config.go`:
- Around line 42-50: The current code silently ignores errors from
cpContext.Client.Get and util.DeserializeResource which can cause accidental
overwrite of controller settings; update the logic in the block that reads
existingCM/configKey to explicitly handle failures: if cpContext.Client.Get(...)
returns an error, propagate/return that error (or requeue) instead of treating
it as a no-op, and if util.DeserializeResource(...) fails, log and return the
deserialize error (or requeue) instead of continuing; make these checks around
cpContext.Client.Get, existingCM, util.DeserializeResource, existingConfig and
only set existingControllers when deserialization succeeds.
---
Nitpick comments:
In `@control-plane-operator/controllers/hostedcontrolplane/v2/ocm/config_test.go`:
- Around line 175-177: The test currently calls adaptConfig(config,
hcp.Spec.Configuration, imageProvider, &v1.Build{}, hcp.Spec.Capabilities,
[]string{}, nil) passing nil for existingControllers so it doesn't verify that a
disabled capability beats a conflicting existing controller; update the call(s)
to pass a non-empty slice containing a conflicting controller name (e.g.,
[]string{"image-registry"}) instead of nil and keep the same expected assertions
so the test verifies that adaptConfig still disables the controller when the
ImageRegistry capability is explicitly disabled; apply the same change to the
other occurrences around lines 179-184 that call adaptConfig with nil
existingControllers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 5ab04ec1-510f-4bf1-866c-2adfb215fe93
📒 Files selected for processing (2)
control-plane-operator/controllers/hostedcontrolplane/v2/ocm/config.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/ocm/config_test.go
b09a3ce to
8aff732
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go (1)
454-462: Consider simplifying the error handling.The closure can be simplified since the error is just returned directly.
♻️ Simplified closure
if _, err := r.CreateOrUpdate(ctx, r.client, registryConfig, func() error { - err = registry.ReconcileRegistryConfig(registryConfig, r.platformType, hcp.Spec.InfrastructureAvailabilityPolicy) - if err != nil { - return err - } - return nil + return registry.ReconcileRegistryConfig(registryConfig, r.platformType, hcp.Spec.InfrastructureAvailabilityPolicy) }); err != nil {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go` around lines 454 - 462, The closure passed to r.CreateOrUpdate is verbose because it assigns err then checks and returns it; simplify by returning the result of registry.ReconcileRegistryConfig directly. Update the anonymous function passed to r.CreateOrUpdate so it simply calls and returns registry.ReconcileRegistryConfig(registryConfig, r.platformType, hcp.Spec.InfrastructureAvailabilityPolicy) instead of setting err and conditionally returning it, keeping the surrounding CreateOrUpdate call and error wrap logic unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In
`@control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go`:
- Around line 454-462: The closure passed to r.CreateOrUpdate is verbose because
it assigns err then checks and returns it; simplify by returning the result of
registry.ReconcileRegistryConfig directly. Update the anonymous function passed
to r.CreateOrUpdate so it simply calls and returns
registry.ReconcileRegistryConfig(registryConfig, r.platformType,
hcp.Spec.InfrastructureAvailabilityPolicy) instead of setting err and
conditionally returning it, keeping the surrounding CreateOrUpdate call and
error wrap logic unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: b509cb8f-6d67-4ec6-9494-c9e139affa9b
📒 Files selected for processing (4)
control-plane-operator/controllers/hostedcontrolplane/v2/ocm/config.gocontrol-plane-operator/controllers/hostedcontrolplane/v2/ocm/config_test.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.gocontrol-plane-operator/hostedclusterconfigoperator/controllers/resources/resources_test.go
💤 Files with no reviewable changes (1)
- control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- control-plane-operator/controllers/hostedcontrolplane/v2/ocm/config.go
- control-plane-operator/controllers/hostedcontrolplane/v2/ocm/config_test.go
| } | ||
| } else { | ||
| registryConfigExists = true | ||
| } |
There was a problem hiding this comment.
Registry config fetch moved outside capability guard
Medium Severity
The r.client.Get call for the registry config was moved outside the IsImageRegistryCapabilityEnabled check. Previously, reconcileImageRegistry was only called when the capability was enabled, so the Get never executed when the capability was disabled. Now it runs unconditionally. If the ImageRegistry capability is disabled and the imageregistry.operator.openshift.io/v1 CRD is absent from the guest cluster, the Get returns a non-NotFound error, triggering a hard return ctrl.Result{}, err that aborts the entire reconciliation — blocking all subsequent steps like ingress, additional trusted CAs, etc. The registryConfig and registryConfigExists variables are only consumed inside the capability-enabled block, so the fetch can be moved back inside that guard.
e6828dd to
16c28d4
Compare
| } | ||
|
|
||
| // TODO: remove this when ROSA HCP stops setting the managementState to Removed to disable the Image Registry | ||
| if registryConfig.Spec.ManagementState == operatorv1.Removed && r.platformType != hyperv1.IBMCloudPlatform && r.platformType != hyperv1.AzurePlatform { |
There was a problem hiding this comment.
Add unit tests for managementState == Removed OCM ConfigMap logic
There was a problem hiding this comment.
Changes to resources.go and resources_test.go are just the revert
| @@ -2964,127 +2963,3 @@ func Test_reconciler_reconcileKASConnectionCheckerDeployment(t *testing.T) { | |||
| }) | |||
| } | |||
| } | |||
|
|
|||
| func TestReconcileImageRegistry(t *testing.T) { | |||
There was a problem hiding this comment.
Should we restore or replace deleted TestReconcileImageRegistry test coverage (4 platform-specific scenarios lost)
There was a problem hiding this comment.
The revert was not clean unfortunately. Are you saying that something not related to the revert was removed?
| // (e.g., Controllers field when registry is disabled via managementState: Removed) | ||
| var existingControllers []string | ||
| existingCM := &corev1.ConfigMap{} | ||
| err = cpContext.Client.Get(cpContext.Context, client.ObjectKeyFromObject(cm), existingCM) |
There was a problem hiding this comment.
Should this use component.ReconcileExisting() to leverage cpov2 framework?
There was a problem hiding this comment.
I'd rather we don't and keep it explicit to the controllers field
| if !capabilities.IsImageRegistryCapabilityEnabled(caps) { | ||
| cfg.Controllers = []string{"*", fmt.Sprintf("-%s", openshiftcpv1.OpenShiftServiceAccountPullSecretsController)} | ||
| cfg.DockerPullSecret.InternalRegistryHostname = "" | ||
| } else if len(existingControllers) > 0 { |
There was a problem hiding this comment.
The same cursor stale controller comment was picked up by the HyperShift SME agents as well. Is this not an issue?
There was a problem hiding this comment.
It is not, we don't allow re-enabling of the image registry via this mechanism
| } | ||
|
|
||
| // TODO: remove this when ROSA HCP stops setting the managementState to Removed to disable the Image Registry | ||
| if registryConfig.Spec.ManagementState == operatorv1.Removed && r.platformType != hyperv1.IBMCloudPlatform && r.platformType != hyperv1.AzurePlatform { |
There was a problem hiding this comment.
Should we add a comment why IBM and Azure are being excluded from the managementState check?
There was a problem hiding this comment.
Honestly, this should have been "only AWS" not "not IBM and not Azure". It is part of the revert as well though.
16c28d4 to
6c53e09
Compare
| WithManifestAdapter( | ||
| "config.yaml", | ||
| component.WithAdaptFunction(adaptConfigMap), | ||
| component.ReconcileExisting(), |
There was a problem hiding this comment.
we shouldn't be doing this. The static configMap manifest has some fields already that are loaded and used https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/v2/assets/openshift-controller-manager/config.yaml
if we do this, any changes to manifest in the future will not apply to existing clusters
There was a problem hiding this comment.
I restored it to the older way within this PR doing a Get and only merging the controllers
…mentState" This reverts commit 66776b9.
|
Reviewer note: any changes under |
|
/test security |
CPO v2 was overwriting the entire OCM config on each reconciliation, erasing HCCO's modification to disable the pull secrets controller when the image registry is disabled via managementState: Removed instead of the ImageRegistry capability. This fix: - Fetches the actual ConfigMap from the cluster (not just the static asset) - Preserves the existing Controllers field when ImageRegistry capability is enabled but HCCO has modified it (e.g., via managementState: Removed) - Only overrides Controllers when ImageRegistry capability is explicitly disabled This ensures that registry pull secrets are not created when the registry is disabled via managementState: Removed on non-Azure/non-IBM platforms. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #8072 +/- ##
==========================================
- Coverage 29.96% 29.94% -0.02%
==========================================
Files 1049 1049
Lines 97523 97557 +34
==========================================
- Hits 29225 29216 -9
- Misses 65796 65837 +41
- Partials 2502 2504 +2
🚀 New features to boost your workflow:
|
|
Scheduling tests matching the |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bryan-cox, sjenning The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
code coverage reduction is due to the revert /override codecov/patch |
|
@sjenning: /override requires failed status contexts, check run or a prowjob name to operate on.
Only the following failed contexts/checkruns were expected:
If you are trying to override a checkrun that has a space in it, you must put a double quote on the context. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/override codecov/project |
|
@bryan-cox: Overrode contexts on behalf of bryan-cox: codecov/project DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test e2e-aks |
|
/test e2e-azure-self-managed |
|
/verified by @sjenning |
|
@sjenning: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/override codecov/patch |
|
@sjenning: Overrode contexts on behalf of sjenning: codecov/patch DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@sjenning: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
e087807
into
openshift:main
|
@sjenning: Jira Issue Verification Checks: Jira Issue OCPBUGS-79539 Jira Issue OCPBUGS-79539 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Fix included in accepted release 4.22.0-0.nightly-2026-04-03-204456 |


CPO v2 was overwriting the entire OCM config on each reconciliation, erasing HCCO's modification to disable the pull secrets controller when the image registry is disabled via managementState: Removed instead of the ImageRegistry capability.
This fix:
This was noted in CPOv1
reconcileConfig()for OCM at one point but the comment was lost herehttps://github.com/openshift/hypershift/pull/5456/changes#diff-35cfcb87bebf36a8326be8962fb764438dbba55332d0776dae54d94dbfd614d2L60
🔴 #7634 removed the HCCO but we still need it as it is in-use by users. The PR reverts this change.
Note
Medium Risk
Touches control-plane reconciliation of the image registry and
openshift-controller-managerconfig, so mis-merges could unexpectedly enable/disable controllers or change registry behavior, but the change is small and covered by targeted unit tests.Overview
Ensures
control-plane-operatorpreserves in-cluster modifications to theopenshift-controller-managerconfig by reading the existingConfigMapfrom the cluster and reusing itsControllersfield when the Image Registry capability is not explicitly disabled.In the hosted cluster config operator reconciliation loop, inlines image-registry config reconciliation and adds a temporary path: when
imageregistryis set tomanagementState: Removed(on non-IBMCloud/non-Azure), it updates the control-planeopenshift-controller-managerconfig to disable the pull-secrets controller.Updates/extends unit tests to validate controller preservation vs explicit Image Registry capability disable, and removes the now-obsolete
reconcileImageRegistryunit test.Written by Cursor Bugbot for commit 46d5bff. This will update automatically on new commits. Configure here.
Summary by CodeRabbit
Bug Fixes
Refactor
Tests