Skip to content

fix(test): Fix flaky VPC e2e tests for SG rule deletion#331

Open
sapphirew wants to merge 1 commit into
aws-controllers-k8s:mainfrom
sapphirew:fix/vpc-e2e-test-flakiness
Open

fix(test): Fix flaky VPC e2e tests for SG rule deletion#331
sapphirew wants to merge 1 commit into
aws-controllers-k8s:mainfrom
sapphirew:fix/vpc-e2e-test-flakiness

Conversation

@sapphirew
Copy link
Copy Markdown
Contributor

@sapphirew sapphirew commented May 11, 2026

Issue #, if available:

Description of changes:

The test_crud_tags test failed because the ACK runtime's EnsureTags persist ACK system tags (services.k8s.aws/controller-version, services.k8s.aws/namespace) into spec.tags during reconciliation. Filter out tags with the services.k8s.aws/ prefix before asserting user tag counts and values.

The test_disallow_default_security_group_rule test failed because the controller was too busy processing other resources to complete the reconcile within the fixed 10-second sleep. Replace time.sleep with wait_on_condition("ACK.ResourceSynced", "True") which polls until the controller signals reconciliation is complete.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ack-prow
Copy link
Copy Markdown

ack-prow Bot commented May 11, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sapphirew
Once this PR has been reviewed and has the lgtm label, please assign knottnt for approval by writing /assign @knottnt in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment thread test/e2e/tests/test_vpc.py Outdated
assert len(resource["spec"]["tags"]) == 1
assert resource["spec"]["tags"][0]["key"] == "updatedtagkey"
assert resource["spec"]["tags"][0]["value"] == "updatedtagvalue"
user_spec_tags = [t for t in resource["spec"]["tags"] if not t["key"].startswith("services.k8s.aws/")]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update this to use shared util functions for asserting without ack tags? Heres an example from emr-serverless

https://github.com/aws-controllers-k8s/emrserverless-controller/blob/main/test/e2e/tests/test_application.py#L193

        updated_tags = {"environment": "production", "team": "data-platform"}
        cr = k8s.get_resource(ref)
        cr["spec"]["tags"] = updated_tags
        k8s.replace_custom_resource(ref, cr)
        time.sleep(MODIFY_WAIT_AFTER_SECONDS)
        
        assert k8s.wait_on_condition(ref, "ACK.ResourceSynced", "True", wait_periods=10)
        
        response = emrserverless_client.list_tags_for_resource(resourceArn=application_arn)
        latest_tags = response["tags"]
        
        tags.assert_ack_system_tags(tags=latest_tags)
        tags.assert_equal_without_ack_tags(expected=updated_tags, actual=latest_tags)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, updated to use the shared lib for tags assertion

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the tests already make tags assertion above. Going to drop this change until have a clear root cause.

Comment thread test/e2e/tests/test_vpc.py Outdated
assert resource["spec"]["tags"][0]["key"] == "initialtagkey"
assert resource["spec"]["tags"][0]["value"] == "initialtagvalue"
# Only user tags should be present in Spec (filter out ACK system tags)
user_spec_tags = [t for t in resource["spec"]["tags"] if not t["key"].startswith("services.k8s.aws/")]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here, use shared utility functions for asserting without ack tags

Comment thread test/e2e/tests/test_vpc.py Outdated
)

# Check user tags are removed from Spec
# Check user tags are removed from Spec (filter out ACK system tags)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing here, use shared utility functions for asserting without ack tags

@knottnt
Copy link
Copy Markdown
Contributor

knottnt commented May 11, 2026

@sapphirew I think this is actually a bug in the ec2-controller. The system tags should be filtered before patching the spec.tags field.

@sapphirew sapphirew force-pushed the fix/vpc-e2e-test-flakiness branch from 06122f9 to 431767b Compare May 11, 2026 23:28
…oup_rule

Replace time.sleep(10) with wait_on_condition("ACK.ResourceSynced",
"True") so the test waits for the controller to signal reconciliation
is complete rather than relying on a fixed delay that may be
insufficient under CI load.
@sapphirew sapphirew force-pushed the fix/vpc-e2e-test-flakiness branch from 431767b to daf90bb Compare May 12, 2026 01:15
@sapphirew sapphirew changed the title fix(test): Fix flaky VPC e2e tests for tags and SG rule deletion fix(test): Fix flaky VPC e2e tests for SG rule deletion May 12, 2026
@sapphirew
Copy link
Copy Markdown
Contributor Author

sapphirew commented May 12, 2026

@sapphirew I think this is actually a bug in the ec2-controller. The system tags should be filtered before patching the spec.tags field.

@knottnt I couldn't root cause the bug for tags in the ec2-controller yet. Going to reduce the scope of the fix to test_disallow_default_security_group_rule only and follow up on that

@sapphirew
Copy link
Copy Markdown
Contributor Author

/retest

@michaelhtm
Copy link
Copy Markdown
Member

@sapphirew
System tags are not supposed to be patched to the resource. I'll continue investigating this bug and address it
/hold

@ack-prow ack-prow Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 12, 2026
@michaelhtm
Copy link
Copy Markdown
Member

This PR should fix system tags issues: aws-controllers-k8s/code-generator#702

Comment on lines +252 to +253
# Wait for the controller to finish reconciling (SG rule deletion
# happens inline during creation but the controller may be busy)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think this is true anymore after you previous PR.

Suggested change
# Wait for the controller to finish reconciling (SG rule deletion
# happens inline during creation but the controller may be busy)
# Wait for the controller to finish reconciling (SG rule deletion
# happens in update path after initial creation)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants