Skip to content

Conversation

@alexliu541
Copy link
Contributor

Description

When TrafficPolicy or HTTPListenerPolicy become orphan, ie all its TargetRefs are all invalid, it will not be picked up by translation phase due to its reverse lookup mechanism and thus will not receive any status update. This means any previous status message becomes stale and still remains on the policy's status. This PR resolves this problem by implementing the status collection approach discussed in the design doc linked below. Now the policy's status will be correctly cleared if the policy ever become orphaned.

Design doc: EP-11747

Fixes #11402

Changes

  • Add status collection for TrafficPolicy and HTTPListenerPolicy plugin, which will generate a marker if the CR has parent referring to kGateway resource in existing status
  • Add callback function in plugin fields
  • Fixes a problem where ancestor ref in status cannot be nil
  • Proxy syncer will process the markers and merge with translation report map
  • Add E2E test for orphaned policies status clear

Change Type

/kind fix

Changelog

Clear stale TrafficPolicy and HTTPListenerPolicy status after the policy has all invalid TargetRefs

@gateway-bot gateway-bot added kind/fix Categorizes issue or PR as related to a bug. release-note labels Nov 13, 2025
}

// Create status marker if existing status has kgateway controller
var statusMarker *struct{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's define this as a concrete type StatusMarker that can be reused across different files

Copy link
Contributor Author

@alexliu541 alexliu541 Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defined in #12852, will update this once the other merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

GlobalPolicies func(krt.HandlerContext) ir.PolicyIR
Policies krt.Collection[ir.PolicyWrapper]
// ProcessPolicyStatusMarkers add empty reports for policies to clear stale status
ProcessPolicyStatusMarkers func(krt.HandlerContext, *reports.ReportMap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name could be changed to something more expressive, such as AddStaleStatusMarker

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not to add the marker but the call back to process the markers. I think its important to specify its for policy in the name, so do you think changing this to ProcessPolicyStaleStatusMarkers is a bit too long?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On reading this further, it appears that the function simply creates an empty report for policies with status referencing our controller and relies on BuildPolicyStatus to remove the stale status. Is that correct? If not, where does the clearing of stale status happen?

ProcessPolicyStaleStatusMarker seems reasonable

Copy link
Contributor Author

@alexliu541 alexliu541 Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is correct, empty policy report entry in reportmap means empty status, so status syncer essentially clears the statuses.
Will change to ProcessPolicyStaleStatusMarker

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make sense to add a unit test for BuildPolicyStatus to ensure that an empty report does not add the status from currentStatus passed to it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done and done

}

// TestTrafficPolicyClearStaleStatus verifies that stale status is cleared when targetRef becomes invalid
func (s *testingSuite) TestTrafficPolicyClearStaleStatus() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test these using translator tests or a new suite for statuses? This test is not specific to the Cors feature.

Copy link
Contributor Author

@alexliu541 alexliu541 Nov 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, will move to a new suite

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think we should consider moving the e2e tests for stale policy and route status to a standalone status suite or to a "scenario" style e2e test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I see the updated structure. I think this makes sense as defined in this PR for the policy tests.

I still think we should reevaluate where the HTTPRoute stale status test lives, but unrelated to this PR.

@alexliu541 alexliu541 force-pushed the alexliu541/policystatus branch from 2b45988 to a0d74fe Compare November 21, 2025 18:11
}

// TestTrafficPolicyClearStaleStatus verifies that stale status is cleared when targetRef becomes invalid
func (s *testingSuite) TestTrafficPolicyClearStaleStatus() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I see the updated structure. I think this makes sense as defined in this PR for the policy tests.

I still think we should reevaluate where the HTTPRoute stale status test lives, but unrelated to this PR.

s.commonCols.Routes.ProcessHTTPRouteStatusMarkers(objStatus, merged)

for _, plugin := range s.plugins.ContributesPolicies {
if plugin.ProcessPolicyStaleStatusMarkers != nil && plugin.ProcessBackend == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this done to exclude backend policy plugins?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, backend policy and traffic policies uses different status reporting queue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a small concern with this approach, if we have plugins that contribute both backends and policy will we potentially lose this functionality?

Wondering how we can make this more explicit.

One option would be to have an explicit processBackendPolicyMarkers vs. processPolicyMarkers.

What do you think? Is that overkill?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the functionality will be fine as long as we make sure to report / generate its status in either backend or general status report, and given its gonna contribute to backend most likely it will be reported as a part of backend status reporting, which is what I have here.

We might need to have explicit processBackendPolicyMarkers vs processPolicyMarkers and detailed if check in that case, but right now I didn't want to add more fields into the plugin unless unnecessary, so I went with a check here. Not entirely a fan of checking the ProcessBackend either so if you think have separate is better we can do that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your assessment.

Given it's not likely we'll hit this specific scenario anytime soon, I think we can move forward with this as-is.

Signed-off-by: Alexander Liu <[email protected]>
Signed-off-by: Alexander Liu <[email protected]>
@lgadban lgadban self-requested a review November 25, 2025 02:57
Copy link
Contributor

@lgadban lgadban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks!

I'll approve with one minor question.
Let's chat tomorrow and either merge or follow up.

s.commonCols.Routes.ProcessHTTPRouteStatusMarkers(objStatus, merged)

for _, plugin := range s.plugins.ContributesPolicies {
if plugin.ProcessPolicyStaleStatusMarkers != nil && plugin.ProcessBackend == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a small concern with this approach, if we have plugins that contribute both backends and policy will we potentially lose this functionality?

Wondering how we can make this more explicit.

One option would be to have an explicit processBackendPolicyMarkers vs. processPolicyMarkers.

What do you think? Is that overkill?

@lgadban lgadban added this pull request to the merge queue Nov 26, 2025
Merged via the queue into kgateway-dev:main with commit 33d2bad Nov 26, 2025
30 checks passed
@alexliu541 alexliu541 deleted the alexliu541/policystatus branch November 26, 2025 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/fix Categorizes issue or PR as related to a bug. release-note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fail-open behavior and incorrect status reporting for policy attachment errors

4 participants