-
Notifications
You must be signed in to change notification settings - Fork 631
Audit consistent use of controllerNames and their enabled flags w/ test to verify #13017
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Audit consistent use of controllerNames and their enabled flags w/ test to verify #13017
Conversation
… and enabled flags Signed-off-by: Joshua Pritchard <[email protected]>
a1f9e20 to
e70276a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR removes the Enabled flag from the Agentgateway struct in GatewayParameters and establishes controllerName from the GatewayClass as the authoritative source for determining whether a gateway uses the envoy or agentgateway data plane. This cleanup simplifies the API and prevents configuration inconsistencies.
Key changes:
- Removed
Agentgateway.Enabledfield from the API and replaced its usage with controllerName-based detection - Updated
defaultAgentgatewayParameters()to populate agentgateway-specific configuration only for agentgateway controllers - Enhanced gateway controller reconciliation logic to respect enable flags for envoy and agentgateway controllers
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| api/v1alpha1/gateway_parameters_types.go | Removed Enabled field from Agentgateway struct and updated documentation to clarify that controllerName determines gateway type |
| api/v1alpha1/zz_generated.deepcopy.go | Auto-generated code updated to remove deepcopy logic for the removed Enabled field |
| install/helm/kgateway-crds/templates/gateway.kgateway.dev_gatewayparameters.yaml | Removed enabled field from CRD schema and updated field descriptions |
| pkg/deployer/merge.go | Removed merging of Enabled field from deepMergeAgentgateway() |
| pkg/deployer/gateway_parameters.go | Refactored defaultAgentgatewayParameters() to build agentgateway config instead of just toggling a flag; removed agentgateway config from defaultGatewayParameters() |
| pkg/deployer/gateway_parameters_test.go | Updated test assertions to check for presence of agentgateway config rather than enabled flag |
| pkg/deployer/deployer_test.go | Added AgentgatewayControllerName field to test inputs for consistency |
| internal/kgateway/deployer/gateway_parameters.go | Updated getValues() to determine gateway type using irGW.ControllerName comparison instead of checking enabled flag |
| internal/kgateway/deployer/gateway_parameters_test.go | Removed Enabled: ptr.To(false) from test data |
| internal/kgateway/controller/controller.go | Added EnableEnvoy and EnableAgentgateway fields to GatewayConfig |
| internal/kgateway/controller/gw_controller.go | Added reconciliation logic to skip gateways for disabled controllers based on enable flags |
| internal/kgateway/controller/start.go | Passed enable flags from global settings to gateway controller configuration |
| internal/kgateway/proxy_syncer/proxy_syncer.go | Replaced manual filtering logic with comment explaining that filtering happens upstream in the gateway collection |
| test/deployer/testutils.go | Added both controller names to ControllerNames set and populated ControllerName/AgentgatewayControllerName fields in CommonCollections |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Signed-off-by: Joshua Pritchard <[email protected]>
3bb058f to
ed7a10b
Compare
Signed-off-by: Joshua Pritchard <[email protected]>
Signed-off-by: Joshua Pritchard <[email protected]>
Signed-off-by: Joshua Pritchard <[email protected]>
9ac0561 to
5bba48e
Compare
|
|
||
| The `upgradeHelmWithFlags()` method: | ||
| 1. Gets local chart path | ||
| 2. Sets `envoy.enabled` and `agentgateway.enabled` flags (which map to `KGW_ENABLE_ENVOY` and `KGW_ENABLE_AGENTGATEWAY` env vars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we're still using KGW_ENABLE_AGENTGATEWAY for the dual controller mode?
| gateways := krtcollections.NewGatewayIndex(gatewayIndexConfig) | ||
| commonCols := &collections.CommonCollections{ | ||
| GatewayIndex: gateways, | ||
| GatewayIndex: gateways, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these gateways filtered out based on the controller name? Will the dual controller have both gateways in the collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no its just a mock collection, I actually had to work around this with some custom check in the test as the built in checks we have were expecting it to come up. I don't think its a big deal as we likely are not testing this in other suites
| return nil | ||
| } | ||
|
|
||
| // Note: s.commonCols.GatewayIndex.Gateways is already filtered to only include Gateways |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this work with the dual controller? Or is common collection GatewayIndex only for the envoy-based Gateways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Joshua Pritchard <[email protected]>
|
/kick-ci |
|
/retest |
Signed-off-by: Joshua Pritchard <[email protected]>
AGENTS.md
Outdated
| - **Agentgateway Controller**: `kgateway.dev/agentgateway` (defined in `wellknown.DefaultAgwControllerName`) | ||
|
|
||
| **Critical Requirements:** | ||
| 1. Controllers MUST respect `GatewayClass.spec.controllerName` - NOT the GatewayClass name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GatewayClass.spec.controllerName takes precedence, but the GatewayClass name can matter too, as in the case of the waypoint class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to also update the deployer to check if the flags are enabled since we're currently relying on the gateway class?
kgateway/pkg/deployer/deployer.go
Line 301 in a317fe8
| if gatewayClassName == d.agwGatewayClassName { |
| // Each test method will manually apply its manifests after helm upgrade | ||
| } | ||
|
|
||
| // AfterTest overrides the base suite's AfterTest for manual cleanup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: delete this func?
| } | ||
|
|
||
| // Merge with existing extra args from test installation, but filter out conflicting flags | ||
| // ExtraHelmArgs comes in pairs: "--set" followed by "key=value" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can also just say --set=foo=bar
| ValuesFiles: []string{s.TestInstallation.Metadata.ProfileValuesManifestFile, s.TestInstallation.Metadata.ValuesManifestFile}, | ||
| ReleaseName: helmutils.ChartName, | ||
| ChartUri: chartUri, | ||
| ExtraArgs: extraArgs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI you could say 'helm --wait [--timeout=N]' and catch some 'controller pod is erroring persistently' issues with less code than EventuallyPodsRunning(controller).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left it just to be consistent with framework wide timeout settings
| s.TestInstallation.Assertions.Gomega.Eventually(func(g gomega.Gomega) { | ||
| gc := &gwv1.GatewayClass{} | ||
| err := s.TestInstallation.ClusterContext.Client.Get(s.Ctx, client.ObjectKey{ | ||
| Name: "kgateway", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the waypoint class could also be tested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its an alpha feature right now we are not moving forward for now, either way it will work fine as when EnableEnvoy=false then no envoy gateway translation collection is created, ProxySyncer doesn't run, plugins don't execute for envoy gateways and waypoint functionality is all done through plugins
|
|
||
| ### Run the entire suite | ||
| ```bash | ||
| go test -v -timeout 30m -tags e2e ./test/e2e/tests/... -run "^TestDualController$" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ./hack/run-test.sh TestDualController is how I'd run the suite outside of CI. It'll create a kind cluster, or not, build docker images, etc.
| go test -v -timeout 30m -tags e2e ./test/e2e/tests/... -run "^TestDualController$" | ||
| ``` | ||
|
|
||
| ### Run specific test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: delete the rest of the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah idk why I committed this .md, it was AI slop
| // waitForGatewayReady waits for the gateway to be fully ready before making HTTP requests. | ||
| // This prevents flakes where requests fail with "Connection refused" because the gateway | ||
| // isn't ready yet. | ||
| func (s *testingSuite) waitForGatewayReady() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why have this function instead of doing this once in SetupTest() (it's compatible with BeforeTest but if BaseTestingSuite skips the test, SetupTest() will have already run)?
But also, why single out this e2e test suite? We have a minority of suites that wait the the GW to be programmed and I could imagine doing it in BaseTestingSuite so that it's done by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, I singled it out because it flaked on my PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this because I actually found a number of other tests doing similar checks like this. This test is pretty new so i figured it was a real flake because it also was not following patterns I found elsewhere in agentgateway tests
| func defaultAgentgatewayParameters(imageInfo *ImageInfo, omitDefaultSecurityContext bool) *kgateway.GatewayParameters { | ||
| gwp := defaultGatewayParameters(imageInfo, omitDefaultSecurityContext) | ||
| gwp.Spec.Kube.Agentgateway.Enabled = ptr.To(true) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the author of #13018, I want to understand this block better. Seems like we don't need this block or the test/deployer test cases would change with its addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah; you moved it in this file. You moved it because it was only in use if this function was using it, and because without 'Enabled' it smelled a little funny remaining. There's no point in having the same thing in both places. Looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah it would have been dead code here
| Registry: ptr.To(AgentgatewayRegistry), | ||
| Tag: ptr.To(AgentgatewayDefaultTag), | ||
| Repository: ptr.To(AgentgatewayImage), | ||
| PullPolicy: (*corev1.PullPolicy)(ptr.To(imageInfo.PullPolicy)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to default to omitting the pull policy so that k8s can use it's unsurprising logic about 'latest means Always, and otherwise...' (see https://kubernetes.io/docs/concepts/containers/images/#imagepullpolicy-defaulting).
This is confusing and maybe even casts in the wrong place, but also turns the empty string into "something" when I'd prefer it were nil. There's also validation to consider since we're using a string in ImageInfo, not corev1.PullPolicy. Perhaps use this (which will never error because imageInfo is something we control completely):
func ToPullPolicyPtr(input string) (*corev1.PullPolicy, error) {
policy := corev1.PullPolicy(input)
switch policy {
case corev1.PullAlways, corev1.PullIfNotPresent, corev1.PullNever:
return ptr.To(policy), nil
case "":
// let K8s nuanced default apply (see https://kubernetes.io/docs/concepts/containers/images/#imagepullpolicy-defaulting)
return nil, nil
default:
return nil, fmt.Errorf("invalid image pull policy: %q", input)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm; you just moved this and it's working well enough with a pointer to the empty string because both data plane helm charts say '{{- if $gateway.image.pullPolicy }}'. I don't love the cast on the outside but corev1.PullPolicy is a string like imageInfo.PullPolicy...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I agree, maybe a followup?
|
|
||
| krtopts := krtutil.NewKrtOptions(ctx.Done(), nil) | ||
|
|
||
| gatewayIndexConfig := krtcollections.GatewayIndexConfig{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be done in a follow up, but we should also filter Gateways on the agentgateway collection side: https://github.com/kgateway-dev/kgateway/blob/main/pkg/agentgateway/plugins/collection.go#L156. This will make the policy attachment cleaner (limit what gateways you can attach to based on the controller now that dual mode is not going to be supported after the helm split)
Signed-off-by: Joshua Pritchard <[email protected]>
9db369b to
5b2eb14
Compare
… only way to configure agentgateway. And after PR kgateway-dev#13017 (commit d4fc1e9), 'disabling' agentgateway is probably nonsensical. Signed-off-by: David L. Chandler <[email protected]>

Description
Some cleanup and proactive bug fixes with handling of controllerNames and enabled flags for Envoy and agentgateway. This is in preparation to add agentgateway specific gateway parameters & independent install paths so now these need to reliably disable when not enabled if they are needed to run side by side. New tests will help confirm nothing breaks with more intrusive changes to come.
Change Type
/kind cleanup
Changelog
Additional Notes
Resolves #13023