Skip to content

Conversation

@josh-pritchard
Copy link
Contributor

@josh-pritchard josh-pritchard commented Dec 2, 2025

Description

Some cleanup and proactive bug fixes with handling of controllerNames and enabled flags for Envoy and agentgateway. This is in preparation to add agentgateway specific gateway parameters & independent install paths so now these need to reliably disable when not enabled if they are needed to run side by side. New tests will help confirm nothing breaks with more intrusive changes to come.

Change Type

/kind cleanup

Changelog

Removed enabled from agentgateway in GatewayParameters as it should only use controllerName to know if its agentgateway or envoy

Additional Notes

Resolves #13023

Copilot AI review requested due to automatic review settings December 2, 2025 02:19
@gateway-bot gateway-bot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. release-note labels Dec 2, 2025
@josh-pritchard josh-pritchard force-pushed the always-respect-controller-name branch from a1f9e20 to e70276a Compare December 2, 2025 02:20
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the Enabled flag from the Agentgateway struct in GatewayParameters and establishes controllerName from the GatewayClass as the authoritative source for determining whether a gateway uses the envoy or agentgateway data plane. This cleanup simplifies the API and prevents configuration inconsistencies.

Key changes:

  • Removed Agentgateway.Enabled field from the API and replaced its usage with controllerName-based detection
  • Updated defaultAgentgatewayParameters() to populate agentgateway-specific configuration only for agentgateway controllers
  • Enhanced gateway controller reconciliation logic to respect enable flags for envoy and agentgateway controllers

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.

Show a summary per file
File Description
api/v1alpha1/gateway_parameters_types.go Removed Enabled field from Agentgateway struct and updated documentation to clarify that controllerName determines gateway type
api/v1alpha1/zz_generated.deepcopy.go Auto-generated code updated to remove deepcopy logic for the removed Enabled field
install/helm/kgateway-crds/templates/gateway.kgateway.dev_gatewayparameters.yaml Removed enabled field from CRD schema and updated field descriptions
pkg/deployer/merge.go Removed merging of Enabled field from deepMergeAgentgateway()
pkg/deployer/gateway_parameters.go Refactored defaultAgentgatewayParameters() to build agentgateway config instead of just toggling a flag; removed agentgateway config from defaultGatewayParameters()
pkg/deployer/gateway_parameters_test.go Updated test assertions to check for presence of agentgateway config rather than enabled flag
pkg/deployer/deployer_test.go Added AgentgatewayControllerName field to test inputs for consistency
internal/kgateway/deployer/gateway_parameters.go Updated getValues() to determine gateway type using irGW.ControllerName comparison instead of checking enabled flag
internal/kgateway/deployer/gateway_parameters_test.go Removed Enabled: ptr.To(false) from test data
internal/kgateway/controller/controller.go Added EnableEnvoy and EnableAgentgateway fields to GatewayConfig
internal/kgateway/controller/gw_controller.go Added reconciliation logic to skip gateways for disabled controllers based on enable flags
internal/kgateway/controller/start.go Passed enable flags from global settings to gateway controller configuration
internal/kgateway/proxy_syncer/proxy_syncer.go Replaced manual filtering logic with comment explaining that filtering happens upstream in the gateway collection
test/deployer/testutils.go Added both controller names to ControllerNames set and populated ControllerName/AgentgatewayControllerName fields in CommonCollections

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Signed-off-by: Joshua Pritchard <[email protected]>
@josh-pritchard josh-pritchard force-pushed the always-respect-controller-name branch from 3bb058f to ed7a10b Compare December 2, 2025 05:00
Signed-off-by: Joshua Pritchard <[email protected]>
@josh-pritchard josh-pritchard changed the title Audit consistent use of of controllerNames and their enabled flags Audit consistent use of of controllerNames and their enabled flags w/ test to verify Dec 2, 2025
@josh-pritchard josh-pritchard changed the title Audit consistent use of of controllerNames and their enabled flags w/ test to verify Audit consistent use of controllerNames and their enabled flags w/ test to verify Dec 2, 2025
@josh-pritchard josh-pritchard force-pushed the always-respect-controller-name branch from 9ac0561 to 5bba48e Compare December 2, 2025 22:44

The `upgradeHelmWithFlags()` method:
1. Gets local chart path
2. Sets `envoy.enabled` and `agentgateway.enabled` flags (which map to `KGW_ENABLE_ENVOY` and `KGW_ENABLE_AGENTGATEWAY` env vars)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're still using KGW_ENABLE_AGENTGATEWAY for the dual controller mode?

gateways := krtcollections.NewGatewayIndex(gatewayIndexConfig)
commonCols := &collections.CommonCollections{
GatewayIndex: gateways,
GatewayIndex: gateways,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these gateways filtered out based on the controller name? Will the dual controller have both gateways in the collection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no its just a mock collection, I actually had to work around this with some custom check in the test as the built in checks we have were expecting it to come up. I don't think its a big deal as we likely are not testing this in other suites

return nil
}

// Note: s.commonCols.GatewayIndex.Gateways is already filtered to only include Gateways
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work with the dual controller? Or is common collection GatewayIndex only for the envoy-based Gateways?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes for agw its done here
image

@josh-pritchard
Copy link
Contributor Author

/kick-ci

@josh-pritchard
Copy link
Contributor Author

/retest

Signed-off-by: Joshua Pritchard <[email protected]>
AGENTS.md Outdated
- **Agentgateway Controller**: `kgateway.dev/agentgateway` (defined in `wellknown.DefaultAgwControllerName`)

**Critical Requirements:**
1. Controllers MUST respect `GatewayClass.spec.controllerName` - NOT the GatewayClass name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GatewayClass.spec.controllerName takes precedence, but the GatewayClass name can matter too, as in the case of the waypoint class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to also update the deployer to check if the flags are enabled since we're currently relying on the gateway class?

if gatewayClassName == d.agwGatewayClassName {

// Each test method will manually apply its manifests after helm upgrade
}

// AfterTest overrides the base suite's AfterTest for manual cleanup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: delete this func?

}

// Merge with existing extra args from test installation, but filter out conflicting flags
// ExtraHelmArgs comes in pairs: "--set" followed by "key=value"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can also just say --set=foo=bar

ValuesFiles: []string{s.TestInstallation.Metadata.ProfileValuesManifestFile, s.TestInstallation.Metadata.ValuesManifestFile},
ReleaseName: helmutils.ChartName,
ChartUri: chartUri,
ExtraArgs: extraArgs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI you could say 'helm --wait [--timeout=N]' and catch some 'controller pod is erroring persistently' issues with less code than EventuallyPodsRunning(controller).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left it just to be consistent with framework wide timeout settings

s.TestInstallation.Assertions.Gomega.Eventually(func(g gomega.Gomega) {
gc := &gwv1.GatewayClass{}
err := s.TestInstallation.ClusterContext.Client.Get(s.Ctx, client.ObjectKey{
Name: "kgateway",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the waypoint class could also be tested

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its an alpha feature right now we are not moving forward for now, either way it will work fine as when EnableEnvoy=false then no envoy gateway translation collection is created, ProxySyncer doesn't run, plugins don't execute for envoy gateways and waypoint functionality is all done through plugins


### Run the entire suite
```bash
go test -v -timeout 30m -tags e2e ./test/e2e/tests/... -run "^TestDualController$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ./hack/run-test.sh TestDualController is how I'd run the suite outside of CI. It'll create a kind cluster, or not, build docker images, etc.

go test -v -timeout 30m -tags e2e ./test/e2e/tests/... -run "^TestDualController$"
```

### Run specific test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: delete the rest of the file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah idk why I committed this .md, it was AI slop

// waitForGatewayReady waits for the gateway to be fully ready before making HTTP requests.
// This prevents flakes where requests fail with "Connection refused" because the gateway
// isn't ready yet.
func (s *testingSuite) waitForGatewayReady() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why have this function instead of doing this once in SetupTest() (it's compatible with BeforeTest but if BaseTestingSuite skips the test, SetupTest() will have already run)?

But also, why single out this e2e test suite? We have a minority of suites that wait the the GW to be programmed and I could imagine doing it in BaseTestingSuite so that it's done by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, I singled it out because it flaked on my PR

Copy link
Contributor Author

@josh-pritchard josh-pritchard Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because I actually found a number of other tests doing similar checks like this. This test is pretty new so i figured it was a real flake because it also was not following patterns I found elsewhere in agentgateway tests

func defaultAgentgatewayParameters(imageInfo *ImageInfo, omitDefaultSecurityContext bool) *kgateway.GatewayParameters {
gwp := defaultGatewayParameters(imageInfo, omitDefaultSecurityContext)
gwp.Spec.Kube.Agentgateway.Enabled = ptr.To(true)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the author of #13018, I want to understand this block better. Seems like we don't need this block or the test/deployer test cases would change with its addition.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah; you moved it in this file. You moved it because it was only in use if this function was using it, and because without 'Enabled' it smelled a little funny remaining. There's no point in having the same thing in both places. Looks good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it would have been dead code here

Registry: ptr.To(AgentgatewayRegistry),
Tag: ptr.To(AgentgatewayDefaultTag),
Repository: ptr.To(AgentgatewayImage),
PullPolicy: (*corev1.PullPolicy)(ptr.To(imageInfo.PullPolicy)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to default to omitting the pull policy so that k8s can use it's unsurprising logic about 'latest means Always, and otherwise...' (see https://kubernetes.io/docs/concepts/containers/images/#imagepullpolicy-defaulting).

This is confusing and maybe even casts in the wrong place, but also turns the empty string into "something" when I'd prefer it were nil. There's also validation to consider since we're using a string in ImageInfo, not corev1.PullPolicy. Perhaps use this (which will never error because imageInfo is something we control completely):

func ToPullPolicyPtr(input string) (*corev1.PullPolicy, error) {
    policy := corev1.PullPolicy(input)

    switch policy {
    case corev1.PullAlways, corev1.PullIfNotPresent, corev1.PullNever:
        return ptr.To(policy), nil
    case "":
        // let K8s nuanced default apply (see https://kubernetes.io/docs/concepts/containers/images/#imagepullpolicy-defaulting)
        return nil, nil
    default:
        return nil, fmt.Errorf("invalid image pull policy: %q", input)
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm; you just moved this and it's working well enough with a pointer to the empty string because both data plane helm charts say '{{- if $gateway.image.pullPolicy }}'. I don't love the cast on the outside but corev1.PullPolicy is a string like imageInfo.PullPolicy...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I agree, maybe a followup?


krtopts := krtutil.NewKrtOptions(ctx.Done(), nil)

gatewayIndexConfig := krtcollections.GatewayIndexConfig{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done in a follow up, but we should also filter Gateways on the agentgateway collection side: https://github.com/kgateway-dev/kgateway/blob/main/pkg/agentgateway/plugins/collection.go#L156. This will make the policy attachment cleaner (limit what gateways you can attach to based on the controller now that dual mode is not going to be supported after the helm split)

Signed-off-by: Joshua Pritchard <[email protected]>
@josh-pritchard josh-pritchard force-pushed the always-respect-controller-name branch from 9db369b to 5b2eb14 Compare December 5, 2025 14:21
@npolshakova npolshakova added this pull request to the merge queue Dec 5, 2025
Merged via the queue into kgateway-dev:main with commit d4fc1e9 Dec 5, 2025
30 checks passed
chandler-solo added a commit to chandler-solo/kgateway that referenced this pull request Dec 8, 2025
… only way

to configure agentgateway. And after PR kgateway-dev#13017 (commit
d4fc1e9), 'disabling' agentgateway is probably
nonsensical.

Signed-off-by: David L. Chandler <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. release-note

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Audit consistent use of of controllerNames and their enabled flags

4 participants