Skip to content

KEP-5311: Relaxed validation for Services names #5315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

adrianmoisey
Copy link
Member

@adrianmoisey adrianmoisey commented May 18, 2025

  • One-line PR description: Relaxed validation for Services names
  • Other comments:

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 18, 2025
@k8s-ci-robot k8s-ci-robot requested a review from aojea May 18, 2025 19:08
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label May 18, 2025
@k8s-ci-robot k8s-ci-robot requested a review from danwinship May 18, 2025 19:08
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/network Categorizes an issue or PR as relevant to SIG Network. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 18, 2025
@adrianmoisey adrianmoisey force-pushed the relaxed-name-validation-service branch from b4d56a5 to cd35944 Compare May 19, 2025 10:57
## Summary

This KEP proposes a relaxation of the Service name in order bring it in line
with the validation requirements of other resources in Kubernetes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
with the validation requirements of other resources in Kubernetes.
with the validation requirements of names of other resources in Kubernetes.


- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/e2e/...): [SIG ...](https://testgrid.k8s.io/sig-...?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature)

### Graduation Criteria
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should talk more here about the validation you talked about under "Risks and Mitigations". At what point will that happen?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, I don't know. I may need some guidance here.
The risks internal to k/k will be handled with integration and e2e tests (ie: the Ingress resource)

As for the risks external to k/k (consumers of Services and Ingress resources), I'm not really sure. Is there something this KEP should call out?
I remember seeing another KEP mentioning waiting for enough clouds to get onboard with the idea before graduating to GA, so I assume something similar needs to happen here, but I'm not really sure how to reasonability approach this.

- Verifying that DNS providers used in Kubernetes clusters can handle the new service name format.
- Running integration tests to ensure that dependent components such as Ingress controllers and service discovery mechanisms function correctly with the updated validation.
2. The Ingress resource references Service and will also need a relaxed validation on its reference to the Service
3. Downstream applications may perform validation on fields relating to Service names (ie: [ingress-ngnix](https://github.com/kubernetes/ingress-nginx/blob/d3ab5efd54f38f2b7c961024553b0ad060e2e916/internal/ingress/annotations/parser/validators.go#L190-L197)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should this risk be mitigated?
For ingress-nginx I'm happy to make a PR to change its validation, but that assumes it will one day bump to using k8s.io/apimachinery 0.34.0 (or higher).

What about the unknowns? There could be other service meshes, Ingress controllers, etc that are doing their own validation. Is it in scope of this KEP to identify those and get them fixed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding the code, it says that the value of the default-backend annotation has to be a DNS1035Label. So that just means that if you create a Service with a DNS1123Label name, you won't be able to refer to that Service from an ingress-nginx default-backend annotation (until they change the ingress-nginx validation). So this isn't really a big deal; you can just not use a 1123 name in that case.

doing their own validation. Is it in scope of this KEP to identify those and get them fixed?

If there are problems, they're probably not going to be because of people doing conflicting validation, they're going to be because of people putting Service names into contexts where an initial digit would screw things up.

Like, for instance, if you write out a YAML file by hand, and put the service name into a field, currently it is guaranteed that that field would get parsed as a string. But with this KEP, it would be possible to have a service whose name was entirely numeric, so if you wrote that out, the consumer of the YAML would find an integer value there rather than a string value and probably break.

Yeah, yeah, you could break it before with a service named "true" too. Yay YAML. Anyway, you get the idea though. People might use service names in contexts where the newly-allowed characters could cause problems.

And there's no way you could find all such places, since this could just be in a script that some random k8s admin somewhere wrote.

So, this is definitely something to worry about, but in the end, we have to just be like "eh... probably not going to happen".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding the code, it says that the value of the default-backend annotation has to be a DNS1035Label. So that just means that if you create a Service with a DNS1123Label name, you won't be able to refer to that Service from an ingress-nginx default-backend annotation (until they change the ingress-nginx validation). So this isn't really a big deal; you can just not use a 1123 name in that case.

🤦 You're right. I found the function and just assumed how it would be used, without actually reading the context. Thanks for making up for my laziness :/

So, this is definitely something to worry about, but in the end, we have to just be like "eh... probably not going to happen".

If I'm hearing you right, while a concern, it's really out of scope of this KEP?

@adrianmoisey adrianmoisey force-pushed the relaxed-name-validation-service branch 2 times, most recently from 18f8bf6 to 3ba04c6 Compare May 28, 2025 19:11
@adrianmoisey adrianmoisey marked this pull request as ready for review May 28, 2025 19:11
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 28, 2025
@k8s-ci-robot k8s-ci-robot requested review from aojea and thockin May 28, 2025 19:11
@adrianmoisey adrianmoisey force-pushed the relaxed-name-validation-service branch 2 times, most recently from dcec20b to 049bcb9 Compare May 29, 2025 07:00
@adrianmoisey
Copy link
Member Author

I think this is in a good place for another review.
I struggle a little with the order of operation of things like the PRR and all that.

- Creation of Services will use the previous `NameIsDNS1035Label()` validation for `.metadata.name`
- Updates of Services doesn't change, as the `metada.name` field is immutable
- Creation of Ingress will use the previous `NameIsDNS1035Label()` validation for `.spec.[]rules.http.[]paths.backend.service.name`
- Updates of Ingress will use the new `NameIsDNSLabel()` validation for `.spec.[]rules.http.[]paths.backend.service.name` if that field changes, otherwise, there is no validation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Updates of Ingress will use the new `NameIsDNSLabel()` validation for `.spec.[]rules.http.[]paths.backend.service.name` if that field changes, otherwise, there is no validation
- Updates of Ingress will use the previous `NameIsDNS1035Label()` validation for `.spec.[]rules.http.[]paths.backend.service.name` if that field changes, otherwise, there is no validation

What this is talking about is:

  • User upgrades to 1.34 and enables the feature gate.
  • User creates a new Service 5311-kep-example and an new Ingress kep-example that points to that Service via .spec.[]rules.http.[]paths.backend.service.name
  • User disables the feature gate
  • At this point, Services can't have non-1035 names, and Ingresses aren't allowed to refer to non-1035 names, but the user already has a non-1035-named Service and an Ingress that refers to it, and we don't want them to just break. So:
    • We let the user make updates to the 5311-kep-example Service, even though that Service would be considered invalid if they tried to create it now.
    • We let the user make updates to the kep-example Ingress as long as they don't modify the rule referring to 5311-kep-example.
    • We let the user update the kep-example Ingress to fix the 5311-kep-example rule to refer to kep-example instead (because we would validate the modified rule with NameIsDNS1035Label, and find it valid).
    • We don't let the user update the kep-example Ingress to add a new rule referring to 5311-kep-example (because we would validate the new rule with NameIsDNS1035Label and find it invalid).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usual pattern is to check the gate in pkg/registry/.../strategy.go and pass a flag or "opts" struct into the Validation function. Then validation checks the flag/opts and changes its behavior based on that. Then you can test both paths easily.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks. I think I had assumed that once a field was using the new validation, it should continue to use that validation, but ye, I see what you're saying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify - Service name is immutable so you can just skip validation of it on updates. The field referenced here SHOULD continue to use the original validation. We don't want saved objects to suddenly become invalid.

I said "check the gate in pkg/registry/.../strategy.go" what I really meant was to apply the usual logic of "if the gate is set OR the field is already using the gated functionality". Put another way: if the gate is off, the door is shut, but anyone who is already inside is allowed to remain inside. :)

In declarative validation this is being implemented as "if newVal == oldVal, skip validation entirely". But that requires plumbing oldVal into every validation function. In most handwritten code it becomes somethign like:

opts.ServiceNameRelaxed = <read feature gate>
if isValidDNSLabel(field) && !isValidDNS1035Label) {
    opts.ServiceNameRelaxed = true
}

Copy link
Member Author

@adrianmoisey adrianmoisey Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My response was to Dan, about Ingress.spec.rules[].http.paths[].backend.service.name.
I'm unsure if we're crossing threads here.

I guess what I'm now not clear about is the Ingress.spec.rules[].http.paths[].backend.service.name field.

Let's assume that a user turns the feature gate on, sets that field to "1-adrian", then turns the feature gate off. Now they edit it again, and change the field to "2-adrian", is that situation allowed here?

I think what I'm hearing Dan say is "During Ingress updates, only do validation if that field changes, otherwise do no validation", and what I'm hearing Tim say is "During Ingress updates , do validation on that field. If it matches the new validation, then validate using that".

Apologies if I'm not catching what you're saying!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the existing code, the easiest path forward will be "If this Ingress resource currently uses the new validation on any of its service refs, allow that Ingress resource to continue validated using the new validation as a whole".
I think this is what Tim is saying?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's assume that a user turns the feature gate on, sets that field to "1-adrian", then turns the feature gate off. Now they edit it again, and change the field to "2-adrian", is that situation allowed here?

While we should say "no, only the existing value", it's just not worth the effort to prevent that, so we usually do not bother.

the easiest path forward will be "If this Ingress resource currently uses the new validation on any of its service refs, allow that Ingress resource to continue validated using the new validation...

Right - finer granularity than that is not super important. As declarative validdation comes on-line, we will get a lot of that for free.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that makes sense. I've implemented it that was in the code I'm working on.

Should I update the KEP to also say that if an Ingress resource contains a new-validated Service name, then the entire Ingress resource is eligible to use the new-validated Service name?

When the feature gate is disabled:

- Creation of Services will use the previous `NameIsDNS1035Label()` validation for `.metadata.name`
- Updates of Services doesn't change, as the `metada.name` field is immutable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Updates of Services doesn't change, as the `metada.name` field is immutable
- Updates of Services will no longer validate `metadata.name`, since the field is immutable, so the existing value can be assumed to be valid.

I said before that you "didn't need to worry about" this case, but that's not true; the current code always validates the name, and you need to change it so that it only validates the name on Creates, and skips re-validating it on Updates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks like ValidateServiceUpdate() calls ValidateService(). You can pass a flag, e.g. isUpdate bool and change name validation to pass a do-nothing functionm, like:

nameFn := ValidateServiceName,
if isUpdate {
    nameFn = func(_ string, _ bool) []string { return nil }
}

Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

("No" is a legitimate answer here for simple KEPs.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this one unanswered, as it was under the "Rollout, Upgrade and Rollback Planning" heading, which has the comment "This section must be completed when targeting beta to a release."

So I plan on doing the manual testing after writing the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to leave it no for now, for beta you'll need to put here a description of upgrade->downgrade->upgrade manual testing.

@thockin thockin self-assigned this Jun 5, 2025
Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a few details to Dan's good comments.

When the feature gate is disabled:

- Creation of Services will use the previous `NameIsDNS1035Label()` validation for `.metadata.name`
- Updates of Services doesn't change, as the `metada.name` field is immutable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks like ValidateServiceUpdate() calls ValidateService(). You can pass a flag, e.g. isUpdate bool and change name validation to pass a do-nothing functionm, like:

nameFn := ValidateServiceName,
if isUpdate {
    nameFn = func(_ string, _ bool) []string { return nil }
}

- Creation of Services will use the previous `NameIsDNS1035Label()` validation for `.metadata.name`
- Updates of Services doesn't change, as the `metada.name` field is immutable
- Creation of Ingress will use the previous `NameIsDNS1035Label()` validation for `.spec.[]rules.http.[]paths.backend.service.name`
- Updates of Ingress will use the new `NameIsDNSLabel()` validation for `.spec.[]rules.http.[]paths.backend.service.name` if that field changes, otherwise, there is no validation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usual pattern is to check the gate in pkg/registry/.../strategy.go and pass a flag or "opts" struct into the Validation function. Then validation checks the flag/opts and changes its behavior based on that. Then you can test both paths easily.

If e2e tests are not necessary or useful, explain why.
-->

- Create a Service that requires the new validation and test if a DNS lookup works for it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need integration AND validation for this? We don't generally exercise name validation in e2e (so expensive).

@aojea

Copy link
Member

@aojea aojea Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the sentence as it will be a single e2e test, that will create a Service with the new name format and perform a DNS lookup, so the validation will be tested implicitly not as a different test. We need this test to be promoted to conformance or we get the risk of new Service names will not be processed by DNS implementers.

@adrianmoisey adrianmoisey force-pushed the relaxed-name-validation-service branch from 049bcb9 to df40367 Compare June 6, 2025 15:02
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Jun 6, 2025
@adrianmoisey adrianmoisey force-pushed the relaxed-name-validation-service branch from df40367 to 56514e8 Compare June 6, 2025 15:11
### Risks and Mitigations

1. Services are responsible for creating DNS records (ie: `<service-name>.<namespace>.svc.cluster.local>`). To confirm that downstream systems will support the new validation, we will conduct compatibility testing by:
- Verifying that DNS providers used in Kubernetes clusters can handle the new service name format.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a Conformance test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aojea
Copy link
Member

aojea commented Jun 9, 2025

/lgtm

only comments for clarifications and suggestions on the tests implementations

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 9, 2025
Copy link
Contributor

@danwinship danwinship left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm... not sure if Antonio wanted changes now or later?

@adrianmoisey adrianmoisey force-pushed the relaxed-name-validation-service branch from 56514e8 to 49ac61e Compare June 9, 2025 15:04
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 9, 2025
@adrianmoisey adrianmoisey force-pushed the relaxed-name-validation-service branch from 49ac61e to 4e4f675 Compare June 9, 2025 15:12
@thockin
Copy link
Member

thockin commented Jun 9, 2025

/approve for API - Dan and Antonio have the LGTM

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 9, 2025
@adrianmoisey
Copy link
Member Author

/assign soltysh

(For PRR)

@danwinship
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 10, 2025
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several comments, but this is mostly ok

Services that depend on the new validation, since any version of Kubernetes with
this feature gate will allow existing Services to have `NameIsDNSLabel` names. If a
user rolls back to a version before the feature gate, and has Services that depend
on the new validation, they will be unable to modify those services, and will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This not entirely true, is it? We only validate service when creating, so theoretically they should function. One won't be able to create the services which rely on relaxed validation? Right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, currently ValidateServiceCreate and ValidateServiceUpdate use a shared ValidateService function that mostly doesn't pay attention to the distinction between creates and updates.,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense.


There are no metrics that would inform anyone that the feature was
failing, but since the feature is opt-in, individual users can simply
stop using the feature if it is not working for them.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree there's no explicit metric, but one might suggest looking at increased level of failed Service creations, which may signal a problem with mismatched versions, as described in the rollout/rollback failure above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks, I've updated this

Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine to leave it no for now, for beta you'll need to put here a description of upgrade->downgrade->upgrade manual testing.

@aojea
Copy link
Member

aojea commented Jun 11, 2025

/lgtm

@adrianmoisey adrianmoisey force-pushed the relaxed-name-validation-service branch from c3a8c1b to f031c01 Compare June 12, 2025 05:33
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 12, 2025
@thockin
Copy link
Member

thockin commented Jun 12, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 12, 2025
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
the PRR


1. With the feature gate enabled, test that Services can be created with both new and previous validation
1. With the feature gate disabled, test that Services can be created with the previous validation, and fail when using the new validation
1. Disable the feature gate and ensure that the Service can be edited without a validation error being returned
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's reasonable - thx.

Services that depend on the new validation, since any version of Kubernetes with
this feature gate will allow existing Services to have `NameIsDNSLabel` names. If a
user rolls back to a version before the feature gate, and has Services that depend
on the new validation, they will be unable to modify those services, and will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adrianmoisey, soltysh, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 13, 2025
@k8s-ci-robot k8s-ci-robot merged commit d42c6b4 into kubernetes:master Jun 13, 2025
4 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jun 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/network Categorizes an issue or PR as relevant to SIG Network. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants