Add Private Link support #6274

Open
robinkb wants to merge 1 commit into kubernetes-sigs:main from giantswarm:add-privatelink-support

Conversation

@robinkb

@robinkb robinkb commented May 5, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

From: #3747 (comment)

This feature is intended for clusters with private networking. It adds an option to deploy a private link for the internal API server load balancer, which lets users access private clusters from anywhere in Azure simply by creating private endpoints to it (CAPZ gained support for private endpoints in #3044). More details are in the issue linked below.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #3400

Special notes for your reviewer:

This PR was originally submitted in #3747, but unfortunately it got stalled due to time constraints. I have gone through the comments in the original PR, and applied what still seemed relevant to the revision in this PR. Special thanks to @nprokopic for development of the feature.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

Add Private Links support

Co-authored-by: Nikola Prokopić <nikola@prokopic.rs>
Signed-off-by: Robin Ketelbuters <robin.k@giantswarm.io>
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. labels May 5, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fabriziopandini for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 5, 2026
@k8s-ci-robot k8s-ci-robot requested a review from jackfrancis May 5, 2026 12:42
@k8s-ci-robot
Contributor

Welcome @robinkb!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-azure 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-azure has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot requested a review from jsturtevant May 5, 2026 12:42
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 5, 2026
@k8s-ci-robot
Contributor

Hi @robinkb. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@nprokopic
Contributor

/ok-to-test

Thanks for reviving this :)

FYI, the stale bot closed issue #3400, so it's probably a good idea to check with the CAPZ folks whether the issue should be reopened.

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 6, 2026
@codecov

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 34.74114% with 479 lines in your changes missing coverage. Please review.
✅ Project coverage is 43.64%. Comparing base (bd1e799) to head (8d93ffc).
⚠️ Report is 26 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rivatelinks/mock_privatelinks/privatelinks_mock.go | 0.00% | 370 Missing ⚠️ |
| azure/services/privatelinks/client.go | 0.00% | 44 Missing ⚠️ |
| azure/services/privatelinks/spec.go | 80.55% | 14 Missing and 7 partials ⚠️ |
| azure/services/privatelinks/privatelinks.go | 67.44% | 13 Missing and 1 partial ⚠️ |
| util/futures/getter.go | 0.00% | 9 Missing ⚠️ |
| azure/scope/cluster.go | 75.75% | 6 Missing and 2 partials ⚠️ |
| ...zure/services/privateendpoints/privateendpoints.go | 0.00% | 5 Missing ⚠️ |
| azure/scope/managedcontrolplane.go | 0.00% | 2 Missing ⚠️ |
| azure/services/subnets/spec.go | 0.00% | 1 Missing and 1 partial ⚠️ |
| controllers/azurecluster_reconciler.go | 50.00% | 1 Missing and 1 partial ⚠️ |
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6274      +/-   ##
==========================================
- Coverage   43.88%   43.64%   -0.24%     
==========================================
  Files         289      293       +4     
  Lines       25351    26077     +726     
==========================================
+ Hits        11125    11382     +257     
- Misses      13448    13904     +456     
- Partials      778      791      +13     


@willie-yao
Contributor

/assign

@willie-yao
Contributor

Hi @robinkb, thanks for your hard work on this PR! Apologies if this wasn't clear in our docs, but we would prefer any new services added to use ASO instead of Azure SDK. We're trying to migrate away from Azure SDK and still have a goal of deprecating it and going ASO-only. Would you be able to investigate refactoring this PR to use ASO? I know it's a big ask, so if not we can pick it up as a follow-up.

@robinkb
Author

robinkb commented Jun 2, 2026

The vast, vast majority of the work was done by Nikola, so all credit to him! ❤️ I just updated the patch.

Can you point me towards some example implementations based on ASO? That will help us make an estimate. Our time is a bit tight right now though, so I doubt that we will have the bandwidth to refactor the PR shortly. In any case, it will be helpful to have a reference for future contributions that we plan to make.

@willie-yao
Contributor

No worries! I want to actually correct what I said earlier. I think it's fine to merge this as an Azure SDK service since we haven't been prioritizing migrating everything to ASO anyways. I'll give this a proper review and get back to you, thanks for your patience!

Contributor

@willie-yao willie-yao left a comment

Thanks again for your great work! I left a few comments and will do another pass once they're addressed.

Comment on lines +206 to +218
// Check allowed subscriptions
if !compareStringPointerSlicesUnordered(
	wanted.Properties.Visibility.Subscriptions,
	existing.Properties.Visibility.Subscriptions) {
	return false
}

// Check auto-approved subscriptions
if !compareStringPointerSlicesUnordered(
	wanted.Properties.AutoApproval.Subscriptions,
	existing.Properties.AutoApproval.Subscriptions) {
	return false
}
Contributor

I noticed that constructParameters only populates Properties.Visibility when len(s.AllowedSubscriptions) > 0 and Properties.AutoApproval when len(s.AutoApprovedSubscriptions) > 0. Both API fields are documented as optional. The comparison here dereferences both unconditionally, so any spec that omits the subscription lists panics on the second reconcile. Can you add a nil check for both fields?
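
A minimal sketch of what a nil-safe version of this comparison could look like. The `properties`, `visibility`, and `autoApproval` types are hypothetical stand-ins that mirror only the optional fields discussed here, and `sameSubscriptions` is an illustrative helper, not the PR's `compareStringPointerSlicesUnordered`. The sketch assumes that a missing block and an empty subscription list should be treated as equal.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical stand-ins for the optional SDK fields discussed in this comment.
type visibility struct{ Subscriptions []*string }
type autoApproval struct{ Subscriptions []*string }
type properties struct {
	Visibility   *visibility
	AutoApproval *autoApproval
}

// visibilitySubs returns nil when the optional Visibility block is absent.
func visibilitySubs(p *properties) []*string {
	if p == nil || p.Visibility == nil {
		return nil
	}
	return p.Visibility.Subscriptions
}

// autoApprovalSubs returns nil when the optional AutoApproval block is absent.
func autoApprovalSubs(p *properties) []*string {
	if p == nil || p.AutoApproval == nil {
		return nil
	}
	return p.AutoApproval.Subscriptions
}

// sortedValues dereferences non-nil entries and sorts them.
func sortedValues(in []*string) []string {
	out := make([]string, 0, len(in))
	for _, s := range in {
		if s != nil {
			out = append(out, *s)
		}
	}
	sort.Strings(out)
	return out
}

// sameSubscriptions compares two pointer slices while ignoring order.
func sameSubscriptions(a, b []*string) bool {
	x, y := sortedValues(a), sortedValues(b)
	if len(x) != len(y) {
		return false
	}
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// subscriptionsMatch never dereferences a nil Visibility or AutoApproval block.
func subscriptionsMatch(wanted, existing *properties) bool {
	return sameSubscriptions(visibilitySubs(wanted), visibilitySubs(existing)) &&
		sameSubscriptions(autoApprovalSubs(wanted), autoApprovalSubs(existing))
}

func main() {
	sub := "sub-a"
	omitted := &properties{} // spec that omits both subscription lists
	populated := &properties{Visibility: &visibility{Subscriptions: []*string{&sub}}}
	fmt.Println(subscriptionsMatch(omitted, omitted))   // true
	fmt.Println(subscriptionsMatch(omitted, populated)) // false
}
```

Routing both sides through small accessor functions keeps the nil handling in one place, so the comparison itself stays unchanged.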

Comment thread azure/scope/cluster.go
// First we get all private links to API server load balancer.
// Other load balancers (ControlPlaneOutboundLB and NodeOutboundLB) are outbound, so we cannot create private links
// for those.
privateLinks := s.AzureCluster.Spec.NetworkSpec.APIServerLB.PrivateLinks
Contributor

I think you'll need to add a nil check for Spec.NetworkSpec.APIServerLB here as well. For externally managed control planes, Spec.NetworkSpec.APIServerLB will be set to nil, as seen here: https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/internal/api/v1beta1/azurecluster_default.go#L74

You can do something like what you had on line 457:

if apiServerLB := s.AzureCluster.Spec.NetworkSpec.APIServerLB; apiServerLB != nil { ... }

Comment thread api/v1beta1/types.go
FrontendIPsCount *int32 `json:"frontendIPsCount,omitempty"`
// PrivateLinks to the load balancer (max 8 private links).
// +optional
PrivateLinks []PrivateLink `json:"privateLinks,omitempty"`
Contributor

Since PrivateLinks is only for internal LBs, I think it would help to have the webhook reject this field when it is set for the node or control plane outbound LBs. You could add something like this to validateNodeOutboundLB and validateControlPlaneOutboundLB:

if len(lb.PrivateLinks) > 0 {
	allErrs = append(allErrs, field.Forbidden(
		fldPath.Child("privateLinks"),
		"privateLinks are only supported on the API server load balancer",
	))
}

Comment on lines +857 to +860
var subnetCIDRs []string
for _, subnet := range subnets {
	subnetCIDRs = append(subnetCIDRs, subnet.CIDRBlocks...)
}
Contributor

Getting the CIDRs from subnet.CIDRBlocks would get them from every cluster subnet, so a NAT IP config that targets subnet A with a static IP from subnet B's CIDR is accepted. I think we should instead look up the subnet referenced by natIPConfig.Subnet first and validate the IP only against that subnet's CIDRs.
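
As a rough illustration of the per-subnet lookup, here is a standalone sketch. The `subnet` and `natIPConfig` types and the `PrivateIPAddress` field name are assumptions made for the example, not the PR's actual API types, and the real webhook would report `field` errors instead of plain `error` values.

```go
package main

import (
	"fmt"
	"net/netip"
)

// subnet is a hypothetical stand-in for the cluster subnet spec.
type subnet struct {
	Name       string
	CIDRBlocks []string
}

// natIPConfig is a hypothetical stand-in for the NAT IP configuration.
type natIPConfig struct {
	Subnet           string
	PrivateIPAddress string
}

// validateStaticNATIP checks the static IP only against the CIDRs of the
// subnet that the NAT IP configuration references.
func validateStaticNATIP(cfg natIPConfig, subnets []subnet) error {
	ip, err := netip.ParseAddr(cfg.PrivateIPAddress)
	if err != nil {
		return fmt.Errorf("invalid private IP %q: %w", cfg.PrivateIPAddress, err)
	}
	for _, s := range subnets {
		if s.Name != cfg.Subnet {
			continue
		}
		for _, cidr := range s.CIDRBlocks {
			prefix, err := netip.ParsePrefix(cidr)
			if err != nil {
				return fmt.Errorf("subnet %q has invalid CIDR %q: %w", s.Name, cidr, err)
			}
			if prefix.Contains(ip) {
				return nil
			}
		}
		return fmt.Errorf("IP %s is not within any CIDR of subnet %q", ip, s.Name)
	}
	return fmt.Errorf("subnet %q not found", cfg.Subnet)
}

func main() {
	subnets := []subnet{
		{Name: "subnet-a", CIDRBlocks: []string{"10.0.1.0/24"}},
		{Name: "subnet-b", CIDRBlocks: []string{"10.0.2.0/24"}},
	}
	// Targets subnet-a with an IP from subnet-b's range: rejected.
	fmt.Println(validateStaticNATIP(natIPConfig{Subnet: "subnet-a", PrivateIPAddress: "10.0.2.10"}, subnets))
	// Targets subnet-a with an IP from subnet-a's range: accepted.
	fmt.Println(validateStaticNATIP(natIPConfig{Subnet: "subnet-a", PrivateIPAddress: "10.0.1.10"}, subnets))
}
```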

Comment thread api/v1beta1/types.go
// PrivateLinkNATIPConfiguration specifies NAT IP configuration for the private link.
type PrivateLinkNATIPConfiguration struct {
	// AllocationMethod specifies how the private link NAT IPs are allocated: "Static" or "Dynamic".
	AllocationMethod string `json:"allocationMethod"`
Contributor

Shouldn't this use the PrivateLinkNATIPAllocationMethod type defined below instead of a string?

pl.NATIPConfigurations[j].Subnet,
fmt.Sprintf("NATIPConfiguration must use existing subnet (subnet %s not specified in AzureCluster resource)", natIPConfig.Subnet)))
}
if natIPConfig.AllocationMethod == "Static" {
Contributor

Suggested change
- if natIPConfig.AllocationMethod == "Static" {
+ if natIPConfig.AllocationMethod == infrav1.NATIPAllocationMethodStatic {

// NATIPConfiguration defines the NAT IP configuration for the private link service.
type NATIPConfiguration struct {
	// AllocationMethod can be Static or Dynamic.
	AllocationMethod string
Contributor

Suggested change
- AllocationMethod string
+ AllocationMethod infrav1.PrivateLinkNATIPAllocationMethod

Related to above comment changing AllocationMethod from type string to PrivateLinkNATIPAllocationMethod.

for i, natIPConfiguration := range s.NATIPConfiguration {
	ipAllocationMethod := armnetwork.IPAllocationMethod(natIPConfiguration.AllocationMethod)
	if ipAllocationMethod != armnetwork.IPAllocationMethodDynamic && ipAllocationMethod != armnetwork.IPAllocationMethodStatic {
		return armnetwork.PrivateLinkService{}, errors.Errorf("%T is not a supported armnetwork.IPAllocationMethodStatic", natIPConfiguration.AllocationMethod)
Contributor

nit: I think %T would just print the Go type name, not the value the user passed. Also, the message only mentions Static, but Dynamic is a valid method here as well.

Suggested change
- return armnetwork.PrivateLinkService{}, errors.Errorf("%T is not a supported armnetwork.IPAllocationMethodStatic", natIPConfiguration.AllocationMethod)
+ return armnetwork.PrivateLinkService{}, errors.Errorf("%q is not a supported NAT IP allocation method (must be %q or %q)", natIPConfiguration.AllocationMethod, infrav1.NATIPAllocationMethodStatic, infrav1.NATIPAllocationMethodDynamic)
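
To make the difference between the two verbs concrete, here is a small standalone demo. The `allocationMethod` type is a hypothetical typed string used only for illustration; it is not the PR's type.

```go
package main

import "fmt"

// allocationMethod is a hypothetical typed string standing in for the API type.
type allocationMethod string

// describeWithT formats the value with %T, which renders the Go type name.
func describeWithT(m allocationMethod) string {
	return fmt.Sprintf("%T is not supported", m)
}

// describeWithQ formats the value with %q, which renders the quoted value.
func describeWithQ(m allocationMethod) string {
	return fmt.Sprintf("%q is not supported", m)
}

func main() {
	m := allocationMethod("Statik") // a typo a user might make
	fmt.Println(describeWithT(m))   // the type name appears, the typo does not
	fmt.Println(describeWithQ(m))   // "Statik" is not supported
}
```

With %T the message is identical for every input, so the user never sees which value was rejected.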

Comment on lines +788 to +791
for i, pl := range lb.PrivateLinks {
	if err := validatePrivateLinkName(pl.Name, fldPath.Child("privateLinks").Index(i).Child("name")); err != nil {
		allErrs = append(allErrs, err)
	}
Contributor

I think you need to also add a uniqueness check for privateLinks[i].name across lb.PrivateLinks. Right now the webhook accepts two privateLinks entries with identical names.
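
One possible shape for the uniqueness check, sketched with a hypothetical `privateLink` stand-in type and a helper name (`duplicateNameIndexes`) invented for this example. In the webhook, each returned index could presumably be turned into a `field.Duplicate` error on the corresponding `name` path.

```go
package main

import "fmt"

// privateLink is a hypothetical stand-in carrying only the name field.
type privateLink struct{ Name string }

// duplicateNameIndexes returns the index of every entry whose name
// already appeared earlier in the slice.
func duplicateNameIndexes(links []privateLink) []int {
	seen := make(map[string]struct{}, len(links))
	var dups []int
	for i, pl := range links {
		if _, ok := seen[pl.Name]; ok {
			dups = append(dups, i)
			continue
		}
		seen[pl.Name] = struct{}{}
	}
	return dups
}

func main() {
	links := []privateLink{{Name: "pl-a"}, {Name: "pl-b"}, {Name: "pl-a"}}
	fmt.Println(duplicateNameIndexes(links)) // [2]
}
```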

@github-project-automation github-project-automation Bot moved this from Todo to Wait-On-Author in CAPZ Planning Jun 3, 2026

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

Status: Wait-On-Author

Development

Successfully merging this pull request may close these issues.

Private link for API server internal load balancer

4 participants