Skip to content

Janitor Cleanup Based on VPC ID #226

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 9, 2025

Conversation

sudharshanibm
Copy link
Contributor

Enhances the IBM Cloud Janitor functionality to support resource cleanup based on a specific VPC ID.

Changes Introduced:

  • Added conditional logic to clean up resources (subnets, floating IPs, load balancers) based on the VPC they are associated with.
  • When vpc-id is passed in the API call, the janitor filters and deletes:
    • Subnets within the specified VPC
    • Unbound floating IPs (with conditional checks if VPC is specified)
    • Load balancers attached to subnets belonging to the specified VPC
  • If vpc-id is not provided, the cleanup falls back to using the resource group as the default filter.

Why:
This allows more granular and targeted cleanup, especially useful when multiple VPCs exist under the same resource group.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels May 5, 2025
@k8s-ci-robot k8s-ci-robot requested review from Amulyam24 and mkumatag May 5, 2025 19:44
@k8s-ci-robot
Copy link
Contributor

Hi @sudharshanibm. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 5, 2025
@Amulyam24
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 6, 2025
@Amulyam24
Copy link
Contributor

/cc @Prajyot-Parab

@k8s-ci-robot k8s-ci-robot requested a review from Prajyot-Parab May 6, 2025 07:58
continue
}

subnet, _, err := client.GetSubnet(&vpcv1.GetSubnetOptions{ID: fullLB.Subnets[0].ID})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will happen incase of multiple subnets?

Copy link
Contributor Author

@sudharshanibm sudharshanibm May 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Prajyot-Parab ,
For pointing out.
I have updated the code now such that it iterates through all subnets attached to each load balancer and:

  • Checks if at least one subnet belongs to the target VPC (client.VPCID)
  • If any one subnet is in the correct VPC, the load balancer is marked for deletion
  • If none of the subnets belong to the target VPC, the load balancer is skipped

Built an ibmcloud-janitor-boskos image with updated code and validated the changes

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 6, 2025
return c.vpcService.GetLoadBalancerListener(options)
}

func (c *IBMVPCClient) GetLoadBalancerPool(options *vpcv1.GetLoadBalancerPoolOptions) (*vpcv1.LoadBalancerPool, *core.DetailedResponse, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this function used anywhere?

return true
}

if len(fullLB.Subnets) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't returning true skip the LB for deletion when subnets are 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes returning true here intentionally skips deletion of the load balancer when len(fullLB.Subnets) == 0 because without subnets, we cannot verify if the load balancer belongs to the specified VPC
this is a safe check to avoid accidentally deleting unrelated resources.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that case, do you think its better to delete the LBs in that resource group as there is chance that it could be stale?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes load balancers with no subnets are likely to be stale. Initially, I skipped deleting them to avoid any risk of removing unrelated resources, especially since we couldn't verify their VPC association without subnet info

now I've updated the logic to safely allow deletion of such LBs if they belong to the specified resource group.

continue
}
if subnet.VPC != nil && *subnet.VPC.ID == client.VPCID {
return false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when subnets for the LB exist in the desired VPC, will it delete LB directly if it has subnets attached to it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it will attempt to delete the LB even if it has subnets attached, as long as those subnets belong to the desired VPC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one question here, shouldn't this be true? If subnet is attached to the VPC passed in user-data, we want to skip deletion. Returning false would proceed to LB deletion right?

return lb.ResourceGroup == nil || *lb.ResourceGroup.ID != client.ResourceGroupID
}

fullLB, _, err := client.GetLoadBalancer(&vpcv1.GetLoadBalancerOptions{ID: lb.ID})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fullLB, _, err := client.GetLoadBalancer(&vpcv1.GetLoadBalancerOptions{ID: lb.ID})
lbDetails, _, err := client.GetLoadBalancer(&vpcv1.GetLoadBalancerOptions{ID: lb.ID})

@sudharshanibm sudharshanibm force-pushed the janitor branch 2 times, most recently from 5860e25 to d5bebeb Compare May 6, 2025 15:46
Copy link
Contributor

@Amulyam24 Amulyam24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nits, can be enhanced in future if user-data would be removed.


if len(lbDetails.Subnets) == 0 {
// No subnets: assume stale, delete if in the correct resource group
if lb.ResourceGroup != nil && *lb.ResourceGroup.ID == client.ResourceGroupID {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this condition would already be satisfied in L107 right? I think we can directly go ahead with deletion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes actually the check for the resource group is already being done earlier, so once we detect that len(lbDetails.Subnets) == 0 we can directly proceed with deletion.
However, we're using lbDetails here, which contains more complete information than what we get from the list call.
I’ve updated the code to eliminate the redundant check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and reason for adding GetLoadbalancer call? will the ListLoadbalancers output not have the details of subnets in the LB output?

Copy link
Contributor Author

@sudharshanibm sudharshanibm May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes the GetLoadBalancer call is required because the output from ListLoadBalancers does not include detailed information about attached subnets
It only returns high-level metadata for each LB and in order to access the subnets field with actual values, we need to call GetLoadBalancer which contains the full details of the resource

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please cross check? I see that LB list returned in ListLoadbalancers and LB inGetLoadbalancer are pointing to the same type - https://pkg.go.dev/github.com/IBM/[email protected]/vpcv1#LoadBalancer which has subnet details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve manually verified the behavior with the updated code and checked the outputs, found that the ListLoadBalancers response already includes the subnet details.
so I’ve modified the code to remove the redundant check and directly proceed with deletion when no subnets are attached.

=== Output from GetLoadBalancer ===
Name: k8s-test-lb
  Subnet #1:
    ID: xxxx-xxxx-xxxx-xxxx-xxxx
    Name: k8s-test-subnet

=== Output from ListLoadBalancers ===
Name: k8s-test-lb
  Subnet #1:
    ID: xxxx-xxxx-xxxx-xxxx-xxxx
    Name: k8s-test-subnet

here both outputs return the subnet details. so, the GetLoadBalancer call to fetch detailed subnet information is not required as you mentioned and we can proceed with deletion if len(lbDetails.Subnets) == 0 without additional checks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for verifying.

@@ -34,6 +34,12 @@ func (VPCs) cleanup(options *CleanupOptions) error {
return errors.Wrap(err, "couldn't create VPC client")
}

// Skip VPC deletion if specific VPC ID is provided
if client.VPCID != "" {
resourceLogger.Info("Skipping VPC deletion as specific VPC ID was provided")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
resourceLogger.Info("Skipping VPC deletion as specific VPC ID was provided")
resourceLogger.Info("Skipping VPC deletion as VPC ID is passed in user-data")

Better to log VPC name as well.

@@ -99,6 +101,46 @@ func (VPCLoadBalancer) cleanup(options *CleanupOptions) error {
return nil
}

func shouldSkipLoadBalancer(lb vpcv1.LoadBalancer, client *IBMVPCClient, errs *[]error) bool {
if client.VPCID == "" {
// No VPCID provided, just check resource group
Copy link
Contributor

@Amulyam24 Amulyam24 May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// No VPCID provided, just check resource group
// No VPC ID is provided, check if LB's resource group matches the ID passed in user-data

@@ -78,6 +84,10 @@ func (VPCNetwork) cleanup(options *CleanupOptions) error {
return errors.Wrap(err, "failed to list the floating IPs")
}
for _, fip := range fips.FloatingIps {
if client.VPCID != "" && fip.Target != nil {
// Skip bound FIPs if VPCID is specified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Skip bound FIPs if VPCID is specified
// Skip bound FIPs if VPC ID is specified

// List subnets with optional VPC filter
listSubnetOpts := &vpcv1.ListSubnetsOptions{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// List subnets with optional VPC filter
listSubnetOpts := &vpcv1.ListSubnetsOptions{
listSubnetOpts := &vpcv1.ListSubnetsOptions{

ResourceGroupID: &client.ResourceGroupID,
})
}
if client.VPCID != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if client.VPCID != "" {
// List subnets with optional VPC filter
if client.VPCID != "" {

}

if len(lbDetails.Subnets) == 0 {
// No subnets: assume stale, delete if in the correct resource group
Copy link
Contributor

@Amulyam24 Amulyam24 May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// No subnets: assume stale, delete if in the correct resource group
// LB does not have attached subnets, can procced with deletion
can

if len(lbDetails.Subnets) == 0 {
// No subnets: assume stale, delete if in the correct resource group
if lb.ResourceGroup != nil && *lb.ResourceGroup.ID == client.ResourceGroupID {
resourceLogger.WithField("name", *lb.Name).Warn("load balancer has no subnets, assuming stale and eligible for deletion")
Copy link
Contributor

@Amulyam24 Amulyam24 May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
resourceLogger.WithField("name", *lb.Name).Warn("load balancer has no subnets, assuming stale and eligible for deletion")
resourceLogger.WithField("name", *lb.Name).Warn("load balancer has no attached subnets, assuming stale and proceed with deletion")

@@ -89,6 +90,13 @@ func (c *IBMVPCClient) ListLoadBalancers(options *vpcv1.ListLoadBalancersOptions
func (c *IBMVPCClient) GetLoadBalancer(options *vpcv1.GetLoadBalancerOptions) (result *vpcv1.LoadBalancer, response *core.DetailedResponse, err error) {
return c.vpcService.GetLoadBalancer(options)
}
func (c *IBMVPCClient) GetLoadBalancerListener(options *vpcv1.GetLoadBalancerListenerOptions) (*vpcv1.LoadBalancerListener, *core.DetailedResponse, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed removing the unused function?

Comment on lines 107 to 114
return lb.ResourceGroup == nil || *lb.ResourceGroup.ID != client.ResourceGroupID
}

lbDetails, _, err := client.GetLoadBalancer(&vpcv1.GetLoadBalancerOptions{ID: lb.ID})
if err != nil {
resourceLogger.WithField("name", *lb.Name).Error("failed to get load balancer details")
*errs = append(*errs, err)
return true
}

if len(lbDetails.Subnets) == 0 {
// LB does not have attached subnets, can procced with deletion
if lbDetails.ResourceGroup != nil && *lbDetails.ResourceGroup.ID == client.ResourceGroupID {
resourceLogger.WithField("name", *lb.Name).Warn("load balancer has no attached subnets, assuming stale and proceed with deletion")
return false
}
return true
}
Copy link
Contributor

@Amulyam24 Amulyam24 May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return lb.ResourceGroup == nil || *lb.ResourceGroup.ID != client.ResourceGroupID
}
lbDetails, _, err := client.GetLoadBalancer(&vpcv1.GetLoadBalancerOptions{ID: lb.ID})
if err != nil {
resourceLogger.WithField("name", *lb.Name).Error("failed to get load balancer details")
*errs = append(*errs, err)
return true
}
if len(lbDetails.Subnets) == 0 {
// LB does not have attached subnets, can procced with deletion
if lbDetails.ResourceGroup != nil && *lbDetails.ResourceGroup.ID == client.ResourceGroupID {
resourceLogger.WithField("name", *lb.Name).Warn("load balancer has no attached subnets, assuming stale and proceed with deletion")
return false
}
return true
}
return lb.ResourceGroup == nil || *lb.ResourceGroup.ID != client.ResourceGroupID
}
if len(lb.Subnets) == 0 {
// LB does not have attached subnets, can procced with deletion
resourceLogger.WithField("name", *lb.Name).Warn("load balancer has no attached subnets, assuming stale and proceed with deletion")
return false
}

will this work?

Copy link
Contributor

@Amulyam24 Amulyam24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, LGTM!

return false
}

// Check subnets for validity
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Check subnets for validity
// Check if subnets belong to the VPC using ID passed in user-data. If yes, skip deletion of LB

Comment on lines 110 to 127
if len(lb.Subnets) == 0 {
// LB does not have attached subnets, can proceed with deletion
resourceLogger.WithField("name", *lb.Name).Warn("load balancer has no attached subnets, assuming stale and proceed with deletion")
return false
}

// Check if subnets belong to the VPC using ID passed in user-data. If yes, skip deletion of LB
for _, subnetRef := range lb.Subnets {
subnet, _, err := client.GetSubnet(&vpcv1.GetSubnetOptions{ID: subnetRef.ID})
if err != nil {
resourceLogger.WithFields(logrus.Fields{
"name": *lb.Name,
"subnet_id": subnetRef.ID,
}).Error("failed to get subnet details")
*errs = append(*errs, err)
continue
}
if subnet.VPC != nil && *subnet.VPC.ID == client.VPCID {
return false
}
}

return true
}
Copy link
Contributor

@Amulyam24 Amulyam24 May 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if len(lb.Subnets) == 0 {
// LB does not have attached subnets, can proceed with deletion
resourceLogger.WithField("name", *lb.Name).Warn("load balancer has no attached subnets, assuming stale and proceed with deletion")
return false
}
// Check if subnets belong to the VPC using ID passed in user-data. If yes, skip deletion of LB
for _, subnetRef := range lb.Subnets {
subnet, _, err := client.GetSubnet(&vpcv1.GetSubnetOptions{ID: subnetRef.ID})
if err != nil {
resourceLogger.WithFields(logrus.Fields{
"name": *lb.Name,
"subnet_id": subnetRef.ID,
}).Error("failed to get subnet details")
*errs = append(*errs, err)
continue
}
if subnet.VPC != nil && *subnet.VPC.ID == client.VPCID {
return false
}
}
return true
}
// Check if subnets belong to the VPC using ID passed in user-data. If yes, skip deletion of LB
for _, subnetRef := range lb.Subnets {
subnet, _, err := client.GetSubnet(&vpcv1.GetSubnetOptions{ID: subnetRef.ID})
if err != nil {
resourceLogger.WithFields(logrus.Fields{
"name": *lb.Name,
"subnet_id": subnetRef.ID,
}).Error("failed to get subnet details")
*errs = append(*errs, err)
continue
}
if subnet.VPC != nil && *subnet.VPC.ID == client.VPCID {
return true
}
}
resourceLogger.WithField("name", *lb.Name).Warn("load balancer has no attached subnets, assuming stale and proceeding with deletion")
return false
}

Can this be simlified as above?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sudharshanibm sudharshanibm force-pushed the janitor branch 2 times, most recently from 440faa1 to 6c86d5f Compare May 8, 2025 18:22
Comment on lines 83 to 86
zone, ok := r.UserData.Map.Load(Zone)
if !ok {
return nil, errors.New("no zone in UserData")
}
Copy link
Contributor

@Amulyam24 Amulyam24 May 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a breaking change for VPC resource types which are using janitor and don't have zone in user-data. I see you are only loading zone but not using it anywhere, Is this change needed?

Wouldn't adding it in user-data alone be sufficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, adding it to the user-data itself is sufficient.
Can we make it optional, so that it doesn’t break existing resources?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VPC zone is not used anywhere in the janitor, any reason for making it optional?

Copy link
Contributor

@Amulyam24 Amulyam24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 9, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Amulyam24, sudharshanibm

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 9, 2025
@k8s-ci-robot k8s-ci-robot merged commit c52f0a1 into kubernetes-sigs:master May 9, 2025
3 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone May 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants