Skip to content

Conversation

@Jeffwan
Copy link
Collaborator

@Jeffwan Jeffwan commented May 9, 2025

Pull Request Description

  1. Update key from hpkv_nodes to hpkv_cluster_metadata based on internal sync up result
  2. Support sa/role/rolebinding for watcher pod. We used to specific a fixed service account name and require user to create it and use it. Now, service account is fully managed by controllers.
  3. make manifest to generate latest role. I notice customresource is to kubebuilder market in last PR [Misc] feature(rayclusterreplicaset): check rayclusters crd is installed before controller start #922, I added this back.
  4. remove fixed k8s.volcengine.com/pod-networks setting. Only if user specific volcano engine rdma devices, we will attach this annotation

Related Issues

Resolves: #[Insert issue number(s)]

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines (Expand for Details)

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

@Jeffwan Jeffwan force-pushed the jiaxin/kv-left-over branch from d04c991 to 07a277a Compare May 9, 2025 07:09
@Jeffwan Jeffwan changed the title Support watcher pod rbac Support watcher pod rbac in kvcache controller May 9, 2025
@Jeffwan Jeffwan force-pushed the jiaxin/kv-left-over branch from 07a277a to b22eb51 Compare May 9, 2025 07:18
@Jeffwan Jeffwan requested review from Copilot and varungup90 May 9, 2025 18:47
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for managing RBAC for the watcher pod in the KVCache controller by introducing new reconciliation logic and updating the backend implementations for both HPKV and InfiniStore. Key changes include updating RBAC rules in the controller, adding new helper functions for creating service accounts, roles, and role bindings, and replacing a fixed service account name with one derived from the KVCache resource name.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
pkg/controller/kvcache/kvcache_controller.go Added RBAC rules for serviceaccounts, roles, and rolebindings.
pkg/controller/kvcache/backends/reconciler.go Introduced new functions to reconcile SA, Role, and RoleBinding.
pkg/controller/kvcache/backends/infinistore.go Added BuildWatcherPod* functions and updated service account naming for watcher pod.
pkg/controller/kvcache/backends/hpkv.go Added BuildWatcherPod* functions and updated service account naming for watcher pod.
pkg/controller/kvcache/backends/distributed_test.go Extended mockBackend with new fields to support RBAC resource testing.
pkg/controller/kvcache/backends/distributed.go Added new reconciliation calls for watcher pod RBAC objects.
pkg/controller/kvcache/backends/common.go Added common helper methods for building SA, Role, and RoleBinding.
config/rbac/controller-manager/role.yaml Updated RBAC rules to include permissions for the new watcher pod objects.
cmd/kvcache-watcher/main.go Updated constant for key from "hpkv_nodes" to "hpkv_cluster_metadata".
Comments suppressed due to low confidence (1)

pkg/controller/kvcache/backends/hpkv.go:172

  • [nitpick] Verify that using kvCache.Name as the ServiceAccountName consistently across backends avoids potential naming conflicts; adding a distinguishing prefix or suffix might improve clarity if multiple resources exist in the same namespace.
ServiceAccountName: kvCache.Name,

update hpkv key in redis

Signed-off-by: Jiaxin Shan <[email protected]>
@Jeffwan Jeffwan force-pushed the jiaxin/kv-left-over branch from b22eb51 to 3fdd2ef Compare May 9, 2025 19:05
@googs1025
Copy link
Collaborator

Thanks for the invitation, I will help review it today. 😄

return nil
}

func (r *BaseReconciler) reconcileWatcherPodServiceAccount(ctx context.Context, sa *corev1.ServiceAccount) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is reconcile, do we need to consider the update case? For example, do we need to check whether they are equal, and update if they are not equal? 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It is supposed to be checked. for other major objects, we have an Equal check and update the object correspondingly. SA/Role are won't be frequently changed so I skipped the logic in the initial version. Let's add them in future PRs

@Jeffwan
Copy link
Collaborator Author

Jeffwan commented May 10, 2025

I will merge it for rc purpose, if other reviewers have further feedback, please leave the comments and i will address them later

@Jeffwan Jeffwan merged commit 54bb1bc into vllm-project:main May 10, 2025
13 checks passed
@Jeffwan Jeffwan deleted the jiaxin/kv-left-over branch May 10, 2025 17:50
Yaegaki1Erika pushed a commit to Yaegaki1Erika/aibrix that referenced this pull request Jul 23, 2025
* Support watcher pod rbac in kv cache controller
* Update hpkv key in redis
* Grant pods/exec role to controller-manager
* Add prometheus scrape configuration in controller and watcher

---------

Signed-off-by: Jiaxin Shan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants