Collaborator

@Jeffwan Jeffwan commented May 6, 2025

Pull Request Description

This PR refactors the v1alpha1 KVCache spec. The version we delivered in v0.2.0 is fairly simple and cannot easily support the distributed case.

  • Update the KV controller code based on the latest types
  • Use the input values from the types to create resources
  • Update the kvcache examples

Related Issues

Resolves: #948

Important: Before submitting, please complete the description above and review the checklist below.


Contribution Guidelines

We appreciate your contribution to aibrix! To ensure a smooth review process and maintain high code quality, please adhere to the following guidelines:

Pull Request Title Format

Your PR title should start with one of these prefixes to indicate the nature of the change:

  • [Bug]: Corrections to existing functionality
  • [CI]: Changes to build process or CI pipeline
  • [Docs]: Updates or additions to documentation
  • [API]: Modifications to aibrix's API or interface
  • [CLI]: Changes or additions to the Command Line Interface
  • [Misc]: For changes not covered above (use sparingly)

Note: For changes spanning multiple categories, use multiple prefixes in order of importance.

Submission Checklist

  • PR title includes appropriate prefix(es)
  • Changes are clearly explained in the PR description
  • New and existing tests pass successfully
  • Code adheres to project style and best practices
  • Documentation updated to reflect changes (if applicable)
  • Thorough testing completed, no regressions introduced

By submitting this PR, you confirm that you've read these guidelines and your changes align with the project's contribution standards.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors the v1alpha1 KVCache API and its controllers to support distributed use cases and improve resource management. Key changes include replacing legacy fields (e.g. CPU/memory strings) with structured ResourceRequirements, introducing a "mode" field to distinguish centralized versus distributed deployments, and updating API types and deep copy functions accordingly.
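As a rough illustration of the two changes summarized above, the following sketch shows how free-form CPU/memory strings might map to a structured requirements type, and how a mode field could be validated. Everything here is an assumption made for illustration: `KVCacheMode`, its string values, `LegacyCacheSpec`, and `ConvertLegacy` are hypothetical names, and `ResourceRequirements` is a simplified stand-in for the Kubernetes core/v1 type rather than the real one.

```go
package main

import "fmt"

// KVCacheMode is a hypothetical type name for the new "mode" field.
type KVCacheMode string

// The string values below are assumptions chosen for illustration.
const (
	ModeCentralized KVCacheMode = "centralized"
	ModeDistributed KVCacheMode = "distributed"
)

// ResourceList and ResourceRequirements are simplified stand-ins for the
// Kubernetes core/v1 types of the same names, which hold resource.Quantity
// values rather than plain strings.
type ResourceList map[string]string

type ResourceRequirements struct {
	Requests ResourceList
	Limits   ResourceList
}

// LegacyCacheSpec models the old shape with free-form CPU/memory strings.
// The field names are hypothetical.
type LegacyCacheSpec struct {
	CPU    string
	Memory string
}

// toResourceList copies the non-empty legacy values into a fresh map.
func toResourceList(old LegacyCacheSpec) ResourceList {
	out := ResourceList{}
	if old.CPU != "" {
		out["cpu"] = old.CPU
	}
	if old.Memory != "" {
		out["memory"] = old.Memory
	}
	return out
}

// ConvertLegacy maps the legacy strings to structured requirements, setting
// requests equal to limits. Each side gets its own map so that later edits
// to one do not leak into the other.
func ConvertLegacy(old LegacyCacheSpec) ResourceRequirements {
	return ResourceRequirements{
		Requests: toResourceList(old),
		Limits:   toResourceList(old),
	}
}

// ValidateMode rejects any value other than the two known modes.
func ValidateMode(m KVCacheMode) error {
	switch m {
	case ModeCentralized, ModeDistributed:
		return nil
	default:
		return fmt.Errorf("unsupported kvcache mode %q", m)
	}
}

func main() {
	res := ConvertLegacy(LegacyCacheSpec{CPU: "2", Memory: "4Gi"})
	fmt.Println(res.Requests["cpu"], res.Limits["memory"])
	fmt.Println(ValidateMode(ModeDistributed), ValidateMode("hybrid"))
}
```

In a real controller the structured type would presumably be the Kubernetes one, so quantities would be parsed and validated by the API machinery instead of being carried as raw strings.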

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

File | Description
samples/kvcache/kvcache.yaml | Added mode and updated resource/port specifications
pkg/controller/kvcache/* | Refactored controller tests and backend implementations
api/orchestration/v1alpha1/* | Updated API types, CRD definitions, and deep copy functions
config/crd/orchestration/orchestration.aibrix.ai_kvcaches.yaml | Changed schema from legacy cacheSpec to new runtime/Cache fields

@Jeffwan Jeffwan force-pushed the jiaxin/kvcache-api-update branch from 86db43d to a5a24a0 Compare May 7, 2025 16:41
@Jeffwan Jeffwan marked this pull request as draft May 7, 2025 17:20
@Jeffwan Jeffwan changed the title Update kvcache v1alpha1 api spec [DO NOT MERGE] Update kvcache v1alpha1 api spec May 7, 2025
@Jeffwan Jeffwan marked this pull request as ready for review May 8, 2025 01:03
@Jeffwan Jeffwan changed the title [DO NOT MERGE] Update kvcache v1alpha1 api spec Update kvcache v1alpha1 api spec May 8, 2025
@Jeffwan Jeffwan force-pushed the jiaxin/kvcache-api-update branch 2 times, most recently from 0bdb026 to 013c772 Compare May 8, 2025 22:01
* Update kv controller codes based on latest types
* Leverage types input values to create resources
* Update kvcache examples

Signed-off-by: Jiaxin Shan <[email protected]>
@Jeffwan Jeffwan force-pushed the jiaxin/kvcache-api-update branch from 7d64146 to 3cfa0cb Compare May 8, 2025 23:22
Collaborator Author

Jeffwan commented May 9, 2025

It has been fully tested in my cluster, and I will merge this PR first for integration purposes. Reviewers, please keep reviewing; I will address the feedback in future PRs.

@Jeffwan Jeffwan merged commit 5595870 into vllm-project:main May 9, 2025
13 checks passed
@Jeffwan Jeffwan deleted the jiaxin/kvcache-api-update branch May 9, 2025 00:14
//nolint: lll
// +kubebuilder:default:={image: "aibrix/kvcache:20241120", imagePullPolicy: "IfNotPresent"}
- Cache CacheSpec `json:"cacheSpec,omitempty"`
+ Cache RuntimeSpec `json:"cache,omitempty"`
Contributor

RuntimeSpec's fields are a subset of podSpec plus replicas. What is the standard for deciding which podSpec fields to expose? I think using podSpec directly would be more appropriate.

Collaborator Author

Yeah, using podSpec is more flexible. We saw a few users have trouble supplying the right arguments, so this API wraps a lot of logic and exposes only the most common changes. We plan to roll out v0.3.0 pretty soon; I think we can get some feedback from users and gradually improve this part. Feel free to leave more feedback. Really appreciate it.
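To make the trade-off in this thread concrete, here is a minimal sketch of a controller expanding a narrow spec into the settings it would place on a pod, filling in defaults for anything the user leaves out. The `RuntimeSpec` fields shown, `PodSettings`, `Expand`, and the replica default of 1 are hypothetical; the default image and pull policy are borrowed from the kubebuilder marker in the diff above purely as example values.

```go
package main

import "fmt"

// RuntimeSpec is a hypothetical, deliberately narrow user-facing surface.
// The real type in the PR may expose different fields.
type RuntimeSpec struct {
	Replicas        int32
	Image           string
	ImagePullPolicy string
}

// PodSettings is a simplified stand-in for the parts of a pod template that
// the controller would own in this design.
type PodSettings struct {
	Replicas        int32
	Image           string
	ImagePullPolicy string
}

// Example defaults. The image and pull policy strings come from the
// kubebuilder default marker in the diff; using them here is an assumption.
const (
	defaultImage      = "aibrix/kvcache:20241120"
	defaultPullPolicy = "IfNotPresent"
)

// Expand copies the user's values and fills in a default for every field
// that was left at its zero value.
func Expand(spec RuntimeSpec) PodSettings {
	out := PodSettings{
		Replicas:        spec.Replicas,
		Image:           spec.Image,
		ImagePullPolicy: spec.ImagePullPolicy,
	}
	if out.Replicas <= 0 {
		out.Replicas = 1 // assumed default, not taken from the PR
	}
	if out.Image == "" {
		out.Image = defaultImage
	}
	if out.ImagePullPolicy == "" {
		out.ImagePullPolicy = defaultPullPolicy
	}
	return out
}

func main() {
	fmt.Printf("%+v\n", Expand(RuntimeSpec{}))
	fmt.Printf("%+v\n", Expand(RuntimeSpec{Replicas: 3, Image: "example.com/custom-kvcache:dev"}))
}
```

The narrow surface keeps users away from settings they are likely to get wrong, at the cost of the flexibility that exposing podSpec directly would give.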

Contributor

Great, thanks for the response, looking forward to the 0.3.0 release.

"--kvcache-server-rdma-port", strconv.Itoa(params.RdmaPort),
"--kvcache-server-admin-port", strconv.Itoa(params.AdminPort),
"--consistent-hashing-total-slots", strconv.Itoa(params.TotalSlots),
"--consistent-hashing-virtual-node-count", strconv.Itoa(params.VirtualNodeCount),
Contributor

Personal opinion: directly coupling the command in the code is not a good idea. I think the pod created by kvcache belongs to the data plane, and that should be up to the user to decide. AIBrix could provide a best-practice YAML that users can use directly, or they can opt to use their own custom-built image.
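One possible middle ground, sketched below, is to keep the controller-built flags as defaults while letting a user-supplied argument list take precedence. The four flag names are taken from the diff above; `ServerParams`, `DefaultArgs`, `ResolveArgs`, and the idea of a user-provided override list are hypothetical and not part of this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// ServerParams mirrors the params fields referenced in the diff. The struct
// name and the values used in main are illustrative.
type ServerParams struct {
	RdmaPort         int
	AdminPort        int
	TotalSlots       int
	VirtualNodeCount int
}

// DefaultArgs builds the flag list the way the diff does: each flag name is
// followed by its integer value rendered as a string.
func DefaultArgs(p ServerParams) []string {
	return []string{
		"--kvcache-server-rdma-port", strconv.Itoa(p.RdmaPort),
		"--kvcache-server-admin-port", strconv.Itoa(p.AdminPort),
		"--consistent-hashing-total-slots", strconv.Itoa(p.TotalSlots),
		"--consistent-hashing-virtual-node-count", strconv.Itoa(p.VirtualNodeCount),
	}
}

// ResolveArgs returns a copy of the user's arguments when any are given and
// falls back to the controller defaults otherwise. This override hook is a
// hypothetical extension.
func ResolveArgs(userArgs []string, p ServerParams) []string {
	if len(userArgs) > 0 {
		return append([]string(nil), userArgs...)
	}
	return DefaultArgs(p)
}

func main() {
	p := ServerParams{RdmaPort: 12345, AdminPort: 8088, TotalSlots: 4096, VirtualNodeCount: 100}
	fmt.Println(ResolveArgs(nil, p))
	fmt.Println(ResolveArgs([]string{"--my-custom-flag", "1"}, p))
}
```

With a hook like this, the defaults still protect users who do not want to think about flags, while custom images can bring their own command line.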

Collaborator Author

@vie-serendipity thanks for the feedback. I agree with you. Currently the use case is pretty straightforward, so I didn't expose custom configuration like podSpec externally. This is a simplified solution with a higher level of abstraction that exposes only limited information to users, such as image and resources.

This is subject to change. Feel free to propose a new structure. Before beta, we can use multiple alpha versions to refine this one.

Yaegaki1Erika pushed a commit to Yaegaki1Erika/aibrix that referenced this pull request Jul 23, 2025
* Update kv controller codes based on latest types
* Leverage types input values to create resources
* Update kvcache examples

Signed-off-by: Jiaxin Shan <[email protected]>
Development

Successfully merging this pull request may close these issues.

[RFC]: Support InfinityStore in AIBrix as the new KVCache backend

2 participants