Development Roadmap (2025 H1)

Here is the development roadmap for 2025 Q1 and Q2. Contributions and feedback are welcome ([**Join Bi-weekly Development Meeting**](https://docs.google.com/document/d/1xEow4eIM152xNcRxqZz9VEcOiTQo8-CEuuQ5qTmkt-E/edit?tab=t.0#heading=h.ito5nvp7oasg)). The previous 2024 Q4 roadmap can be found in #1487.

## DeepSeek R1 optimization
@zhyncs @ispobock 
TBD

## Performance
- [ ] Support speculative decoding
  - EAGLE optimization #3822
  - Reference-based speculative decoding #3269
  - Match the speed of Grok
- [ ] Prefill/decode (P/D) disaggregation
  - Bump internal code
  - Mooncake integration
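
The speculative decoding items above build on a draft-then-verify loop. The sketch below shows the greedy variant under simplifying assumptions: `draft` and `target` are hypothetical stand-ins for real models (plain functions returning the greedy next token), the draft proposes a single chain of `k` tokens, and verification runs token by token instead of in one batched forward pass. EAGLE-style and reference-based drafting mainly change how proposals are produced (and may verify a tree of candidates); this only illustrates the basic chain case.

```python
from typing import Callable, List

# Hypothetical model interface: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    """One round of greedy speculative decoding; returns the tokens to append."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position
    #    (a real engine scores all k positions in one forward pass).
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every proposal matched: the target contributes one bonus token.
    accepted.append(target(ctx))
    return accepted


def target(ctx: List[int]) -> int:
    return ctx[-1] + 1  # toy target: count upward


def draft(ctx: List[int]) -> int:
    return ctx[-1] + 1 if len(ctx) < 3 else 99  # toy draft: wrong from the 3rd token


print(speculative_step([0], draft, target, k=3))  # prints [1, 2, 3]
```

Because a proposed token is kept only where it equals the target's greedy choice, the output matches plain greedy decoding with the target model; the speedup comes from accepting several tokens per target pass.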

## Parallelism
- [ ] Support sequence parallelism #1436. Related [paper](https://www.arxiv.org/pdf/2411.01783)
- [ ] Support pipeline parallelism.
- [ ] Optimize expert parallelism + data parallelism for DeepSeek models.
- [ ] Optimize expert parallelism for Qwen models.
- [ ] Overlap communication in tensor parallelism. @zhuohaol @fzyzcjy
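
For the expert parallelism items, a core step is dispatching each token to the ranks that host its selected experts. A minimal sketch, assuming top-k routing on raw router scores and a contiguous placement of `num_experts // num_ranks` experts per rank; `top_k_experts` and `dispatch` are hypothetical helpers, not SGLang functions. In a real deployment the per-rank lists would drive an all-to-all exchange, and uneven list lengths correspond to the load imbalance that EP optimizations aim to reduce.

```python
from typing import Dict, List, Tuple


def top_k_experts(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest router scores (ties go to the lower index)."""
    return sorted(range(len(scores)), key=lambda e: (-scores[e], e))[:k]


def dispatch(router_scores: List[List[float]], k: int, num_ranks: int) -> Dict[int, List[Tuple[int, int]]]:
    """Map each rank to the (token, expert) pairs it must process."""
    num_experts = len(router_scores[0])
    assert num_experts % num_ranks == 0, "assumes experts split evenly across ranks"
    experts_per_rank = num_experts // num_ranks

    plan: Dict[int, List[Tuple[int, int]]] = {r: [] for r in range(num_ranks)}
    for token, scores in enumerate(router_scores):
        for expert in top_k_experts(scores, k):
            # Contiguous placement: experts [r*E/R, (r+1)*E/R) live on rank r.
            plan[expert // experts_per_rank].append((token, expert))
    return plan


scores = [[0.1, 0.9, 0.8, 0.0], [0.5, 0.1, 0.2, 0.7]]
print(dispatch(scores, k=2, num_ranks=2))  # prints {0: [(0, 1), (1, 0)], 1: [(0, 2), (1, 3)]}
```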

## Hardware Optimizations
- [ ] AMD optimizations. @HaiShaw @yiakwy-xpu-ml-framework-team 
- [ ] Intel XPU optimization. @shanyu-sys 

## Model Coverage
- [ ] Multi-modal models
  - Merge all the PRs from @mickqian @yizhang2077
  - Support streaming models @yiranyyu
  - VLM optimization @Lyken17
- [ ] Embedding models
  - Encoder models @yichuan520030910320

## New features
- [ ] Performance optimizations for multi-LoRA serving @Fridge003 
- [ ] RLHF support with veRL team @zhaochenyang20 
- [ ] GRPO support in TRL @jhinpan
- [ ] Optimize function calling and constrained decoding @minleminzui
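
Constrained decoding generally works by masking, at each step, every token the grammar does not allow before the next token is chosen. The sketch below illustrates that idea with a hand-written finite-state machine over token ids; `FSM`, `mask_logits`, and `constrained_greedy` are hypothetical names, `logits_fn` stands in for a model, and production backends typically compile grammars or JSON schemas into such masks rather than writing the FSM by hand.

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy grammar: state -> {allowed token id: next state}.
# A state with no outgoing transitions is treated as terminal.
FSM = Dict[int, Dict[int, int]]


def mask_logits(logits: List[float], allowed: Dict[int, int]) -> List[float]:
    """Set the logit of every token the grammar forbids to -inf."""
    return [x if tok in allowed else -math.inf for tok, x in enumerate(logits)]


def constrained_greedy(
    logits_fn: Callable[[List[int]], List[float]],
    fsm: FSM,
    start: int,
    max_len: int,
) -> List[int]:
    """Greedy decoding where each step may only pick grammar-allowed tokens."""
    state, out = start, []
    for _ in range(max_len):
        allowed = fsm.get(state, {})
        if not allowed:  # terminal state: the grammar permits no more tokens
            break
        masked = mask_logits(logits_fn(out), allowed)
        tok = max(range(len(masked)), key=masked.__getitem__)
        out.append(tok)
        state = allowed[tok]
    return out


fsm: FSM = {0: {0: 1, 1: 1}, 1: {2: 2}, 2: {}}
# The toy "model" always prefers token 3, but the grammar never allows it.
print(constrained_greedy(lambda out: [0.0, 1.0, 2.0, 5.0], fsm, start=0, max_len=10))  # prints [1, 2]
```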

## Quantization
- [ ] Unsloth model support @guapisolo @XueyingJia

## Server API
- [ ] Support taking embeddings directly as inputs. #745
- [x] Add APIs for using the inference engine in a single script without launching a separate server. See also [examples](https://docs.vllm.ai/en/latest/getting_started/examples/offline_inference.html).
  - #1567
- [ ] Support endpoints other than OpenAI (Anthropic, Mistral) in the language frontend.

## Observability
- [ ] Out-of-the-box Grafana / Prometheus support @PopSoda2002 @ziliangpeng

## Others
- [ ] VLM refactor @mickqian 
- [ ] VLM RLHF @yiranyyu @shuaills 

Development Roadmap (2025 H1) #4035

DeepSeek R1 optimization

Performance

Parallelism

Hardware Optimizations

Model Coverage

New features

Quantization

Server API

Observability

Others

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Development Roadmap (2025 H1) #4035

Description

DeepSeek R1 optimization

Performance

Parallelism

Hardware Optimizations

Model Coverage

New features

Quantization

Server API

Observability

Others

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions