[Feature request] Dynamic SplitFuse from DeepSpeed (2x throughput) #317

@0xymoro

Hi, putting this here:
https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen

The reported latency and throughput gains are significant, though the comparisons are against vLLM. TRT seems to handle batching a bit differently, so I'm unsure whether the same gains would apply here.
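
For context, the linked post describes Dynamic SplitFuse roughly as follows: long prompts are split into chunks spread over several forward passes, and short prompts are packed together, so that each pass runs close to a fixed token budget alongside ongoing generation. The Python sketch below is only a toy illustration of that scheduling idea, not DeepSpeed's or TRT's actual implementation. `Request`, `plan_passes`, and `token_budget` are hypothetical names, and the policy (decode tokens first, then prompt chunks in arrival order) is an assumption made for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Request:
    req_id: str
    prompt_tokens: int  # prompt tokens that still need prefill
    decode_tokens: int  # tokens to generate after prefill finishes


def plan_passes(requests, token_budget):
    """Toy planner: returns one list of (req_id, phase, n_tokens) per forward pass.

    Assumed policy, for illustration only: each pass first gives one token to
    every request that is already decoding, then fills the remaining budget
    with prompt chunks, splitting a long prompt across passes if needed.
    """
    if token_budget < 1:
        raise ValueError("token_budget must be at least 1")

    prompt_left = {r.req_id: r.prompt_tokens for r in requests}
    decode_left = {r.req_id: r.decode_tokens for r in requests}
    order = [r.req_id for r in requests]
    passes = []

    while any(prompt_left[i] or decode_left[i] for i in order):
        budget = token_budget
        step = []

        # Requests whose prompt is fully processed generate one token each.
        for i in order:
            if budget == 0:
                break
            if prompt_left[i] == 0 and decode_left[i] > 0:
                step.append((i, "decode", 1))
                decode_left[i] -= 1
                budget -= 1

        # Fill what is left of the budget with prompt chunks.
        for i in order:
            if budget == 0:
                break
            if prompt_left[i] > 0:
                chunk = min(prompt_left[i], budget)
                step.append((i, "prefill", chunk))
                prompt_left[i] -= chunk
                budget -= chunk

        passes.append(step)

    return passes


if __name__ == "__main__":
    demo = [Request("A", 10, 2), Request("B", 3, 1)]
    for n, step in enumerate(plan_passes(demo, token_budget=8), start=1):
        print(n, step)
```

With a budget of 8 tokens, the 10-token prompt of request A is split across two passes, and the second pass is topped up with request B's short prompt instead of running B as a separate small batch.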

Metadata

Labels

feature request: New feature or request. This includes new model, dtype, functionality support
triaged: Issue has been triaged by maintainers
