HabanaAI / vllm-hpu-extension Public

Notifications You must be signed in to change notification settings
Fork 36
Star 13

Code
Issues 1
Pull requests 28
Actions
Projects
Security
Insights

Additional navigation options

Code
Issues
Pull requests
Actions
Projects
Security
Insights

Pull requests: HabanaAI/vllm-hpu-extension

Labels 10 Milestones 0

New pull request New

28 Open 270 Closed

Author

Filter by author

Uh oh!

There was an error while loading. Please reload this page.

Label

Filter by label

Uh oh!

There was an error while loading. Please reload this page.

Use alt + click/return to exclude labels

or ⇧ + click/return for logical OR

Projects

Filter by project

Uh oh!

There was an error while loading. Please reload this page.

Milestones

Filter by milestone

Uh oh!

There was an error while loading. Please reload this page.

Reviews

Filter by reviews

No reviews Review required Approved review Changes requested

Assignee

Filter by who’s assigned

Assigned to nobody

Uh oh!

There was an error while loading. Please reload this page.

Sort

Sort by

Newest Oldest Most commented Least commented Recently updated Least recently updated Best match

Most reactions

Pull requests list

Cherry-pick window FusedSDPA for Gemma3 onto v1.22

#302 opened Jul 18, 2025 by MohitIntel

Loading…

Fix fallback buckets

#301 opened Jul 18, 2025 by madamczyk-intel

Loading…

Fix the fusedsdpa with sliding window alignment issue

#298 opened Jul 17, 2025 by libinta

Loading…

Fix: Pass correct long_context flag to warmup_range_with_limit

#297 opened Jul 17, 2025 by yafshar

Loading…

Draft: Proper chunked prefill bucketing

#295 opened Jul 16, 2025 by kzawora-intel • Draft

[SW-228042] Add support for dynamic vLLM kv-cache quantization

#292 opened Jul 16, 2025 by dudilester

Loading…

Add block_softmax_adjustment and block_softmax kernels

#289 opened Jul 16, 2025 by czhu15

Loading…

Enable calibration using pile-10k dataset for DeepSeek models

#279 opened Jul 14, 2025 by yangulei

Loading…

[V1] Defragmentation support

#275 opened Jul 10, 2025 by madamczyk-intel

Loading…

skip invalid decoding buckets with bs>blocks

#269 opened Jul 10, 2025 by yangulei

Loading…

Introduce block_softmax_adjustment kernel (#163)

#263 opened Jul 8, 2025 by kdamaszk • Draft

Fix for calibration error TypeError: generate_responses() missing 1 required positional argument: 'args'

#255 opened Jul 2, 2025 by tthakkal

Loading…

Enable block_softmax_adjustment on Gaudi2

#254 opened Jul 2, 2025 by kdamaszk • Draft

Add pre-commit static checks

#247 opened Jun 30, 2025 by kzawora-intel

Loading…

Allow usage of fused_block_softmax_adjustment for Qwen with Lazy

#246 opened Jun 27, 2025 by mswiniarsk • Draft

Update dependabot.yml

#242 opened Jun 26, 2025 by michalkuligowski

Loading…

Update linear.py

#239 opened Jun 25, 2025 by michalkuligowski

Loading…

Exponential bucketing tweaks

#224 opened Jun 13, 2025 by madamczyk-intel

Loading…

Use sets for faster filter checks. Better long context support

#203 opened May 28, 2025 by pi314ever

Loading…

Add useful internal vllm test

#200 opened May 27, 2025 by nirda7 • Draft

[SW-225565] Enable triangular softmax with merged prefill

#197 opened May 26, 2025 by kamil-kaczor • Draft

fix the issue that bmax not in bucket buffer

#191 opened May 22, 2025 by sywangyi

Loading…

Optimized MoE on Gaudi

#159 opened Apr 18, 2025 by gyou2021 • Draft

[FIX] fp8 gc compile error

#110 opened Mar 4, 2025 by maktukmak • Draft

Expand capability checks

#89 opened Feb 3, 2025 by kzawora-intel • Draft

Previous 1 2 Next

Previous Next

ProTip! Follow long discussions with comments:>50.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!