Pull requests: vllm-project/vllm

Pull requests list

- #5419 cache image build (opened Jun 11, 2024 by khluu)
- #5411 [Bugfix] Fix evict v2 with long context length (opened Jun 11, 2024 by puf147)
- #5400 [Speculative decoding] Initial spec decode docs (opened Jun 11, 2024 by cadedaniel)
- #5399 [WIP][Core][Distributed] Add shm broadcast (opened Jun 11, 2024 by youkaichao)
- #5396 [Kernel] Vectorized FP8 quantize kernel (opened Jun 10, 2024 by comaniac)
- #5393 Set AMD tests soft_fail=false (opened Jun 10, 2024 by simon-mo)
- #5381 [CI] Upgrade codespell version (opened Jun 10, 2024 by rkooo567)
- #5379 [Hardware][Intel] OpenVINO vLLM backend (opened Jun 10, 2024 by ilya-lavrenov)
- #5369 [Core][Distributed] Add same-node detection (opened Jun 10, 2024 by youkaichao)
- #5364 [Core][Bugfix] Fix prefix caching for blockv2 (opened Jun 9, 2024 by leiwen83)
- #5358 [Model][Bugfix] Add GLM-4v support (opened Jun 8, 2024 by songxxzp)