Issues: vllm-project/vllm
[Bug]: Prefix Caching with Multi-Lora Support
bug
Something isn't working
#5475
opened Jun 12, 2024 by
curiositywan
[Bug][v0.5.0]: Benign error reported by Python multiprocessing resource_tracker
bug
Something isn't working
#5468
opened Jun 12, 2024 by
mgoin
[Feature]: Allow user defined extra request args to be logged in OpenAI compatible server
feature request
#5467
opened Jun 12, 2024 by
davidgxue
[Bug]: Runtime Error: GET was unable to find an engine to execute this computation for LLaVa-NEXT
bug
Something isn't working
#5465
opened Jun 12, 2024 by
XkunW
[Bug]: Error when --tensor-parallel-size > 1
bug
Something isn't working
#5458
opened Jun 12, 2024 by
javi111717
[Bug]: vllm v0.5.0 internal assert failed
bug
Something isn't working
#5450
opened Jun 12, 2024 by
changshivek
[Usage]: How to serve an embedding model and an LLM at the same time
usage
How to use vllm
#5449
opened Jun 12, 2024 by
weiyunfei
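For the question in #5449: vLLM (as of v0.5.0) exposes offline embedding through the same LLM class used for generation, so one hedged approach is to run two engines side by side in one process. A minimal sketch, assuming both models fit on a single GPU; the model names and the 0.6/0.3 memory split are illustrative, not a recommendation:

    from vllm import LLM, SamplingParams

    # Cap each engine's share of GPU memory so both fit on one card
    # (the 0.6/0.3 split is an assumption; tune for your hardware).
    chat_llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct",
                   gpu_memory_utilization=0.6)
    embed_llm = LLM(model="intfloat/e5-mistral-7b-instruct",
                    gpu_memory_utilization=0.3)

    # Generation goes through generate(), embeddings through encode().
    outputs = chat_llm.generate(["Hello!"], SamplingParams(max_tokens=32))
    embeddings = embed_llm.encode(["Hello!"])

The OpenAI-compatible server at this vintage hosts one model per process, so the usual online answer is to start two server processes on different ports.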
multilora_inference errors when calling qwen2-1.5b
documentation
Improvements or additions to documentation
#5445
opened Jun 12, 2024 by
zigangzhao-ai
[Bug]: v0.4.3 AsyncEngineDeadError
bug
Something isn't working
#5443
opened Jun 12, 2024 by
changshivek
[Bug]: TypeError: a bytes-like object is required, not 'str'
bug
Something isn't working
#5440
opened Jun 12, 2024 by
yaoyasong
[Bug]: get the degree of the outlines FSM compilation progress from vllm 0.5.0 engine (via a route)
bug
Something isn't working
#5436
opened Jun 12, 2024 by
syGOAT
[Feature]: PagedAttention for CPU-memory constrained environments?
feature request
#5434
opened Jun 12, 2024 by
peeteeman
[Feature]: Support [RecurrentGemmaForCausalLM]
new model
Requests for new models
#5431
opened Jun 12, 2024 by
sung-ho-moon
[Bug]: CUDA out of memory when setting prompt_logprobs with larger batch_size
bug
Something isn't working
#5424
opened Jun 11, 2024 by
qaz-wsx-1
[RFC]: Improve guided decoding (logit_processor) APIs and performance.
RFC
#5423
opened Jun 11, 2024 by
rkooo567
[Bug]: vllm deployment of GLM-4V reports KeyError: 'transformer.vision.transformer.layers.45.mlp.fc2.weight'
bug
Something isn't working
#5417
opened Jun 11, 2024 by
zhaobu
[Usage]: How do you specify a specific branch on huggingface to use when downloading a model?
good first issue
Good for newcomers
usage
How to use vllm
#5415
opened Jun 11, 2024 by
fake-name
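For #5415: the LLM constructor accepts a revision argument that pins the download to a specific branch, tag, or commit hash on the Hugging Face Hub. A minimal sketch; the model and branch names are placeholders:

    from vllm import LLM

    # `revision` selects a branch, tag, or commit on the Hugging Face Hub.
    llm = LLM(model="mistralai/Mistral-7B-v0.1", revision="main")

The same option is exposed on the OpenAI-compatible server command line as --revision.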
[Performance]: Qwen2-72B-Instruction-GPTQ-Int4 OpenAI Server Request Problem
performance
Performance-related issues
#5407
opened Jun 11, 2024 by
syngokhan
Hidden states from final (or middle) layers
feature request
#5406
opened Jun 11, 2024 by
janphilippfranken
[Bug]: The vLLM service takes two hours to start because of NCCL
bug
Something isn't working
#5405
opened Jun 11, 2024 by
zhaotyer
[Bug]: topk=1 and temperature=0 cause different output in vllm
bug
Something isn't working
#5404
opened Jun 11, 2024 by
rangehow
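For context on #5404: temperature=0 switches vLLM to greedy (argmax) decoding, while top_k=1 restricts sampling to the single highest-probability token; in exact arithmetic both should pick the same token, which is why differing outputs are reported as a bug. A minimal repro sketch, with an illustrative small model:

    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # illustrative model choice
    prompt = ["The capital of France is"]

    # Two ways to ask for greedy decoding; the report says they can disagree.
    out_temp = llm.generate(prompt, SamplingParams(temperature=0.0, max_tokens=32))
    out_topk = llm.generate(prompt, SamplingParams(temperature=1.0, top_k=1, max_tokens=32))

    print(out_temp[0].outputs[0].text == out_topk[0].outputs[0].text)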
0.4.3: CUDA error: an illegal memory access was encountered
bug
Something isn't working
#5376
opened Jun 10, 2024 by
maxin9966