Andrew Wang
|
97a6be95ba
|
[Misc] improve logits processors logging message (#7435)
|
2024-08-13 02:29:34 +00:00 |
|
Cyrus Leung
|
9ba85bc152
|
[mypy] Misc. typing improvements (#7417)
|
2024-08-13 09:20:20 +08:00 |
|
Rui Qiao
|
198d6a2898
|
[Core] Shut down aDAG workers with clean async llm engine exit (#7224)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-12 17:57:16 -07:00 |
|
Daniele
|
774cd1d3bf
|
[CI/Build] bump minimum cmake version (#6999)
|
2024-08-12 16:29:20 -07:00 |
|
sasha0552
|
91294d56e1
|
[Bugfix] Handle PackageNotFoundError when checking for xpu version (#7398)
|
2024-08-12 16:07:20 -07:00 |
|
jon-chuang
|
a046f86397
|
[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-08-12 22:47:41 +00:00 |
|
Cyrus Leung
|
4ddc4743d7
|
[Core] Consolidate GB constant and enable float GB arguments (#7416)
|
2024-08-12 14:14:14 -07:00 |
|
Lucas Wilkinson
|
6aa33cb2dd
|
[Misc] Use scalar type to dispatch to different gptq_marlin kernels (#7323)
|
2024-08-12 14:40:13 -04:00 |
|
Kevin H. Luu
|
1137f343aa
|
[ci] Cancel fastcheck when PR is ready (#7433)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-08-12 10:59:14 -07:00 |
|
Kevin H. Luu
|
9b3e2edd30
|
[ci] Cancel fastcheck run when PR is marked ready (#7427)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-08-12 10:56:52 -07:00 |
|
Kevin H. Luu
|
65950e8f58
|
[ci] Entrypoints run upon changes in vllm/ (#7423)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-08-12 10:18:03 -07:00 |
|
Woosuk Kwon
|
cfba4def5d
|
[Bugfix] Fix logit soft cap in flash-attn backend (#7425)
|
2024-08-12 09:58:28 -07:00 |
|
Daniele
|
d2bc4510a4
|
[CI/Build] bump Dockerfile.neuron image base, use public ECR (#6832)
|
2024-08-12 09:53:35 -07:00 |
|
Cyrus Leung
|
24154f8618
|
[Frontend] Disallow passing model as both argument and option (#7347)
|
2024-08-12 12:58:34 +00:00 |
|
Roger Wang
|
e6e42e4b17
|
[Core][VLM] Support image embeddings as input (#6613)
|
2024-08-12 16:16:06 +08:00 |
|
Lily Liu
|
ec2affa8ae
|
[Kernel] Flashinfer correctness fix for v0.1.3 (#7319)
|
2024-08-12 07:59:17 +00:00 |
|
Roger Wang
|
86ab567bae
|
[CI/Build] Minor refactoring for vLLM assets (#7407)
|
2024-08-12 02:41:52 +00:00 |
|
Simon Mo
|
f020a6297e
|
[Docs] Update readme (#7316)
|
2024-08-11 17:13:37 -07:00 |
|
youkaichao
|
6c8e595710
|
[misc] add commit id in collect env (#7405)
|
2024-08-11 15:40:48 -07:00 |
|
tomeras91
|
02b1988b9f
|
[Doc] building vLLM with VLLM_TARGET_DEVICE=empty (#7403)
|
2024-08-11 14:38:17 -07:00 |
|
tomeras91
|
386087970a
|
[CI/Build] build on empty device for better dev experience (#4773)
|
2024-08-11 13:09:44 -07:00 |
|
William Lin
|
c08e2b3086
|
[core] [2/N] refactor worker_base input preparation for multi-step (#7387)
|
2024-08-11 08:50:08 -07:00 |
|
Noam Gat
|
4fb7b52a2c
|
Updating LM Format Enforcer version to v0.10.6 (#7189)
|
2024-08-11 08:11:50 -04:00 |
|
Woosuk Kwon
|
90bab18f24
|
[TPU] Use mark_dynamic to reduce compilation time (#7340)
|
2024-08-10 18:12:22 -07:00 |
|
Isotr0py
|
4c5d8e8ea9
|
[Bugfix] Fix phi3v batch inference when images have different aspect ratio (#7392)
|
2024-08-10 16:19:33 +00:00 |
|
Cade Daniel
|
baa240252e
|
[Core] Fix edge case in chunked prefill + block manager v2 (#7380)
|
2024-08-09 23:48:49 +00:00 |
|
Antoni Baum
|
999ef0b917
|
[Misc] Add numpy implementation of compute_slot_mapping (#7377)
|
2024-08-09 22:52:29 +00:00 |
|
Dipika Sikka
|
5c6c54d67a
|
[Bugfix] Fix PerTensorScaleParameter weight loading for fused models (#7376)
|
2024-08-09 21:23:46 +00:00 |
|
Mahesh Keralapura
|
933790c209
|
[Core] Add span metrics for model_forward, scheduler and sampler time (#7089)
|
2024-08-09 13:55:13 -07:00 |
|
Roger Wang
|
70d268a399
|
[Bugfix] Fix ITL recording in serving benchmark (#7372)
|
2024-08-09 10:00:00 -07:00 |
|
Pooya Davoodi
|
249b88228d
|
[Frontend] Support embeddings in the run_batch API (#7132)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-08-09 09:48:21 -07:00 |
|
Alexander Matveev
|
74af2bbd90
|
[Bugfix] Fix reinit procedure in ModelInputForGPUBuilder (#7360)
|
2024-08-09 16:35:49 +00:00 |
|
Alexander Matveev
|
fc7b8d1eef
|
[Performance] e2e overheads reduction: Small followup diff (#7364)
|
2024-08-09 15:49:36 +00:00 |
|
Isotr0py
|
67abdbb42f
|
[VLM][Doc] Add stop_token_ids to InternVL example (#7354)
|
2024-08-09 14:51:04 +00:00 |
|
Mor Zusman
|
07ab160741
|
[Model][Jamba] Mamba cache single buffer (#6739)
Co-authored-by: Mor Zusman <morz@ai21.com>
|
2024-08-09 10:07:06 -04:00 |
|
Nick Hill
|
b4e9528f95
|
[Core] Streamline stream termination in AsyncLLMEngine (#7336)
|
2024-08-09 07:06:36 +00:00 |
|
William Lin
|
57b7be0e1c
|
[Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace (#6971)
|
2024-08-09 05:42:45 +00:00 |
|
Travis Johnson
|
99b4cf5f23
|
[Bugfix] Fix speculative decoding with MLPSpeculator with padded vocabulary (#7218)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-08-08 22:08:46 -07:00 |
|
Alexander Matveev
|
e02ac55617
|
[Performance] Optimize e2e overheads: Reduce python allocations (#7162)
|
2024-08-08 21:34:28 -07:00 |
|
Woosuk Kwon
|
73388c07a4
|
[TPU] Fix dockerfile.tpu (#7331)
|
2024-08-08 20:24:58 -07:00 |
|
Cyrus Leung
|
7eb4a51c5f
|
[Core] Support serving encoder/decoder models (#7258)
|
2024-08-09 10:39:41 +08:00 |
|
Siyuan Liu
|
0fa14907da
|
[TPU] Add Load-time W8A16 quantization for TPU Backend (#7005)
|
2024-08-08 18:35:49 -07:00 |
|
Simon Mo
|
5923532e15
|
Add Skywork AI as Sponsor (#7314)
|
2024-08-08 13:59:57 -07:00 |
|
Jee Jee Li
|
a049b107e2
|
[Misc] Temporarily resolve the error of BitAndBytes (#7308)
|
2024-08-08 13:42:58 -07:00 |
|
Isotr0py
|
8334c39f37
|
[Bugfix] Fix new Llama3.1 GGUF model loading (#7269)
|
2024-08-08 13:42:44 -07:00 |
|
Daniele
|
e904576743
|
[CI/Build] Dockerfile.cpu improvements (#7298)
|
2024-08-08 15:24:52 -04:00 |
|
Michael Goin
|
e14fb22e59
|
[Doc] Put collect_env issue output in a <detail> block (#7310)
|
2024-08-08 11:22:49 -07:00 |
|
Zach Zheng
|
782e53ab59
|
[Bugfix][fast] Fix the get_num_blocks_touched logic (#6849)
|
2024-08-08 10:43:30 -07:00 |
|
Joe Runde
|
21b9c49aa3
|
[Frontend] Kill the server on engine death (#6594)
Signed-off-by: Joe Runde <joe@joerun.de>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-08-08 09:47:48 -07:00 |
|
Luka Govedič
|
5fb4a3f678
|
[Bugfix][Kernel] Increased atol to fix failing tests (#7305)
|
2024-08-08 12:16:13 -04:00 |
|