Peter Salas
|
57792ed469
|
[Doc] Fix incorrect docs from #7615 (#7788)
|
2024-08-22 10:02:06 -07:00 |
|
Jiaxin Shan
|
d3b5b98021
|
[Misc] Enhance prefix-caching benchmark tool (#6568)
|
2024-08-22 09:32:02 -07:00 |
|
Travis Johnson
|
cc0eaf12b1
|
[Bugfix] spec decode handle None entries in topk args in create_sequence_group_output (#7232)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-08-22 09:33:48 -04:00 |
|
Dipika Sikka
|
955b5191c9
|
[Misc] update fp8 to use vLLMParameter (#7437)
|
2024-08-22 08:36:18 -04:00 |
|
Lucas Wilkinson
|
55d63b1211
|
[Bugfix] Don't build machete on cuda <12.0 (#7757)
|
2024-08-22 08:28:52 -04:00 |
|
Flex Wang
|
4f419c00a6
|
Fix ShardedStateLoader for vllm fp8 quantization (#7708)
|
2024-08-22 08:25:04 -04:00 |
|
Abhinav Goyal
|
a3fce56b88
|
[Speculative Decoding] EAGLE Implementation with Top-1 proposer (#6830)
|
2024-08-22 02:42:24 -07:00 |
|
Woosuk Kwon
|
b3856bef7d
|
[Misc] Use torch.compile for GemmaRMSNorm (#7642)
|
2024-08-22 01:14:13 -07:00 |
|
youkaichao
|
8c6f694a79
|
[ci] refine dependency for distributed tests (#7776)
|
2024-08-22 00:54:15 -07:00 |
|
Woosuk Kwon
|
eeee1c3b1a
|
[TPU] Avoid initializing TPU runtime in is_tpu (#7763)
|
2024-08-21 21:31:49 -07:00 |
|
Michael Goin
|
aae74ef95c
|
Revert "[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527)" (#7764)
|
2024-08-22 03:42:14 +00:00 |
|
Joe Runde
|
cde9183b40
|
[Bug][Frontend] Improve ZMQ client robustness (#7443)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-08-22 02:18:11 +00:00 |
|
zifeitong
|
df1a21131d
|
[Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710)
|
2024-08-22 09:36:24 +08:00 |
|
Luka Govedič
|
7937009a7e
|
[Kernel] Replaced blockReduce[...] functions with cub::BlockReduce (#7233)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-08-21 20:18:00 -04:00 |
|
Gregory Shtrasberg
|
9984605412
|
[AMD][CI/Build] Disambiguation of the function call for ROCm 6.2 headers compatibility (#7477)
Co-authored-by: Charlie Fu <Charlie.Fu@amd.com>
|
2024-08-21 16:47:36 -07:00 |
|
youkaichao
|
7eebe8ccaa
|
[distributed][misc] error on same VLLM_HOST_IP setting (#7756)
|
2024-08-21 16:25:34 -07:00 |
|
Dipika Sikka
|
8678a69ab5
|
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
|
2024-08-21 16:17:10 -07:00 |
|
William Lin
|
5844017285
|
[ci] [multi-step] narrow multi-step test dependency paths (#7760)
|
2024-08-21 15:52:40 -07:00 |
|
Peter Salas
|
1ca0d4f86b
|
[Model] Add UltravoxModel and UltravoxConfig (#7615)
|
2024-08-21 22:49:39 +00:00 |
|
William Lin
|
dd53c4b023
|
[misc] Add Torch profiler support (#7451)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-08-21 15:39:26 -07:00 |
|
Robert Shaw
|
970dfdc01d
|
[Frontend] Improve Startup Failure UX (#7716)
|
2024-08-21 19:53:01 +00:00 |
|
William Lin
|
91f4522cbf
|
[multi-step] Raise error if not using async engine (#7703)
|
2024-08-21 11:49:19 -07:00 |
|
sasha0552
|
1b32e02648
|
[Bugfix] Pass PYTHONPATH from setup.py to CMake (#7730)
|
2024-08-21 11:17:48 -07:00 |
|
Robert Shaw
|
f7e3b0c5aa
|
[Bugfix][Frontend] Fix Issues Under High Load With zeromq Frontend (#7394)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-08-21 13:34:14 -04:00 |
|
Brian Li
|
d3c002eadc
|
[Bugfix] chat method add_generation_prompt param (#7734)
|
2024-08-21 17:33:35 +00:00 |
|
Nick Hill
|
9b73a2f498
|
[Spec Decoding] Use target model max length as default for draft model (#7706)
|
2024-08-22 00:23:22 +08:00 |
|
Isotr0py
|
6925cdbeea
|
[Bugfix][Hardware][CPU] Fix mm_limits initialization for CPU backend (#7735)
|
2024-08-21 16:23:03 +00:00 |
|
LI MOU
|
53328d7536
|
[BUG] fix crash on flashinfer backend with cudagraph disabled, when attention group_size not in [1,2,4,8] (#7509)
|
2024-08-21 08:54:31 -07:00 |
|
Nick Hill
|
c75363fbc0
|
[BugFix] Avoid premature async generator exit and raise all exception variations (#7698)
|
2024-08-21 11:45:55 -04:00 |
|
sasha0552
|
dd3fa0e430
|
[Bugfix] Mirror jinja2 in pyproject.toml (#7723)
|
2024-08-21 13:41:17 +00:00 |
|
Cyrus Leung
|
baaedfdb2d
|
[mypy] Enable following imports for entrypoints (#7248)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
|
2024-08-20 23:28:21 -07:00 |
|
Roger Wang
|
4506641212
|
[Doc] Section for Multimodal Language Models (#7719)
|
2024-08-20 23:24:01 -07:00 |
|
Isotr0py
|
12e1c65bc9
|
[Model] Add AWQ quantization support for InternVL2 model (#7187)
|
2024-08-20 23:18:57 -07:00 |
|
youkaichao
|
b74a125800
|
[ci] try to log process using the port to debug the port usage (#7711)
|
2024-08-20 17:41:12 -07:00 |
|
Antoni Baum
|
66a9e713a7
|
[Core] Pipe worker_class_fn argument in Executor (#7707)
|
2024-08-21 00:37:39 +00:00 |
|
youkaichao
|
9e51b6a626
|
[ci][test] adjust max wait time for cpu offloading test (#7709)
|
2024-08-20 17:12:44 -07:00 |
|
Kunshang Ji
|
6e4658c7aa
|
[Intel GPU] fix xpu not support punica kernel (which use torch.library.custom_op) (#7685)
|
2024-08-20 12:01:09 -07:00 |
|
Antoni Baum
|
3b682179dd
|
[Core] Add AttentionState abstraction (#7663)
|
2024-08-20 18:50:45 +00:00 |
|
Lucas Wilkinson
|
c6af027a35
|
[Misc] Add jinja2 as an explicit build requirement (#7695)
|
2024-08-20 17:17:47 +00:00 |
|
Ronen Schaffer
|
2aa00d59ad
|
[CI/Build] Pin OpenTelemetry versions and make errors clearer (#7266)
[CI/Build] Pin OpenTelemetry versions and make availability errors clearer (#7266)
|
2024-08-20 10:02:21 -07:00 |
|
Kunshang Ji
|
c42590f97a
|
[Hardware] [Intel GPU] refactor xpu worker/executor (#7686)
|
2024-08-20 09:54:10 -07:00 |
|
Isotr0py
|
aae6927be0
|
[VLM][Model] Add test for InternViT vision encoder (#7409)
|
2024-08-20 23:10:20 +08:00 |
|
Ilya Lavrenov
|
398521ad19
|
[OpenVINO] Updated documentation (#7687)
|
2024-08-20 07:33:56 -06:00 |
|
Lucas Wilkinson
|
5288c06aa0
|
[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174)
|
2024-08-20 07:09:33 -06:00 |
|
Kunshang Ji
|
b6f99a6ffe
|
[Core] Refactor executor classes for easier inheritance (#7673)
[Core] Refactor executor classes to make it easier to inherit GPUExecutor (#7673)
|
2024-08-20 00:56:50 -07:00 |
|
youkaichao
|
ad28a74beb
|
[misc][cuda] add warning for pynvml user (#7675)
|
2024-08-20 00:35:09 -07:00 |
|
jianyizh
|
e6d811dd13
|
[XPU] fallback to native implementation for xpu custom op (#7670)
|
2024-08-20 00:26:09 -07:00 |
|
youkaichao
|
c4be16e1a7
|
[misc] add nvidia related library in collect env (#7674)
|
2024-08-19 23:22:49 -07:00 |
|
Kuntai Du
|
3d8a5f063d
|
[CI] Organizing performance benchmark files (#7616)
|
2024-08-19 22:43:54 -07:00 |
|
Zijian Hu
|
f4fc7337bf
|
[Bugfix] support tie_word_embeddings for all models (#5724)
|
2024-08-19 20:00:04 -07:00 |
|