2417 Commits

Author SHA1 Message Date
Gregory Shtrasberg
9984605412
[AMD][CI/Build] Disambiguation of the function call for ROCm 6.2 headers compatibility (#7477)
Co-authored-by: Charlie Fu <Charlie.Fu@amd.com>
2024-08-21 16:47:36 -07:00
youkaichao
7eebe8ccaa
[distributed][misc] error on same VLLM_HOST_IP setting (#7756) 2024-08-21 16:25:34 -07:00
Dipika Sikka
8678a69ab5
[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7527)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
2024-08-21 16:17:10 -07:00
William Lin
5844017285
[ci] [multi-step] narrow multi-step test dependency paths (#7760) 2024-08-21 15:52:40 -07:00
Peter Salas
1ca0d4f86b
[Model] Add UltravoxModel and UltravoxConfig (#7615) 2024-08-21 22:49:39 +00:00
William Lin
dd53c4b023
[misc] Add Torch profiler support (#7451)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-21 15:39:26 -07:00
Robert Shaw
970dfdc01d
[Frontend] Improve Startup Failure UX (#7716) 2024-08-21 19:53:01 +00:00
William Lin
91f4522cbf
[multi-step] Raise error if not using async engine (#7703) 2024-08-21 11:49:19 -07:00
sasha0552
1b32e02648
[Bugfix] Pass PYTHONPATH from setup.py to CMake (#7730) 2024-08-21 11:17:48 -07:00
Robert Shaw
f7e3b0c5aa
[Bugfix][Frontend] Fix Issues Under High Load With zeromq Frontend (#7394)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-21 13:34:14 -04:00
Brian Li
d3c002eadc
[Bugfix] chat method add_generation_prompt param (#7734) 2024-08-21 17:33:35 +00:00
Nick Hill
9b73a2f498
[Spec Decoding] Use target model max length as default for draft model (#7706) 2024-08-22 00:23:22 +08:00
Isotr0py
6925cdbeea
[Bugfix][Hardware][CPU] Fix mm_limits initialization for CPU backend (#7735) 2024-08-21 16:23:03 +00:00
LI MOU
53328d7536
[BUG] fix crash on flashinfer backend with cudagraph disabled, when attention group_size not in [1,2,4,8] (#7509) 2024-08-21 08:54:31 -07:00
Nick Hill
c75363fbc0
[BugFix] Avoid premature async generator exit and raise all exception variations (#7698) 2024-08-21 11:45:55 -04:00
sasha0552
dd3fa0e430
[Bugfix] Mirror jinja2 in pyproject.toml (#7723) 2024-08-21 13:41:17 +00:00
Cyrus Leung
baaedfdb2d
[mypy] Enable following imports for entrypoints (#7248)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
2024-08-20 23:28:21 -07:00
Roger Wang
4506641212
[Doc] Section for Multimodal Language Models (#7719) 2024-08-20 23:24:01 -07:00
Isotr0py
12e1c65bc9
[Model] Add AWQ quantization support for InternVL2 model (#7187) 2024-08-20 23:18:57 -07:00
youkaichao
b74a125800
[ci] try to log process using the port to debug the port usage (#7711) 2024-08-20 17:41:12 -07:00
Antoni Baum
66a9e713a7
[Core] Pipe worker_class_fn argument in Executor (#7707) 2024-08-21 00:37:39 +00:00
youkaichao
9e51b6a626
[ci][test] adjust max wait time for cpu offloading test (#7709) 2024-08-20 17:12:44 -07:00
Kunshang Ji
6e4658c7aa
[Intel GPU] fix xpu not support punica kernel (which use torch.library.custom_op) (#7685) 2024-08-20 12:01:09 -07:00
Antoni Baum
3b682179dd
[Core] Add AttentionState abstraction (#7663) 2024-08-20 18:50:45 +00:00
Lucas Wilkinson
c6af027a35
[Misc] Add jinja2 as an explicit build requirement (#7695) 2024-08-20 17:17:47 +00:00
Ronen Schaffer
2aa00d59ad
[CI/Build] Pin OpenTelemetry versions and make errors clearer (#7266)
[CI/Build] Pin OpenTelemetry versions and make availability errors clearer (#7266)
2024-08-20 10:02:21 -07:00
Kunshang Ji
c42590f97a
[Hardware] [Intel GPU] refactor xpu worker/executor (#7686) 2024-08-20 09:54:10 -07:00
Isotr0py
aae6927be0
[VLM][Model] Add test for InternViT vision encoder (#7409) 2024-08-20 23:10:20 +08:00
Ilya Lavrenov
398521ad19
[OpenVINO] Updated documentation (#7687) 2024-08-20 07:33:56 -06:00
Lucas Wilkinson
5288c06aa0
[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174) 2024-08-20 07:09:33 -06:00
Kunshang Ji
b6f99a6ffe
[Core] Refactor executor classes for easier inheritance (#7673)
[Core] Refactor executor classes to make it easier to inherit GPUExecutor (#7673)
2024-08-20 00:56:50 -07:00
youkaichao
ad28a74beb
[misc][cuda] add warning for pynvml user (#7675) 2024-08-20 00:35:09 -07:00
jianyizh
e6d811dd13
[XPU] fallback to native implementation for xpu custom op (#7670) 2024-08-20 00:26:09 -07:00
youkaichao
c4be16e1a7
[misc] add nvidia related library in collect env (#7674) 2024-08-19 23:22:49 -07:00
Kuntai Du
3d8a5f063d
[CI] Organizing performance benchmark files (#7616) 2024-08-19 22:43:54 -07:00
Zijian Hu
f4fc7337bf
[Bugfix] support tie_word_embeddings for all models (#5724) 2024-08-19 20:00:04 -07:00
Kevin H. Luu
0df7ec0b2d
[ci] Install Buildkite test suite analysis (#7667)
Signed-off-by: kevin <kevin@anyscale.com>
2024-08-19 19:55:04 -07:00
Abhinav Goyal
312f761232
[Speculative Decoding] Fixing hidden states handling in batch expansion (#7508) 2024-08-19 17:58:14 -07:00
youkaichao
e54ebc2f8f
[doc] fix doc build error caused by msgspec (#7659) 2024-08-19 17:50:59 -07:00
Travis Johnson
67e02fa8a4
[Bugfix] use StoreBoolean instead of type=bool for --disable-logprobs-during-spec-decoding (#7665)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-08-20 00:43:09 +00:00
Woosuk Kwon
43735bf5e1
[TPU] Remove redundant input tensor cloning (#7660) 2024-08-19 15:55:04 -07:00
Andrew Song
da115230fd
[Bugfix] Don't disable existing loggers (#7664) 2024-08-19 15:11:58 -07:00
Isotr0py
7601cb044d
[Core] Support tensor parallelism for GGUF quantization (#7520) 2024-08-19 17:30:14 -04:00
William Lin
47b65a5508
[core] Multi Step Scheduling (#7000)
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
2024-08-19 13:52:13 -07:00
Ali Panahi
dad961ef5c
[Bugfix] fix lora_dtype value type in arg_utils.py - part 2 (#5428) 2024-08-19 20:47:00 +00:00
Cody Yu
3ac50b47d0
[MISC] Add prefix cache hit rate to metrics (#7606) 2024-08-19 11:52:07 -07:00
Woosuk Kwon
df845b2b46
[Misc] Remove Gemma RoPE (#7638) 2024-08-19 09:29:31 -07:00
Kunshang Ji
1a36287b89
[Bugfix] Fix xpu build (#7644) 2024-08-18 22:00:09 -07:00
Peng Guanwen
f710fb5265
[Core] Use flashinfer sampling kernel when available (#7137)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-19 03:24:03 +00:00
SangBin Cho
ff7ec82c4d
[Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00