Robert Shaw
|
970dfdc01d
|
[Frontend] Improve Startup Failure UX (#7716)
|
2024-08-21 19:53:01 +00:00 |
|
William Lin
|
91f4522cbf
|
[multi-step] Raise error if not using async engine (#7703)
|
2024-08-21 11:49:19 -07:00 |
|
sasha0552
|
1b32e02648
|
[Bugfix] Pass PYTHONPATH from setup.py to CMake (#7730)
|
2024-08-21 11:17:48 -07:00 |
|
Robert Shaw
|
f7e3b0c5aa
|
[Bugfix][Frontend] Fix Issues Under High Load With zeromq Frontend (#7394)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-08-21 13:34:14 -04:00 |
|
Brian Li
|
d3c002eadc
|
[Bugfix] chat method add_generation_prompt param (#7734)
|
2024-08-21 17:33:35 +00:00 |
|
Nick Hill
|
9b73a2f498
|
[Spec Decoding] Use target model max length as default for draft model (#7706)
|
2024-08-22 00:23:22 +08:00 |
|
Isotr0py
|
6925cdbeea
|
[Bugfix][Hardware][CPU] Fix mm_limits initialization for CPU backend (#7735)
|
2024-08-21 16:23:03 +00:00 |
|
LI MOU
|
53328d7536
|
[BUG] fix crash on flashinfer backend with cudagraph disabled, when attention group_size not in [1,2,4,8] (#7509)
|
2024-08-21 08:54:31 -07:00 |
|
Nick Hill
|
c75363fbc0
|
[BugFix] Avoid premature async generator exit and raise all exception variations (#7698)
|
2024-08-21 11:45:55 -04:00 |
|
sasha0552
|
dd3fa0e430
|
[Bugfix] Mirror jinja2 in pyproject.toml (#7723)
|
2024-08-21 13:41:17 +00:00 |
|
Cyrus Leung
|
baaedfdb2d
|
[mypy] Enable following imports for entrypoints (#7248)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
|
2024-08-20 23:28:21 -07:00 |
|
Roger Wang
|
4506641212
|
[Doc] Section for Multimodal Language Models (#7719)
|
2024-08-20 23:24:01 -07:00 |
|
Isotr0py
|
12e1c65bc9
|
[Model] Add AWQ quantization support for InternVL2 model (#7187)
|
2024-08-20 23:18:57 -07:00 |
|
youkaichao
|
b74a125800
|
[ci] try to log process using the port to debug the port usage (#7711)
|
2024-08-20 17:41:12 -07:00 |
|
Antoni Baum
|
66a9e713a7
|
[Core] Pipe worker_class_fn argument in Executor (#7707)
|
2024-08-21 00:37:39 +00:00 |
|
youkaichao
|
9e51b6a626
|
[ci][test] adjust max wait time for cpu offloading test (#7709)
|
2024-08-20 17:12:44 -07:00 |
|
Kunshang Ji
|
6e4658c7aa
|
[Intel GPU] fix xpu not support punica kernel (which use torch.library.custom_op) (#7685)
|
2024-08-20 12:01:09 -07:00 |
|
Antoni Baum
|
3b682179dd
|
[Core] Add AttentionState abstraction (#7663)
|
2024-08-20 18:50:45 +00:00 |
|
Lucas Wilkinson
|
c6af027a35
|
[Misc] Add jinja2 as an explicit build requirement (#7695)
|
2024-08-20 17:17:47 +00:00 |
|
Ronen Schaffer
|
2aa00d59ad
|
[CI/Build] Pin OpenTelemetry versions and make errors clearer (#7266)
[CI/Build] Pin OpenTelemetry versions and make availability errors clearer (#7266)
|
2024-08-20 10:02:21 -07:00 |
|
Kunshang Ji
|
c42590f97a
|
[Hardware] [Intel GPU] refactor xpu worker/executor (#7686)
|
2024-08-20 09:54:10 -07:00 |
|
Isotr0py
|
aae6927be0
|
[VLM][Model] Add test for InternViT vision encoder (#7409)
|
2024-08-20 23:10:20 +08:00 |
|
Ilya Lavrenov
|
398521ad19
|
[OpenVINO] Updated documentation (#7687)
|
2024-08-20 07:33:56 -06:00 |
|
Lucas Wilkinson
|
5288c06aa0
|
[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174)
|
2024-08-20 07:09:33 -06:00 |
|
Kunshang Ji
|
b6f99a6ffe
|
[Core] Refactor executor classes for easier inheritance (#7673)
[Core] Refactor executor classes to make it easier to inherit GPUExecutor (#7673)
|
2024-08-20 00:56:50 -07:00 |
|
youkaichao
|
ad28a74beb
|
[misc][cuda] add warning for pynvml user (#7675)
|
2024-08-20 00:35:09 -07:00 |
|
jianyizh
|
e6d811dd13
|
[XPU] fallback to native implementation for xpu custom op (#7670)
|
2024-08-20 00:26:09 -07:00 |
|
youkaichao
|
c4be16e1a7
|
[misc] add nvidia related library in collect env (#7674)
|
2024-08-19 23:22:49 -07:00 |
|
Kuntai Du
|
3d8a5f063d
|
[CI] Organizing performance benchmark files (#7616)
|
2024-08-19 22:43:54 -07:00 |
|
Zijian Hu
|
f4fc7337bf
|
[Bugfix] support tie_word_embeddings for all models (#5724)
|
2024-08-19 20:00:04 -07:00 |
|
Kevin H. Luu
|
0df7ec0b2d
|
[ci] Install Buildkite test suite analysis (#7667)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-08-19 19:55:04 -07:00 |
|
Abhinav Goyal
|
312f761232
|
[Speculative Decoding] Fixing hidden states handling in batch expansion (#7508)
|
2024-08-19 17:58:14 -07:00 |
|
youkaichao
|
e54ebc2f8f
|
[doc] fix doc build error caused by msgspec (#7659)
|
2024-08-19 17:50:59 -07:00 |
|
Travis Johnson
|
67e02fa8a4
|
[Bugfix] use StoreBoolean instead of type=bool for --disable-logprobs-during-spec-decoding (#7665)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-08-20 00:43:09 +00:00 |
|
Woosuk Kwon
|
43735bf5e1
|
[TPU] Remove redundant input tensor cloning (#7660)
|
2024-08-19 15:55:04 -07:00 |
|
Andrew Song
|
da115230fd
|
[Bugfix] Don't disable existing loggers (#7664)
|
2024-08-19 15:11:58 -07:00 |
|
Isotr0py
|
7601cb044d
|
[Core] Support tensor parallelism for GGUF quantization (#7520)
|
2024-08-19 17:30:14 -04:00 |
|
William Lin
|
47b65a5508
|
[core] Multi Step Scheduling (#7000)
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
|
2024-08-19 13:52:13 -07:00 |
|
Ali Panahi
|
dad961ef5c
|
[Bugfix] fix lora_dtype value type in arg_utils.py - part 2 (#5428)
|
2024-08-19 20:47:00 +00:00 |
|
Cody Yu
|
3ac50b47d0
|
[MISC] Add prefix cache hit rate to metrics (#7606)
|
2024-08-19 11:52:07 -07:00 |
|
Woosuk Kwon
|
df845b2b46
|
[Misc] Remove Gemma RoPE (#7638)
|
2024-08-19 09:29:31 -07:00 |
|
Kunshang Ji
|
1a36287b89
|
[Bugfix] Fix xpu build (#7644)
|
2024-08-18 22:00:09 -07:00 |
|
Peng Guanwen
|
f710fb5265
|
[Core] Use flashinfer sampling kernel when available (#7137)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-08-19 03:24:03 +00:00 |
|
SangBin Cho
|
ff7ec82c4d
|
[Core] Optimize SPMD architecture with delta + serialization optimization (#7109)
|
2024-08-18 17:57:20 -07:00 |
|
Woosuk Kwon
|
200a2ffa6b
|
[Misc] Refactor Llama3 RoPE initialization (#7637)
|
2024-08-18 17:18:12 -07:00 |
|
Alex Brooks
|
40e1360bb6
|
[CI/Build] Add text-only test for Qwen models (#7475)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-08-19 07:43:46 +08:00 |
|
Robert Shaw
|
e3b318216d
|
[ Bugfix ] Fix Prometheus Metrics With zeromq Frontend (#7279)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-08-18 20:19:48 +00:00 |
|
Woosuk Kwon
|
ab7165f2c7
|
[TPU] Optimize RoPE forward_native2 (#7636)
|
2024-08-18 01:15:10 -07:00 |
|
Woosuk Kwon
|
0c2fa50b84
|
[TPU] Use mark_dynamic only for dummy run (#7634)
|
2024-08-18 00:18:53 -07:00 |
|
Woosuk Kwon
|
ce143353c6
|
[TPU] Skip creating empty tensor (#7630)
|
2024-08-17 14:22:46 -07:00 |
|