Andrew Song
|
da115230fd
|
[Bugfix] Don't disable existing loggers (#7664)
|
2024-08-19 15:11:58 -07:00 |
|
Isotr0py
|
7601cb044d
|
[Core] Support tensor parallelism for GGUF quantization (#7520)
|
2024-08-19 17:30:14 -04:00 |
|
William Lin
|
47b65a5508
|
[core] Multi Step Scheduling (#7000)
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
|
2024-08-19 13:52:13 -07:00 |
|
Ali Panahi
|
dad961ef5c
|
[Bugfix] fix lora_dtype value type in arg_utils.py - part 2 (#5428)
|
2024-08-19 20:47:00 +00:00 |
|
Cody Yu
|
3ac50b47d0
|
[MISC] Add prefix cache hit rate to metrics (#7606)
|
2024-08-19 11:52:07 -07:00 |
|
Woosuk Kwon
|
df845b2b46
|
[Misc] Remove Gemma RoPE (#7638)
|
2024-08-19 09:29:31 -07:00 |
|
Kunshang Ji
|
1a36287b89
|
[Bugfix] Fix xpu build (#7644)
|
2024-08-18 22:00:09 -07:00 |
|
Peng Guanwen
|
f710fb5265
|
[Core] Use flashinfer sampling kernel when available (#7137)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-08-19 03:24:03 +00:00 |
|
SangBin Cho
|
ff7ec82c4d
|
[Core] Optimize SPMD architecture with delta + serialization optimization (#7109)
|
2024-08-18 17:57:20 -07:00 |
|
Woosuk Kwon
|
200a2ffa6b
|
[Misc] Refactor Llama3 RoPE initialization (#7637)
|
2024-08-18 17:18:12 -07:00 |
|
Alex Brooks
|
40e1360bb6
|
[CI/Build] Add text-only test for Qwen models (#7475)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-08-19 07:43:46 +08:00 |
|
Robert Shaw
|
e3b318216d
|
[ Bugfix ] Fix Prometheus Metrics With zeromq Frontend (#7279)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-08-18 20:19:48 +00:00 |
|
Woosuk Kwon
|
ab7165f2c7
|
[TPU] Optimize RoPE forward_native2 (#7636)
|
2024-08-18 01:15:10 -07:00 |
|
Woosuk Kwon
|
0c2fa50b84
|
[TPU] Use mark_dynamic only for dummy run (#7634)
|
2024-08-18 00:18:53 -07:00 |
|
Woosuk Kwon
|
ce143353c6
|
[TPU] Skip creating empty tensor (#7630)
|
2024-08-17 14:22:46 -07:00 |
|
Roger Wang
|
bbf55c4805
|
[VLM] Refactor MultiModalConfig initialization and profiling (#7530)
|
2024-08-17 13:30:55 -07:00 |
|
Jee Jee Li
|
1ef13cf92f
|
[Misc]Fix BitAndBytes exception messages (#7626)
|
2024-08-17 12:02:14 -07:00 |
|
youkaichao
|
832163b875
|
[ci][test] allow longer wait time for api server (#7629)
|
2024-08-17 11:26:38 -07:00 |
|
Besher Alkurdi
|
e73f76eec6
|
[Model] Pipeline parallel support for JAIS (#7603)
|
2024-08-17 11:11:09 -07:00 |
|
youkaichao
|
d95cc0a55c
|
[core][misc] update libcudart finding (#7620)
Co-authored-by: cjackal <44624812+cjackal@users.noreply.github.com>
|
2024-08-16 23:01:35 -07:00 |
|
youkaichao
|
5bf45db7df
|
[ci][test] fix engine/logger test (#7621)
|
2024-08-16 23:00:59 -07:00 |
|
youkaichao
|
eed020f673
|
[misc] use nvml to get consistent device name (#7582)
|
2024-08-16 21:15:13 -07:00 |
|
Xander Johnson
|
7c0b7ea214
|
[Bugfix] add >= 1.0 constraint for openai dependency (#7612)
|
2024-08-16 20:56:01 -07:00 |
|
SangBin Cho
|
4706eb628e
|
[aDAG] Unflake aDAG + PP tests (#7600)
|
2024-08-16 20:49:30 -07:00 |
|
Rui Qiao
|
bae888cb8e
|
[Bugfix] Clear engine reference in AsyncEngineRPCServer (#7618)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-16 20:44:05 -07:00 |
|
Alexei-V-Ivanov-AMD
|
6bd19551b0
|
.[Build/CI] Enabling passing AMD tests. (#7610)
|
2024-08-16 20:25:32 -07:00 |
|
bnellnm
|
e680349994
|
[Bugfix] Fix custom_ar support check (#7617)
|
2024-08-16 19:05:49 -07:00 |
|
Michael Goin
|
44f26a9466
|
[Model] Align nemotron config with final HF state and fix lm-eval-small (#7611)
|
2024-08-16 15:56:34 -07:00 |
|
bnellnm
|
37fd47e780
|
[Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596)
|
2024-08-16 14:00:11 -07:00 |
|
bnellnm
|
7759ae958f
|
[Kernel][Misc] dynamo support for ScalarType (#7594)
|
2024-08-16 13:59:49 -07:00 |
|
bnellnm
|
9f69856356
|
[Kernel] register punica functions as torch ops (#7591)
|
2024-08-16 13:59:38 -07:00 |
|
Michael Goin
|
d4f0f17b02
|
[Doc] Update quantization supported hardware table (#7595)
|
2024-08-16 13:59:27 -07:00 |
|
Michael Goin
|
b3f4e17935
|
[Doc] Add docs for llmcompressor INT8 and FP8 checkpoints (#7444)
|
2024-08-16 13:59:16 -07:00 |
|
Mahesh Keralapura
|
93478b63d2
|
[Core] Fix tracking of model forward time in case of PP>1 (#7440)
[Core] Fix tracking of model forward time to the span traces in case of PP>1 (#7440)
|
2024-08-16 13:46:01 -07:00 |
|
William Lin
|
f366f6339b
|
[spec decode] [4/N] Move update_flash_attn_metadata to attn backend (#7571)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-08-16 11:41:56 -07:00 |
|
Michael Goin
|
855866caa9
|
[Kernel] Add tuned triton configs for ExpertsInt8 (#7601)
|
2024-08-16 11:37:01 -07:00 |
|
Mor Zusman
|
7fc23be81c
|
[Kernel] W8A16 Int8 inside FusedMoE (#7415)
|
2024-08-16 10:06:51 -07:00 |
|
Charlie Fu
|
e837b624f2
|
[Feature][Hardware][Amd] Add fp8 Linear Layer for Rocm (#7210)
|
2024-08-16 10:06:30 -07:00 |
|
fzyzcjy
|
ec724a725e
|
support tqdm in notebooks (#7510)
|
2024-08-16 09:17:50 -07:00 |
|
Gordon Wong
|
0e39a33c6d
|
[Bugfix][Hardware][AMD][Frontend] add quantization param to embedding checking method (#7513)
|
2024-08-16 10:05:18 -06:00 |
|
Kuntai Du
|
6fc5b0f249
|
[CI] Fix crashes of performance benchmark (#7500)
|
2024-08-16 08:08:45 -07:00 |
|
Nick Hill
|
9587b050fb
|
[Core] Use uvloop with zmq-decoupled front-end (#7570)
|
2024-08-15 22:48:07 -07:00 |
|
youkaichao
|
54bd9a03c4
|
register custom op for flash attn and use from torch.ops (#7536)
|
2024-08-15 22:38:56 -07:00 |
|
jon-chuang
|
50b8d08dbd
|
[Misc/Testing] Use torch.testing.assert_close (#7324)
|
2024-08-16 04:24:04 +00:00 |
|
Michael Goin
|
e165528778
|
[CI] Move quantization cpu offload tests out of fastcheck (#7574)
|
2024-08-15 21:16:20 -07:00 |
|
nunjunj
|
3b19e39dc5
|
Chat method for offline llm (#5049)
Co-authored-by: nunjunj <ray@g-3ff9f30f2ed650001.c.vllm-405802.internal>
Co-authored-by: nunjunj <ray@g-1df6075697c3f0001.c.vllm-405802.internal>
Co-authored-by: nunjunj <ray@g-c5a2c23abc49e0001.c.vllm-405802.internal>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-08-15 19:41:34 -07:00 |
|
youkaichao
|
4cd7d47fed
|
[ci/test] rearrange tests and make adag test soft fail (#7572)
|
2024-08-15 19:39:04 -07:00 |
|
Grant Pinkert
|
f878c8feb0
|
[Feature]: Add OpenAI server prompt_logprobs support #6508 (#7453)
|
2024-08-16 02:38:08 +00:00 |
|
shangmingc
|
b67ae00cdb
|
[Misc] Add quantization config support for speculative model. (#7343)
|
2024-08-15 19:34:28 -07:00 |
|
Michael Goin
|
9c8e2d1161
|
[Bugfix][Harmless] Fix float16 dtype for model_is_embedding (#7566)
|
2024-08-15 18:26:19 -07:00 |
|