Kyle Mistele
|
e02ce498be
|
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
|
2024-09-04 13:18:13 -07:00 |
|
alexeykondrat
|
d1dec64243
|
[CI/Build][ROCm] Enabling LoRA tests on ROCm (#7369)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-09-04 11:57:54 -07:00 |
|
Cody Yu
|
2ad2e5608e
|
[MISC] Consolidate FP8 kv-cache tests (#8131)
|
2024-09-04 18:53:25 +00:00 |
|
TimWang
|
ccd7207191
|
chore: Update check-wheel-size.py to read MAX_SIZE_MB from env (#8103)
|
2024-09-03 23:17:05 -07:00 |
|
Roger Wang
|
5231f0898e
|
[Frontend][VLM] Add support for multiple multi-modal items (#8049)
|
2024-08-31 16:35:53 -07:00 |
|
Michael Goin
|
af59df0a10
|
Remove faulty Meta-Llama-3-8B-Instruct-FP8.yaml lm-eval test (#7961)
|
2024-08-28 19:19:17 -04:00 |
|
youkaichao
|
ce6bf3a2cf
|
[torch.compile] avoid Dynamo guard evaluation overhead (#7898)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-08-28 16:10:12 -07:00 |
|
alexeykondrat
|
42e932c7d4
|
[CI/Build][ROCm] Enabling tensorizer tests for ROCm (#7237)
|
2024-08-27 10:09:13 -07:00 |
|
youkaichao
|
64cc644425
|
[core][torch.compile] discard the compile for profiling (#7796)
|
2024-08-26 21:33:58 -07:00 |
|
youkaichao
|
7d9ffa2ae1
|
[misc][core] lazy import outlines (#7831)
|
2024-08-24 00:51:38 -07:00 |
|
Alexander Matveev
|
9db93de20c
|
[Core] Add multi-step support to LLMEngine (#7789)
|
2024-08-23 12:45:53 -07:00 |
|
SangBin Cho
|
c01a6cb231
|
[Ray backend] Better error when pg topology is bad. (#7584)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-08-22 17:44:25 -07:00 |
|
youkaichao
|
8c6f694a79
|
[ci] refine dependency for distributed tests (#7776)
|
2024-08-22 00:54:15 -07:00 |
|
Luka Govedič
|
7937009a7e
|
[Kernel] Replaced blockReduce[...] functions with cub::BlockReduce (#7233)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-08-21 20:18:00 -04:00 |
|
William Lin
|
5844017285
|
[ci] [multi-step] narrow multi-step test dependency paths (#7760)
|
2024-08-21 15:52:40 -07:00 |
|
Robert Shaw
|
f7e3b0c5aa
|
[Bugfix][Frontend] Fix Issues Under High Load With zeromq Frontend (#7394)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-08-21 13:34:14 -04:00 |
|
Ronen Schaffer
|
2aa00d59ad
|
[CI/Build] Pin OpenTelemetry versions and make errors clearer (#7266)
[CI/Build] Pin OpenTelemetry versions and make availability errors clearer (#7266)
|
2024-08-20 10:02:21 -07:00 |
|
Kuntai Du
|
3d8a5f063d
|
[CI] Organizing performance benchmark files (#7616)
|
2024-08-19 22:43:54 -07:00 |
|
William Lin
|
47b65a5508
|
[core] Multi Step Scheduling (#7000)
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
|
2024-08-19 13:52:13 -07:00 |
|
Peng Guanwen
|
f710fb5265
|
[Core] Use flashinfer sampling kernel when available (#7137)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-08-19 03:24:03 +00:00 |
|
Alex Brooks
|
40e1360bb6
|
[CI/Build] Add text-only test for Qwen models (#7475)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-08-19 07:43:46 +08:00 |
|
SangBin Cho
|
4706eb628e
|
[aDAG] Unflake aDAG + PP tests (#7600)
|
2024-08-16 20:49:30 -07:00 |
|
Alexei-V-Ivanov-AMD
|
6bd19551b0
|
[Build/CI] Enabling passing AMD tests. (#7610)
|
2024-08-16 20:25:32 -07:00 |
|
Michael Goin
|
44f26a9466
|
[Model] Align nemotron config with final HF state and fix lm-eval-small (#7611)
|
2024-08-16 15:56:34 -07:00 |
|
Mahesh Keralapura
|
93478b63d2
|
[Core] Fix tracking of model forward time in case of PP>1 (#7440)
[Core] Fix tracking of model forward time to the span traces in case of PP>1 (#7440)
|
2024-08-16 13:46:01 -07:00 |
|
Kuntai Du
|
6fc5b0f249
|
[CI] Fix crashes of performance benchmark (#7500)
|
2024-08-16 08:08:45 -07:00 |
|
youkaichao
|
54bd9a03c4
|
register custom op for flash attn and use from torch.ops (#7536)
|
2024-08-15 22:38:56 -07:00 |
|
nunjunj
|
3b19e39dc5
|
Chat method for offline llm (#5049)
Co-authored-by: nunjunj <ray@g-3ff9f30f2ed650001.c.vllm-405802.internal>
Co-authored-by: nunjunj <ray@g-1df6075697c3f0001.c.vllm-405802.internal>
Co-authored-by: nunjunj <ray@g-c5a2c23abc49e0001.c.vllm-405802.internal>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-08-15 19:41:34 -07:00 |
|
youkaichao
|
4cd7d47fed
|
[ci/test] rearrange tests and make adag test soft fail (#7572)
|
2024-08-15 19:39:04 -07:00 |
|
PHILO-HE
|
f4da5f7b6d
|
[Misc] Update dockerfile for CPU to cover protobuf installation (#7182)
|
2024-08-15 10:03:01 -07:00 |
|
youkaichao
|
d3d9cb6e4b
|
[ci] fix model tests (#7507)
|
2024-08-14 01:01:43 -07:00 |
|
Cyrus Leung
|
dd164d72f3
|
[Bugfix][Docs] Update list of mock imports (#7493)
|
2024-08-13 20:37:30 -07:00 |
|
youkaichao
|
ea49e6a3c8
|
[misc][ci] fix cpu test with plugins (#7489)
|
2024-08-13 19:27:46 -07:00 |
|
youkaichao
|
16422ea76f
|
[misc][plugin] add plugin system implementation (#7426)
|
2024-08-13 16:24:17 -07:00 |
|
Dipika Sikka
|
fb377d7e74
|
[Misc] Update gptq_marlin to use new vLLMParameters (#7281)
|
2024-08-13 14:30:11 -04:00 |
|
Dipika Sikka
|
181abbc27d
|
[Misc] Update LM Eval Tolerance (#7473)
|
2024-08-13 14:28:14 -04:00 |
|
Kevin H. Luu
|
65950e8f58
|
[ci] Entrypoints run upon changes in vllm/ (#7423)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-08-12 10:18:03 -07:00 |
|
Lily Liu
|
ec2affa8ae
|
[Kernel] Flashinfer correctness fix for v0.1.3 (#7319)
|
2024-08-12 07:59:17 +00:00 |
|
Kevin H. Luu
|
469b3bc538
|
[ci] Make building wheels per commit optional (#7278)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-08-07 11:34:25 -07:00 |
|
afeldman-nm
|
fd95e026e0
|
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-08-06 16:51:47 -04:00 |
|
Dipika Sikka
|
a3bbbfa1d8
|
[BugFix] Fix DeepSeek remote code (#7178)
|
2024-08-06 08:16:53 -07:00 |
|
Simon Mo
|
e3c664bfcb
|
[Build] Add initial conditional testing spec (#6841)
|
2024-08-05 17:39:22 -07:00 |
|
Kuntai Du
|
67d745cc68
|
[CI] Temporarily turn off H100 performance benchmark (#7104)
|
2024-08-02 23:52:44 -07:00 |
|
youkaichao
|
04e5583425
|
[ci][distributed] merge distributed test commands (#7097)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-08-02 21:33:53 -07:00 |
|
omkar kakarparthi
|
562e580abc
|
Update run-amd-test.sh (#7044)
|
2024-08-01 13:12:37 -07:00 |
|
Sage Moore
|
7e0861bd0b
|
[CI/Build] Update PyTorch to 2.4.0 (#6951)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-08-01 11:11:24 -07:00 |
|
Alexei-V-Ivanov-AMD
|
a72a424b3e
|
[Build/CI] Fixing Docker Hub quota issue. (#7043)
|
2024-08-01 11:07:37 -07:00 |
|
HandH1998
|
6512937de1
|
Support W4A8 quantization for vllm (#5218)
|
2024-07-31 07:55:21 -06:00 |
|
Cyrus Leung
|
f230cc2ca6
|
[Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836)
|
2024-07-31 10:38:45 +08:00 |
|
Cade Daniel
|
c32ab8be1a
|
[Speculative decoding] Add serving benchmark for llama3 70b + speculative decoding (#6964)
|
2024-07-31 00:53:21 +00:00 |
|
Cade Daniel
|
fb4f530bf5
|
[CI] [nightly benchmark] Do not re-download sharegpt dataset if exists (#6706)
|
2024-07-30 16:28:49 -07:00 |
|
Cade Daniel
|
79319cedfa
|
[Nightly benchmarking suite] Remove pkill python from run benchmark suite (#6965)
|
2024-07-30 16:28:05 -07:00 |
|
Simon Mo
|
40c27a7cbb
|
[Build] Temporarily Disable Kernels and LoRA tests (#6961)
|
2024-07-30 14:59:48 -07:00 |
|
Roger Wang
|
ecb33a28cb
|
[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860)
|
2024-07-27 09:54:14 +00:00 |
|
Joe
|
14dbd5a767
|
[Model] H2O Danube3-4b (#6451)
|
2024-07-26 20:47:50 -07:00 |
|
Sanger Steel
|
969d032265
|
[Bugfix]: Fix Tensorizer test failures (#6835)
|
2024-07-26 20:02:25 -07:00 |
|
Li, Jiang
|
3bbb4936dc
|
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)
|
2024-07-26 13:50:10 -07:00 |
|
Michael Goin
|
07278c37dd
|
[Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611)
|
2024-07-26 14:33:42 -04:00 |
|
Kevin H. Luu
|
2eb9f4ff26
|
[ci] Mark tensorizer as soft fail and separate from grouped test (#6810)
[ci] Mark tensorizer test as soft fail and separate it from grouped test in fast check (#6810)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-07-25 18:08:33 -07:00 |
|
Kuntai Du
|
6a1e25b151
|
[Doc] Add documentation for nightly benchmarks (#6412)
|
2024-07-25 11:57:16 -07:00 |
|
Robert Shaw
|
889da130e7
|
[ Misc ] fp8-marlin channelwise via compressed-tensors (#6524)
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-07-25 09:46:04 -07:00 |
|
Alexei-V-Ivanov-AMD
|
b570811706
|
[Build/CI] Update run-amd-test.sh. Enable Docker Hub login. (#6711)
|
2024-07-24 05:01:14 -07:00 |
|
youkaichao
|
72fc704803
|
[build] relax wheel size limit (#6704)
|
2024-07-23 14:03:49 -07:00 |
|
Kevin H. Luu
|
69d5ae38dc
|
[ci] Use different sccache bucket for CUDA 11.8 wheel build (#6656)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-07-22 14:20:41 -07:00 |
|
Robert Shaw
|
9364f74eee
|
[ Kernel ] Enable fp8-marlin for fbgemm-fp8 models (#6606)
|
2024-07-20 18:50:10 +00:00 |
|
Matt Wong
|
06d6c5fe9f
|
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543)
|
2024-07-20 09:39:07 -07:00 |
|
Robert Shaw
|
683e3cb9c4
|
[ Misc ] fbgemm checkpoints (#6559)
|
2024-07-20 09:36:57 -07:00 |
|
Robert Shaw
|
4cc24f01b1
|
[ Kernel ] Enable Dynamic Per Token fp8 (#6547)
|
2024-07-19 23:08:15 +00:00 |
|
Robert Shaw
|
dbe5588554
|
[ Misc ] non-uniform quantization via compressed-tensors for Llama (#6515)
|
2024-07-18 22:39:18 -04:00 |
|
Nick Hill
|
b5672a112c
|
[Core] Multiprocessing Pipeline Parallel support (#6130)
Co-authored-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-18 19:15:52 -07:00 |
|
Tyler Michael Smith
|
1689219ebf
|
[CI/Build] Build on Ubuntu 20.04 instead of 22.04 (#6517)
|
2024-07-18 17:29:25 -07:00 |
|
youkaichao
|
f53b8f0d05
|
[ci][test] add correctness test for cpu offloading (#6549)
|
2024-07-18 23:41:06 +00:00 |
|
Rui Qiao
|
61e592747c
|
[Core] Introduce SPMD worker execution using Ray accelerated DAG (#6032)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Stephanie Wang <swang@cs.berkeley.edu>
|
2024-07-17 22:27:09 -07:00 |
|
youkaichao
|
1c27d25fb5
|
[core][model] yet another cpu offload implementation (#6496)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-07-17 20:54:35 -07:00 |
|
youkaichao
|
09c2eb85dd
|
[ci][distributed] add pipeline parallel correctness test (#6410)
|
2024-07-16 15:44:22 -07:00 |
|
Cyrus Leung
|
d97011512e
|
[CI/Build] vLLM cache directory for images (#6444)
|
2024-07-15 23:12:25 -07:00 |
|
Woosuk Kwon
|
4552e37b55
|
[CI/Build][TPU] Add TPU CI test (#6277)
Co-authored-by: kevin <kevin@anyscale.com>
|
2024-07-15 14:31:16 -07:00 |
|
youkaichao
|
69672f116c
|
[core][distributed] simplify code to support pipeline parallel (#6406)
|
2024-07-14 21:20:51 -07:00 |
|
Robert Shaw
|
a754dc2cb9
|
[CI/Build] Cross python wheel (#6394)
|
2024-07-14 18:54:46 -07:00 |
|
Robert Shaw
|
73030b7dae
|
[ Misc ] Enable Quantizing All Layers of DeepSeekV2 (#6423)
|
2024-07-14 21:38:42 +00:00 |
|
youkaichao
|
ccd3c04571
|
[ci][build] fix commit id (#6420)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-07-14 22:16:21 +08:00 |
|
Tyler Michael Smith
|
9dad5cc859
|
[Kernel] Turn off CUTLASS scaled_mm for Ada Lovelace (#6384)
|
2024-07-14 13:37:19 +00:00 |
|
Robert Shaw
|
fb6af8bc08
|
[ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417)
|
2024-07-13 20:03:58 -07:00 |
|
Robert Shaw
|
babf52dade
|
[ Misc ] More Cleanup of Marlin (#6359)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2024-07-13 10:21:37 +00:00 |
|
youkaichao
|
41708e5034
|
[ci] try to add multi-node tests (#6280)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-12 21:51:48 -07:00 |
|
Simon Mo
|
6bc9710f6e
|
Fix release pipeline's dir permission (#6391)
|
2024-07-12 15:52:43 -07:00 |
|
Simon Mo
|
21b2dcedab
|
Fix release pipeline's -e flag (#6390)
|
2024-07-12 14:08:04 -07:00 |
|
Simon Mo
|
07b35af86d
|
Fix interpolation in release pipeline (#6389)
|
2024-07-12 14:03:39 -07:00 |
|
Simon Mo
|
bb1a784b05
|
Fix release-pipeline.yaml (#6388)
|
2024-07-12 14:00:57 -07:00 |
|
Simon Mo
|
d719ba24c5
|
Build some nightly wheels by default (#6380)
|
2024-07-12 13:56:59 -07:00 |
|
Kevin H. Luu
|
b75bce1008
|
[ci] Add grouped tests & mark tests to run by default for fastcheck pipeline (#6365)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-07-12 09:58:38 -07:00 |
|
Alexei-V-Ivanov-AMD
|
f9d25c2519
|
[Build/CI] Checking/Waiting for the GPU's clean state (#6379)
|
2024-07-12 09:42:24 -07:00 |
|
Robert Shaw
|
aea19f0989
|
[ Misc ] Support Models With Bias in compressed-tensors integration (#6356)
|
2024-07-12 11:11:29 -04:00 |
|
adityagoel14
|
d26a8b3f1f
|
[CI/Build] (2/2) Switching AMD CI to store images in Docker Hub (#6350)
|
2024-07-11 21:26:26 -07:00 |
|
Lily Liu
|
d6ab528997
|
[Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351)
|
2024-07-12 01:32:06 +00:00 |
|
Robert Shaw
|
7ed6a4f0e1
|
[ BugFix ] Prompt Logprobs Detokenization (#6223)
Co-authored-by: Zifei Tong <zifeitong@gmail.com>
|
2024-07-11 22:02:29 +00:00 |
|
Kuntai Du
|
a4feba929b
|
[CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy (#5362)
|
2024-07-11 13:28:38 -07:00 |
|
Simon Mo
|
52b7fcb35a
|
Benchmark: add H100 suite (#6047)
|
2024-07-11 09:17:07 -07:00 |
|
Kevin H. Luu
|
a0550cbc80
|
Add support for multi-node on CI (#5955)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-07-09 12:56:56 -07:00 |
|
Robert Shaw
|
abfe705a02
|
[ Misc ] Support Fp8 via llm-compressor (#6110)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
|
2024-07-07 20:42:11 +00:00 |
|