bnellnm
f192aeba74
[Bugfix] Enable some fp8 and quantized fullgraph tests ( #10171 )
...
Signed-off-by: Bill Nell <bill@neuralmagic.com>
2024-11-09 08:01:27 +00:00
Isotr0py
47672f38b5
[CI/Build] Fix VLM broadcast tests tensor_parallel_size passing ( #10161 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-09 04:02:59 +00:00
Cyrus Leung
e0191a95d8
[0/N] Rename MultiModalInputs to MultiModalKwargs ( #10040 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 11:31:02 +08:00
rasmith
127c07480e
[Kernel][Triton] Add Triton implementation for scaled_mm_triton to support fp8 and int8 SmoothQuant, symmetric case ( #9857 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2024-11-08 19:59:22 -05:00
Luka Govedič
4f93dfe952
[torch.compile] Fuse RMSNorm with quant ( #9138 )
...
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-11-08 21:20:08 +00:00
Florian Zimmermeister
e1b5a82179
Rename vllm.logging to vllm.logging_utils ( #10134 )
2024-11-08 20:53:24 +00:00
sroy745
f6778620a9
Disable spec-decode + chunked-prefill for draft models with tensor parallelism > 1 ( #10136 )
...
Signed-off-by: Sourashis Roy <sroy@roblox.com>
2024-11-08 15:56:18 +00:00
Cyrus Leung
b489fc3c91
[CI/Build] Update CPU tests to include all "standard" tests ( #5481 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-08 23:30:04 +08:00
Isotr0py
1ff4aed5bd
[Model] Expose size to Idefics3 as mm_processor_kwargs ( #10146 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-08 09:56:58 +00:00
Cody Yu
201fc07730
[V1] Prefix caching (take 2) ( #9972 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2024-11-07 17:34:44 -08:00
litianjian
28b2877d30
Online video support for VLMs ( #10020 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-07 20:25:59 +00:00
Russell Bryant
3be5b26a76
[CI/Build] Add shell script linting using shellcheck ( #7925 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-07 18:17:29 +00:00
Nicolò Lucchesi
9d43afcc53
[Feature] [Spec decode]: Combine chunked prefill with speculative decoding ( #9291 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2024-11-07 08:15:14 -08:00
Maximilien de Bayser
ae62fd17c0
[Frontend] Tool calling parser for Granite 3.0 models ( #9027 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-11-07 07:09:02 -08:00
Flávia Béo
aa9078fa03
Adds method to read the pooling types from model's files ( #9506 )
...
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2024-11-07 08:42:40 +00:00
Hanzhi Zhou
6192e9b8fe
[Core][Distributed] Refactor ipc buffer init in CustomAllreduce ( #10030 )
...
Signed-off-by: Hanzhi Zhou <hanzhi713@gmail.com>
2024-11-06 23:50:47 -08:00
Cyrus Leung
db7db4aab9
[Misc] Consolidate ModelConfig code related to HF config ( #10104 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-07 06:00:21 +00:00
Li, Jiang
a4b3e0c1e9
[Hardware][CPU] Update torch 2.5 ( #9911 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-11-07 04:43:08 +00:00
youkaichao
719c1ca468
[core][distributed] add stateless_init_process_group ( #10072 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-06 16:42:09 -08:00
Joe Runde
d58268c56a
[V1] Make v1 more testable ( #9888 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-11-06 11:57:35 -08:00
Jee Jee Li
a5bba7d234
[Model] Add Idefics3 support ( #9767 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: B-201 <Joy25810@foxmail.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>
2024-11-06 11:41:17 +00:00
Isotr0py
a5fda50a10
[CI/Build] Fix large_gpu_mark reason ( #10070 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-06 08:50:37 +00:00
Aaron Pham
21063c11c7
[CI/Build] drop support for Python 3.8 EOL ( #8464 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00
youkaichao
4be3a45158
[distributed] add function to create ipc buffers directly ( #10064 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-05 22:35:03 -08:00
Travis Johnson
2bcbae704c
[Bugfix] Fix edge-case crash when using chat with the Mistral Tekken Tokenizer ( #10051 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-11-06 04:28:29 +00:00
Sungjae Lee
0c63c34f72
[Bugfix][SpecDecode] kv corruption with bonus tokens in spec decode ( #9730 )
...
Co-authored-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
2024-11-06 01:45:45 +00:00
Wallas Henrique
966e31697b
[Bugfix] Fix pickle of input when async output processing is on ( #9931 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-11-06 00:39:26 +00:00
youkaichao
ca9844b340
[bugfix] fix weak ref in piecewise cudagraph and tractable test ( #10048 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-05 14:49:20 -08:00
Michael Goin
235366fe2e
[CI] Prune back the number of tests in tests/kernels/* ( #9932 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-05 16:02:32 -05:00
Michael Goin
02462465ea
[CI] Prune tests/models/decoder_only/language/* tests ( #9940 )
...
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-05 16:02:23 -05:00
Cyrus Leung
bbc3619dc8
[Core] Make encoder-decoder inputs a nested structure to be more composable ( #9604 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-05 10:07:31 +08:00
tomeras91
ac04a97a9f
[Frontend] Add max_tokens prometheus metric ( #9881 )
...
Signed-off-by: Tomer Asida <tomera@ai21.com>
2024-11-04 22:53:24 +00:00
hissu-hyvarinen
5208dc7a20
[Bugfix][CI/Build][Hardware][AMD] Shard ID parameters in AMD tests running parallel jobs ( #9279 )
...
Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>
2024-11-04 11:37:46 -08:00
Robert Shaw
1c45f4c385
[CI] Basic Integration Test For TPU ( #9968 )
...
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
2024-11-04 11:34:26 -08:00
Chauncey
ac6b8f19b9
[Frontend] Multi-Modality Support for Loading Local Image Files ( #9915 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2024-11-04 15:34:57 +00:00
shanshan wang
54597724f4
[Model] Add support for H2OVL-Mississippi models ( #9747 )
...
Signed-off-by: Shanshan Wang <shanshan.wang@h2o.ai>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-11-04 00:15:36 +00:00
youkaichao
cea808f325
[3/N] model runner pass the whole config to model ( #9958 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-02 12:08:49 -07:00
youkaichao
e893795443
[2/N] executor pass the complete config to worker/modelrunner ( #9938 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2024-11-02 07:35:05 -07:00
sroy745
a78dd3303e
[Encoder Decoder] Add flash_attn kernel support for encoder-decoder models ( #9559 )
2024-11-01 23:22:49 -07:00
Peter Salas
6c0b7f548d
[Core][VLM] Add precise multi-modal placeholder tracking ( #8346 )
...
Signed-off-by: Peter Salas <peter@fixie.ai>
2024-11-01 16:21:10 -07:00
Pavani Majety
598b6d7b07
[Bugfix/Core] Flashinfer k_scale and v_scale ( #9861 )
2024-11-01 12:15:05 -07:00
Travis Johnson
1dd4cb2935
[Bugfix] Fix edge cases for MistralTokenizer ( #9625 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-11-01 10:33:15 -07:00
Cyrus Leung
ba0d892074
[Frontend] Use a proper chat template for VLM2Vec ( #9912 )
2024-11-01 14:09:07 +00:00
Michael Goin
30a2e80742
[CI/Build] Add Model Tests for PixtralHF ( #9813 )
2024-11-01 07:55:29 -06:00
Cyrus Leung
06386a64dd
[Frontend] Chat-based Embeddings API ( #9759 )
2024-11-01 08:13:35 +00:00
Yongzao
2b5bf20988
[torch.compile] Adding torch compile annotations to some models ( #9876 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-01 00:25:47 -07:00
youkaichao
566cd27797
[torch.compile] rework test plans ( #9866 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-31 22:20:17 -07:00
youkaichao
96e0c9cbbd
[torch.compile] directly register custom op ( #9896 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-31 21:56:09 -07:00
Joe Runde
031a7995f3
[Bugfix][Frontend] Reject guided decoding in multistep mode ( #9892 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-11-01 01:09:46 +00:00
Mor Zusman
9fb12f7848
[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100 ( #9838 )
...
Signed-off-by: mzusman <mor.zusmann@gmail.com>
2024-10-31 20:06:25 +00:00
sasha0552
55650c83a0
[Bugfix] Fix illegal memory access error with chunked prefill, prefix caching, block manager v2 and xformers enabled together ( #9532 )
...
Signed-off-by: sasha0552 <admin@sasha0552.org>
2024-10-31 11:46:36 -07:00
Alex Brooks
16b8f7a86f
[CI/Build] Add Model Tests for Qwen2-VL ( #9846 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-31 09:10:52 -07:00
Guillaume Calmettes
abbfb6134d
[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint ( #9837 )
2024-10-30 18:15:56 -07:00
youkaichao
64384bbcdf
[torch.compile] upgrade tests ( #9858 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-30 16:34:22 -07:00
Yongzao
00d91c8a2c
[CI/Build] Simplify exception trace in api server tests ( #9787 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-30 14:52:05 -07:00
Joe Runde
3b3f1e7436
[Bugfix][core] replace heartbeat with pid check ( #9818 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-30 09:34:07 -07:00
Elfie Guo
9ff4511e43
[Misc] Add chunked-prefill support on FlashInfer. ( #9781 )
2024-10-30 09:33:53 -07:00
Alex Brooks
cc98f1e079
[CI/Build] VLM Test Consolidation ( #9372 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-30 09:32:17 -07:00
youkaichao
ff5ed6e1bc
[torch.compile] rework compile control with piecewise cudagraph ( #9715 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-29 23:03:49 -07:00
Will Eaton
882a1ad0de
[Model] tool calling support for ibm-granite/granite-20b-functioncalling ( #8339 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
2024-10-29 15:07:37 -07:00
Joe Runde
67bdf8e523
[Bugfix][Frontend] Guard against bad token ids ( #9634 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-29 14:13:20 -07:00
Michael Goin
ab6f981671
[CI][Bugfix] Skip chameleon for transformers 4.46.1 ( #9808 )
2024-10-29 11:12:43 -07:00
wangshuai09
622b7ab955
[Hardware] using current_platform.seed_everything ( #9785 )
...
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-29 14:47:44 +00:00
Zhong Qishuai
ef7865b4f9
[Frontend] re-enable multi-modality input in the new beam search implementation ( #9427 )
...
Signed-off-by: Qishuai Ferdinandzhong@gmail.com
2024-10-29 11:49:47 +00:00
litianjian
5f8d8075f9
[Model][VLM] Add multi-video support for LLaVA-Onevision ( #8905 )
...
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-28 18:04:10 +00:00
youkaichao
32176fee73
[torch.compile] support moe models ( #9632 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-27 21:58:04 -07:00
wangshuai09
4e2d95e372
[Hardware][ROCM] using current_platform.is_rocm ( #9642 )
...
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-28 04:07:00 +00:00
madt2709
34a9941620
[Bugfix] Fix load config when using bools ( #9533 )
2024-10-27 13:46:41 -04:00
bnellnm
3cb07a36a2
[Misc] Upgrade to pytorch 2.5 ( #9588 )
...
Signed-off-by: Bill Nell <bill@neuralmagic.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-27 09:44:24 +00:00
kakao-kevin-us
6650e6a930
[Model] Add classification Task with Qwen2ForSequenceClassification ( #9704 )
...
Signed-off-by: Kevin-Yang <ykcha9@gmail.com>
Co-authored-by: Kevin-Yang <ykcha9@gmail.com>
2024-10-26 17:53:35 +00:00
Vasiliy Alekseev
07e981fdf4
[Frontend] Bad words sampling parameter ( #9717 )
...
Signed-off-by: Vasily Alexeev <alvasian@yandex.ru>
2024-10-26 16:29:38 +00:00
Mengqing Cao
5cbdccd151
[Hardware][openvino] is_openvino --> current_platform.is_openvino ( #9716 )
2024-10-26 10:59:06 +00:00
Kevin H. Luu
9f7b4ba865
[ci/Build] Skip Chameleon for transformers 4.46.0 on broadcast test #9675 ( #9676 )
2024-10-24 20:59:00 -07:00
Charlie Fu
59449095ab
[Performance][Kernel] Fused_moe Performance Improvement ( #9384 )
...
Signed-off-by: charlifu <charlifu@amd.com>
2024-10-24 15:37:52 -07:00
Alex Brooks
722d46edb9
[Model] Compute Llava Next Max Tokens / Dummy Data From Gridpoints ( #9650 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-24 10:42:24 -07:00
Cyrus Leung
c866e0079d
[CI/Build] Fix VLM test failures when using transformers v4.46 ( #9666 )
2024-10-25 01:40:40 +08:00
Yongzao
d27cfbf791
[torch.compile] Adding torch compile annotations to some models ( #9641 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-24 09:31:42 -07:00
Jee Jee Li
295a061fb3
[Kernel] add kernel for FATReLU ( #9610 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-10-24 16:18:27 +08:00
Yongzao
8a02cd045a
[torch.compile] Adding torch compile annotations to some models ( #9639 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-24 00:54:57 -07:00
youkaichao
4fdc581f9e
[core] simplify seq group code ( #9569 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-10-24 00:16:44 -07:00
Cyrus Leung
836e8ef6ee
[Bugfix] Fix PP for ChatGLM and Molmo ( #9422 )
2024-10-24 06:12:05 +00:00
Vinay R Damodaran
33bab41060
[Bugfix]: Make chat content text allow type content ( #9358 )
...
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
2024-10-24 05:05:49 +00:00
Yunfei Chu
fc6c274626
[Model] Add Qwen2-Audio model support ( #9248 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-23 17:54:22 +00:00
Alex Brooks
150b779081
[Frontend] Enable Online Multi-image Support for MLlama ( #9393 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-23 17:28:57 +00:00
Alex Brooks
31a08f5bd2
[Model] Add min_pixels / max_pixels to Qwen2VL as mm_processor_kwargs ( #9612 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-23 14:05:18 +00:00
Isotr0py
3ff57ebfca
[Model] Initialize Florence-2 language backbone support ( #9555 )
2024-10-23 10:42:47 +00:00
Cyrus Leung
831540cf04
[Model] Support E5-V ( #9576 )
2024-10-23 11:35:29 +08:00
yulei
b17046e298
[BugFix] Fix metrics error for --num-scheduler-steps > 1 ( #8234 )
2024-10-22 15:43:03 -07:00
Ronen Schaffer
cd5601ac37
[BugFix] Prevent exporting duplicate OpenTelemetry spans ( #9017 )
2024-10-22 11:11:53 -07:00
Isotr0py
bb392ea2d2
[Model][VLM] Initialize support for Mono-InternVL model ( #9528 )
2024-10-22 16:01:46 +00:00
Jee Jee Li
a48e3ec052
[CI/Build][LoRA] Temporarily fix long context failure issue ( #9579 )
2024-10-22 11:32:51 +00:00
wangshuai09
3ddbe25502
[Hardware][CPU] using current_platform.is_cpu ( #9536 )
2024-10-22 00:50:43 -07:00
Wallas Henrique
c0292211ce
[CI/Build] Replaced some models on tests for smaller ones ( #9570 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-22 04:52:14 +00:00
Cyrus Leung
f085995a7b
[CI/Build] Remove unnecessary fork_new_process ( #9484 )
2024-10-21 19:47:29 -07:00
Travis Johnson
b729901139
[Bugfix]: serialize config by value for --trust-remote-code ( #6751 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-21 19:46:24 -07:00
youkaichao
76a5e13270
[core] move parallel sampling out from vllm core ( #9302 )
2024-10-22 00:31:44 +00:00
Joe Runde
ef7faad1b8
🐛 Fixup more test failures from memory profiling ( #9563 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-21 17:10:56 -07:00
Wallas Henrique
711f3a7806
[Frontend] Don't log duplicate error stacktrace for every request in the batch ( #9023 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-21 14:49:41 -07:00
Dhia Eddine Rhaiem
f6b97293aa
[Model] FalconMamba Support ( #9325 )
2024-10-21 12:50:16 -04:00
Cyrus Leung
696b01af8f
[CI/Build] Split up decoder-only LM tests ( #9488 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-20 21:27:50 -07:00