Yuan Tang
acce7630c1
Update link to KServe deployment guide ( #9173 )
2024-10-09 03:58:49 +00:00
Michael Goin
9ba0bd6aa6
Add lm-eval directly to requirements-test.txt ( #9161 )
2024-10-08 18:22:31 -07:00
Rafael Vasquez
de24046fcd
[Doc] Improve contributing and installation documentation ( #9132 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-08 20:22:08 +00:00
Sayak Paul
1874c6a1b0
[Doc] Update vlm.rst to include an example on videos ( #9155 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-08 18:12:29 +00:00
TimWang
93cf74a8a7
[Doc]: Add deploying_with_k8s guide ( #8451 )
2024-10-07 13:31:45 -07:00
Cyrus Leung
151ef4efd2
[Model] Support NVLM-D and fix QK Norm in InternViT ( #9045 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2024-10-07 11:55:12 +00:00
Cyrus Leung
b22b798471
[Model] PP support for embedding models and update docs ( #9090 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-10-06 16:35:27 +08:00
Cyrus Leung
f22619fe96
[Misc] Remove user-facing error for removed VLM args ( #9104 )
2024-10-06 01:33:52 -07:00
Andy Dai
5df1834895
[Bugfix] Fix order of arguments matters in config.yaml ( #8960 )
2024-10-05 17:35:11 +00:00
Roger Wang
26aa325f4f
[Core][VLM] Test registration for OOT multimodal models ( #8717 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:38:25 -07:00
Cyrus Leung
0e36fd4909
[Misc] Move registry to its own file ( #9064 )
2024-10-04 10:01:37 +00:00
Murali Andoorveedu
0f6d7a9a34
[Models] Add remaining model PP support ( #7168 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:56:58 +08:00
代君
3dbb215b38
[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model ( #8405 )
2024-10-04 10:36:39 +08:00
Nick Hill
18c2e30c57
[Doc] Update Granite model docs ( #9025 )
2024-10-03 02:42:24 +00:00
Sergey Shlyapnikov
f58d4fccc9
[OpenVINO] Enable GPU support for OpenVINO vLLM backend ( #8192 )
2024-10-02 17:50:01 -04:00
Cyrus Leung
4f341bd4bf
[Doc] Update list of supported models ( #8987 )
2024-10-02 00:35:39 +08:00
whyiug
e01ab595d8
[Model] support input embeddings for qwen2vl ( #8856 )
2024-09-30 03:16:10 +00:00
youkaichao
cc276443b5
[doc] organize installation doc and expose per-commit docker ( #8931 )
2024-09-28 17:48:41 -07:00
youkaichao
d86f6b2afb
[misc] fix wheel name ( #8919 )
2024-09-27 22:10:44 -07:00
Cyrus Leung
3b00b9c26c
[Core] renamePromptInputs and inputs ( #8876 )
2024-09-26 20:35:15 -07:00
Maximilien de Bayser
344cd2b6f4
[Feature] Add support for Llama 3.1 and 3.2 tool use ( #8343 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-09-26 17:01:42 -07:00
youkaichao
70de39f6b4
[misc][installation] build from source without compilation ( #8818 )
2024-09-26 13:19:04 -07:00
Roger Wang
4bb98f2190
[Misc] Update config loading for Qwen2-VL and remove Granite ( #8837 )
2024-09-26 07:45:30 -07:00
Roger Wang
e2c6e0a829
[Doc] Update doc for Transformers 4.45 ( #8817 )
2024-09-25 13:29:48 -07:00
Chen Zhang
770ec6024f
[Model] Add support for the multi-modal Llama 3.2 model ( #8811 )
...
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-25 13:29:32 -07:00
Simon Mo
4f1ba0844b
Revert "rename PromptInputs and inputs with backward compatibility ( #8760 ) ( #8810 )
2024-09-25 10:36:26 -07:00
Cyrus Leung
28e1299e60
rename PromptInputs and inputs with backward compatibility ( #8760 )
2024-09-25 09:36:47 -07:00
Hongxia Yang
1c046447a6
[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade ( #8777 )
2024-09-25 22:26:37 +08:00
Jee Jee Li
13f9f7a3d0
[[Misc]Upgrade bitsandbytes to the latest version 0.44.0 ( #8768 )
2024-09-24 17:08:55 -07:00
Simon Mo
3185fb0cca
Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" ( #8750 )
2024-09-24 05:45:20 +00:00
Hongxia Yang
530821d00c
[Hardware][AMD] ROCm6.2 upgrade ( #8674 )
2024-09-23 18:52:39 -07:00
Daniele
ee5f34b1c2
[CI/Build] use setuptools-scm to set __version__ ( #4738 )
...
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-23 09:44:26 -07:00
Yan Ma
d23679eb99
[Bugfix] fix docker build for xpu ( #8652 )
2024-09-22 22:54:18 -07:00
youkaichao
d4a2ac8302
[build] enable existing pytorch (for GH200, aarch64, nightly) ( #8713 )
2024-09-22 12:47:54 -07:00
litianjian
5b59532760
[Model][VLM] Add LLaVA-Onevision model support ( #8486 )
...
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-22 10:51:44 -07:00
Andy Dai
4dfdf43196
[Doc] Fix typo in AMD installation guide ( #8689 )
2024-09-21 00:24:12 -07:00
Cyrus Leung
0057894ef7
[Core] Rename PromptInputs and inputs( #8673 )
2024-09-20 19:00:54 -07:00
omrishiv
7c8566aa4f
[Doc] neuron documentation update ( #8671 )
...
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-09-20 15:04:37 -07:00
Niklas Muennighoff
3b63de9353
[Model] Add OLMoE ( #7922 )
2024-09-20 09:31:41 -07:00
Jiaxin Shan
260d40b5ea
[Core] Support Lora lineage and base model metadata management ( #6315 )
2024-09-20 06:20:56 +00:00
Isotr0py
ea4647b7d7
[Doc] Add documentation for GGUF quantization ( #8618 )
2024-09-19 13:15:55 -06:00
Geun, Lim
e18749ff09
[Model] Support Solar Model ( #8386 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-18 11:04:00 -06:00
Alexander Matveev
7c7714d856
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH ( #8157 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
youkaichao
fa0c114fad
[doc] improve installation doc ( #8550 )
...
Co-authored-by: Andy Dai <76841985+Imss27@users.noreply.github.com>
2024-09-17 16:24:06 -07:00
youkaichao
2759a43a26
[doc] update doc on testing and debugging ( #8514 )
2024-09-16 12:10:23 -07:00
ywfang
8a0cf1ddc3
[Model] support minicpm3 ( #8297 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
Isotr0py
f57092c00b
[Doc] Add oneDNN installation to CPU backend documentation ( #8467 )
2024-09-13 18:06:30 +00:00
Cyrus Leung
a84e598e21
[CI/Build] Reorganize models tests ( #7820 )
2024-09-13 10:20:06 -07:00
youkaichao
cab69a15e4
[doc] recommend pip instead of conda ( #8446 )
2024-09-12 23:52:41 -07:00
Alex Brooks
c6202daeed
[Model] Support multiple images for qwen-vl ( #8247 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:54 -07:00
Patrick von Platen
d394787e52
Pixtral ( #8377 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-11 14:41:55 -07:00
Yang Fan
3b7fea770f
[Model][VLM] Add Qwen2-VL model support ( #7905 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-11 09:31:19 -07:00
Yangshen⚡Deng
6a512a00df
[model] Support for Llava-Next-Video model ( #7559 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-10 22:21:36 -07:00
Simon Mo
a1d874224d
Add NVIDIA Meetup slides, announce AMD meetup, and add contact info ( #8319 )
2024-09-09 23:21:00 -07:00
Isotr0py
e807125936
[Model][VLM] Support multi-images inputs for InternVL2 models ( #8201 )
2024-09-07 16:38:23 +08:00
Cyrus Leung
2f707fcb35
[Model] Multi-input support for LLaVA ( #8238 )
2024-09-07 02:57:24 +00:00
William Lin
12dd715807
[misc] [doc] [frontend] LLM torch profiler support ( #7943 )
2024-09-06 17:48:48 -07:00
Dipika Sikka
23f322297f
[Misc] Remove SqueezeLLM ( #8220 )
2024-09-06 16:29:03 -06:00
Jiaxin Shan
db3bf7c991
[Core] Support load and unload LoRA in api server ( #6566 )
...
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-09-05 18:10:33 -07:00
sroy745
2febcf2777
[Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM ( #7962 )
2024-09-05 16:25:29 -04:00
Alex Brooks
9da25a88aa
[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) ( #8029 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-05 12:48:10 +00:00
Cyrus Leung
288a938872
[Doc] Indicate more information about supported modalities ( #8181 )
2024-09-05 10:51:53 +00:00
Kyle Mistele
e02ce498be
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models ( #5649 )
...
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-04 13:18:13 -07:00
Woosuk Kwon
61f4a93d14
[TPU][Bugfix] Use XLA rank for persistent cache path ( #8137 )
2024-09-03 18:35:33 -07:00
Wenxiang
1248e8506a
[Model] Adding support for MSFT Phi-3.5-MoE ( #7729 )
...
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Zeqi Lin <zelin@microsoft.com>
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>
2024-08-30 13:42:57 -06:00
Kaunil Dhruv
058344f89a
[Frontend]-config-cli-args ( #7737 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>
2024-08-30 08:21:02 -07:00
Yohan Na
dc13e99348
[MODEL] add Exaone model support ( #7819 )
2024-08-29 23:34:20 -07:00
Stas Bekman
8c56e57def
[Doc] fix 404 link ( #7966 )
2024-08-28 13:54:23 -07:00
Woosuk Kwon
eeffde1ac0
[TPU] Upgrade PyTorch XLA nightly ( #7967 )
2024-08-28 13:10:21 -07:00
Stas Bekman
98c12cffe5
[Doc] fix the autoAWQ example ( #7937 )
2024-08-28 12:12:32 +00:00
Peter Salas
fab5f53e2d
[Core][VLM] Stack multimodal tensors to represent multiple images within each prompt ( #7902 )
2024-08-28 01:53:56 +00:00
Peter Salas
57792ed469
[Doc] Fix incorrect docs from #7615 ( #7788 )
2024-08-22 10:02:06 -07:00
zifeitong
df1a21131d
[Model] Fix Phi-3.5-vision-instruct 'num_crops' issue ( #7710 )
2024-08-22 09:36:24 +08:00
Peter Salas
1ca0d4f86b
[Model] Add UltravoxModel and UltravoxConfig ( #7615 )
2024-08-21 22:49:39 +00:00
William Lin
dd53c4b023
[misc] Add Torch profiler support ( #7451 )
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-21 15:39:26 -07:00
Roger Wang
4506641212
[Doc] Section for Multimodal Language Models ( #7719 )
2024-08-20 23:24:01 -07:00
Ilya Lavrenov
398521ad19
[OpenVINO] Updated documentation ( #7687 )
2024-08-20 07:33:56 -06:00
Michael Goin
d4f0f17b02
[Doc] Update quantization supported hardware table ( #7595 )
2024-08-16 13:59:27 -07:00
Michael Goin
b3f4e17935
[Doc] Add docs for llmcompressor INT8 and FP8 checkpoints ( #7444 )
2024-08-16 13:59:16 -07:00
Kameshwara Pavan Kumar Mantha
22b39e11f2
llama_index serving integration documentation ( #6973 )
...
Co-authored-by: pavanmantha <pavan.mantha@thevaslabs.io>
2024-08-14 15:38:37 -07:00
Cyrus Leung
3f674a49b5
[VLM][Core] Support profiling with multiple multi-modal inputs per prompt ( #7126 )
2024-08-14 17:55:42 +00:00
youkaichao
199adbb7cf
[doc] update test script to include cudagraph ( #7501 )
2024-08-13 21:52:58 -07:00
Cyrus Leung
dd164d72f3
[Bugfix][Docs] Update list of mock imports ( #7493 )
2024-08-13 20:37:30 -07:00
Woosuk Kwon
a08df8322e
[TPU] Support multi-host inference ( #7457 )
2024-08-13 16:31:20 -07:00
Peter Salas
00c3d68e45
[Frontend][Core] Add plumbing to support audio language models ( #7446 )
2024-08-13 17:39:33 +00:00
Woosuk Kwon
e20233d361
Revert "[Doc] Update supported_hardware.rst ( #7276 )" ( #7467 )
2024-08-13 01:37:08 -07:00
jon-chuang
a046f86397
[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel ( #7208 )
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-12 22:47:41 +00:00
Roger Wang
e6e42e4b17
[Core][VLM] Support image embeddings as input ( #6613 )
2024-08-12 16:16:06 +08:00
Simon Mo
f020a6297e
[Docs] Update readme ( #7316 )
2024-08-11 17:13:37 -07:00
tomeras91
02b1988b9f
[Doc] building vLLM with VLLM_TARGET_DEVICE=empty ( #7403 )
2024-08-11 14:38:17 -07:00
Woosuk Kwon
90bab18f24
[TPU] Use mark_dynamic to reduce compilation time ( #7340 )
2024-08-10 18:12:22 -07:00
Simon Mo
5923532e15
Add Skywork AI as Sponsor ( #7314 )
2024-08-08 13:59:57 -07:00
Jee Jee Li
757ac70a64
[Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 ( #7273 )
2024-08-08 14:02:41 +00:00
Michael Goin
6d94420246
[Doc] Update supported_hardware.rst ( #7276 )
2024-08-07 14:21:50 -07:00
Stas Bekman
0e12cd67a8
[Doc] add online speculative decoding example ( #7243 )
2024-08-07 09:58:02 -07:00
Ilya Lavrenov
80cbe10c59
[OpenVINO] migrate to latest dependencies versions ( #7251 )
2024-08-07 09:49:10 -07:00
Roger Wang
2385c8f374
[Doc] Mock new dependencies for documentation ( #7245 )
2024-08-07 06:43:03 +00:00
Thomas Parnell
789937af2e
[Doc] [SpecDecode] Update MLPSpeculator documentation ( #7100 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-08-05 23:29:43 +00:00
Simon Mo
4db5176d97
bump version to v0.5.4 ( #7139 )
2024-08-05 14:39:48 -07:00
Jee Jee Li
179a6a36f2
[Model]Refactor MiniCPMV ( #7020 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 08:12:41 +00:00