ywfang
8a0cf1ddc3
[Model] support minicpm3 ( #8297 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
Isotr0py
f57092c00b
[Doc] Add oneDNN installation to CPU backend documentation ( #8467 )
2024-09-13 18:06:30 +00:00
Cyrus Leung
a84e598e21
[CI/Build] Reorganize models tests ( #7820 )
2024-09-13 10:20:06 -07:00
youkaichao
cab69a15e4
[doc] recommend pip instead of conda ( #8446 )
2024-09-12 23:52:41 -07:00
Alex Brooks
c6202daeed
[Model] Support multiple images for qwen-vl ( #8247 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:54 -07:00
Patrick von Platen
d394787e52
Pixtral ( #8377 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-11 14:41:55 -07:00
Yang Fan
3b7fea770f
[Model][VLM] Add Qwen2-VL model support ( #7905 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-11 09:31:19 -07:00
Yangshen⚡Deng
6a512a00df
[model] Support for Llava-Next-Video model ( #7559 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-10 22:21:36 -07:00
Simon Mo
a1d874224d
Add NVIDIA Meetup slides, announce AMD meetup, and add contact info ( #8319 )
2024-09-09 23:21:00 -07:00
Isotr0py
e807125936
[Model][VLM] Support multi-images inputs for InternVL2 models ( #8201 )
2024-09-07 16:38:23 +08:00
Cyrus Leung
2f707fcb35
[Model] Multi-input support for LLaVA ( #8238 )
2024-09-07 02:57:24 +00:00
William Lin
12dd715807
[misc] [doc] [frontend] LLM torch profiler support ( #7943 )
2024-09-06 17:48:48 -07:00
Dipika Sikka
23f322297f
[Misc] Remove SqueezeLLM ( #8220 )
2024-09-06 16:29:03 -06:00
Jiaxin Shan
db3bf7c991
[Core] Support load and unload LoRA in api server ( #6566 )
...
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-09-05 18:10:33 -07:00
sroy745
2febcf2777
[Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM ( #7962 )
2024-09-05 16:25:29 -04:00
Alex Brooks
9da25a88aa
[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) ( #8029 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-05 12:48:10 +00:00
Cyrus Leung
288a938872
[Doc] Indicate more information about supported modalities ( #8181 )
2024-09-05 10:51:53 +00:00
Kyle Mistele
e02ce498be
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models ( #5649 )
...
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-04 13:18:13 -07:00
Woosuk Kwon
61f4a93d14
[TPU][Bugfix] Use XLA rank for persistent cache path ( #8137 )
2024-09-03 18:35:33 -07:00
Wenxiang
1248e8506a
[Model] Adding support for MSFT Phi-3.5-MoE ( #7729 )
...
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Zeqi Lin <zelin@microsoft.com>
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>
2024-08-30 13:42:57 -06:00
Kaunil Dhruv
058344f89a
[Frontend]-config-cli-args ( #7737 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>
2024-08-30 08:21:02 -07:00
Yohan Na
dc13e99348
[MODEL] add Exaone model support ( #7819 )
2024-08-29 23:34:20 -07:00
Stas Bekman
8c56e57def
[Doc] fix 404 link ( #7966 )
2024-08-28 13:54:23 -07:00
Woosuk Kwon
eeffde1ac0
[TPU] Upgrade PyTorch XLA nightly ( #7967 )
2024-08-28 13:10:21 -07:00
Stas Bekman
98c12cffe5
[Doc] fix the autoAWQ example ( #7937 )
2024-08-28 12:12:32 +00:00
Peter Salas
fab5f53e2d
[Core][VLM] Stack multimodal tensors to represent multiple images within each prompt ( #7902 )
2024-08-28 01:53:56 +00:00
Peter Salas
57792ed469
[Doc] Fix incorrect docs from #7615 ( #7788 )
2024-08-22 10:02:06 -07:00
zifeitong
df1a21131d
[Model] Fix Phi-3.5-vision-instruct 'num_crops' issue ( #7710 )
2024-08-22 09:36:24 +08:00
Peter Salas
1ca0d4f86b
[Model] Add UltravoxModel and UltravoxConfig ( #7615 )
2024-08-21 22:49:39 +00:00
William Lin
dd53c4b023
[misc] Add Torch profiler support ( #7451 )
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-21 15:39:26 -07:00
Roger Wang
4506641212
[Doc] Section for Multimodal Language Models ( #7719 )
2024-08-20 23:24:01 -07:00
Ilya Lavrenov
398521ad19
[OpenVINO] Updated documentation ( #7687 )
2024-08-20 07:33:56 -06:00
Michael Goin
d4f0f17b02
[Doc] Update quantization supported hardware table ( #7595 )
2024-08-16 13:59:27 -07:00
Michael Goin
b3f4e17935
[Doc] Add docs for llmcompressor INT8 and FP8 checkpoints ( #7444 )
2024-08-16 13:59:16 -07:00
Kameshwara Pavan Kumar Mantha
22b39e11f2
llama_index serving integration documentation ( #6973 )
...
Co-authored-by: pavanmantha <pavan.mantha@thevaslabs.io>
2024-08-14 15:38:37 -07:00
Cyrus Leung
3f674a49b5
[VLM][Core] Support profiling with multiple multi-modal inputs per prompt ( #7126 )
2024-08-14 17:55:42 +00:00
youkaichao
199adbb7cf
[doc] update test script to include cudagraph ( #7501 )
2024-08-13 21:52:58 -07:00
Cyrus Leung
dd164d72f3
[Bugfix][Docs] Update list of mock imports ( #7493 )
2024-08-13 20:37:30 -07:00
Woosuk Kwon
a08df8322e
[TPU] Support multi-host inference ( #7457 )
2024-08-13 16:31:20 -07:00
Peter Salas
00c3d68e45
[Frontend][Core] Add plumbing to support audio language models ( #7446 )
2024-08-13 17:39:33 +00:00
Woosuk Kwon
e20233d361
Revert "[Doc] Update supported_hardware.rst ( #7276 )" ( #7467 )
2024-08-13 01:37:08 -07:00
jon-chuang
a046f86397
[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel ( #7208 )
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-12 22:47:41 +00:00
Roger Wang
e6e42e4b17
[Core][VLM] Support image embeddings as input ( #6613 )
2024-08-12 16:16:06 +08:00
Simon Mo
f020a6297e
[Docs] Update readme ( #7316 )
2024-08-11 17:13:37 -07:00
tomeras91
02b1988b9f
[Doc] building vLLM with VLLM_TARGET_DEVICE=empty ( #7403 )
2024-08-11 14:38:17 -07:00
Woosuk Kwon
90bab18f24
[TPU] Use mark_dynamic to reduce compilation time ( #7340 )
2024-08-10 18:12:22 -07:00
Simon Mo
5923532e15
Add Skywork AI as Sponsor ( #7314 )
2024-08-08 13:59:57 -07:00
Jee Jee Li
757ac70a64
[Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 ( #7273 )
2024-08-08 14:02:41 +00:00
Michael Goin
6d94420246
[Doc] Update supported_hardware.rst ( #7276 )
2024-08-07 14:21:50 -07:00
Stas Bekman
0e12cd67a8
[Doc] add online speculative decoding example ( #7243 )
2024-08-07 09:58:02 -07:00
Ilya Lavrenov
80cbe10c59
[OpenVINO] migrate to latest dependencies versions ( #7251 )
2024-08-07 09:49:10 -07:00
Roger Wang
2385c8f374
[Doc] Mock new dependencies for documentation ( #7245 )
2024-08-07 06:43:03 +00:00
Thomas Parnell
789937af2e
[Doc] [SpecDecode] Update MLPSpeculator documentation ( #7100 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-08-05 23:29:43 +00:00
Simon Mo
4db5176d97
bump version to v0.5.4 ( #7139 )
2024-08-05 14:39:48 -07:00
Jee Jee Li
179a6a36f2
[Model]Refactor MiniCPMV ( #7020 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 08:12:41 +00:00
Yihuan Bu
654bc5ca49
Support for guided decoding for offline LLM ( #6878 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 03:12:09 +00:00
Michael Goin
b482b9a5b1
[CI/Build] Add support for Python 3.12 ( #7035 )
2024-08-02 13:51:22 -07:00
Murali Andoorveedu
fc912e0886
[Models] Support Qwen model with PP ( #6974 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-08-01 12:40:43 -07:00
Jee Jee Li
7ecee34321
[Kernel][RFC] Refactor the punica kernel based on Triton ( #5036 )
2024-07-31 17:12:24 -07:00
Alphi
2f4e108f75
[Bugfix] Clean up MiniCPM-V ( #6939 )
...
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-31 14:39:19 +00:00
Cyrus Leung
f230cc2ca6
[Bugfix] Fix broadcasting logic for multi_modal_kwargs ( #6836 )
2024-07-31 10:38:45 +08:00
Ilya Lavrenov
5895b24677
[OpenVINO] Updated OpenVINO requirements and build docs ( #6948 )
2024-07-30 11:33:01 -07:00
Isotr0py
7cbd9ec7a9
[Model] Initialize support for InternVL2 series models ( #6514 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-29 10:16:30 +00:00
Woosuk Kwon
fad5576c58
[TPU] Reduce compilation time & Upgrade PyTorch XLA version ( #6856 )
2024-07-27 10:28:33 -07:00
Chenggang Wu
f954d0715c
[Docs] Add RunLLM chat widget ( #6857 )
2024-07-27 09:24:46 -07:00
Cyrus Leung
1ad86acf17
[Model] Initial support for BLIP-2 ( #5920 )
...
Co-authored-by: ywang96 <ywang@roblox.com>
2024-07-27 11:53:07 +00:00
Roger Wang
ecb33a28cb
[CI/Build][Doc] Update CI and Doc for VLM example changes ( #6860 )
2024-07-27 09:54:14 +00:00
Harry Mellor
c53041ae3b
[Doc] Add missing mock import to docs conf.py ( #6834 )
2024-07-27 04:47:33 +00:00
omrishiv
3c3012398e
[Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron ( #6844 )
...
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-07-26 20:20:16 -07:00
Woosuk Kwon
ced36cd89b
[ROCm] Upgrade PyTorch nightly version ( #6845 )
2024-07-26 20:16:13 -07:00
Zhanghao Wu
150a1ffbfd
[Doc] Update SkyPilot doc for wrong indents and instructions for update service ( #4283 )
2024-07-26 14:39:10 -07:00
Michael Goin
281977bd6e
[Doc] Add Nemotron to supported model docs ( #6843 )
2024-07-26 17:32:44 -04:00
Li, Jiang
3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation ( #6125 )
2024-07-26 13:50:10 -07:00
youkaichao
85ad7e2d01
[doc][debugging] add known issues for hangs ( #6816 )
2024-07-25 21:48:05 -07:00
Woosuk Kwon
b7215de2c5
[Docs] Publish 5th meetup slides ( #6799 )
2024-07-25 16:47:55 -07:00
youkaichao
f3ff63c3f4
[doc][distributed] improve multinode serving doc ( #6804 )
2024-07-25 15:38:32 -07:00
Kuntai Du
6a1e25b151
[Doc] Add documentations for nightly benchmarks ( #6412 )
2024-07-25 11:57:16 -07:00
Alphi
9e169a4c61
[Model] Adding support for MiniCPM-V ( #4087 )
2024-07-24 20:59:30 -07:00
Hongxia Yang
d88c458f44
[Doc][AMD][ROCm]Added tips to refer to mi300x tuning guide for mi300x users ( #6754 )
2024-07-24 14:32:57 -07:00
Woosuk Kwon
ccc4a73257
[Docs][ROCm] Detailed instructions to build from source ( #6680 )
2024-07-24 01:07:23 -07:00
dongmao zhang
87525fab92
[bitsandbytes]: support read bnb pre-quantized model ( #5753 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-23 23:45:09 +00:00
youkaichao
71950af726
[doc][distributed] fix doc argument order ( #6691 )
2024-07-23 08:55:33 -07:00
Woosuk Kwon
cb1362a889
[Docs] Announce llama3.1 support ( #6688 )
2024-07-23 08:18:15 -07:00
Roger Wang
22fa2e35cb
[VLM][Model] Support image input for Chameleon ( #6633 )
2024-07-22 23:50:48 -07:00
youkaichao
c051bfe4eb
[doc][distributed] doc for setting up multi-node environment ( #6529 )
...
[doc][distributed] add more doc for setting up multi-node environment ( #6529 )
2024-07-22 21:22:09 -07:00
Cyrus Leung
739b61a348
[Frontend] Refactor prompt processing ( #4028 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00
Matt Wong
06d6c5fe9f
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes ( #6543 )
2024-07-20 09:39:07 -07:00
Murali Andoorveedu
45ceb85a0c
[Docs] Update PP docs ( #6598 )
2024-07-19 16:38:21 -07:00
Simon Mo
30efe41532
[Docs] Update docs for wheel location ( #6580 )
2024-07-19 12:14:11 -07:00
milo157
a38524f338
[DOC] - Add docker image to Cerebrium Integration ( #6510 )
2024-07-17 10:22:53 -07:00
Cyrus Leung
5bf35a91e4
[Doc][CI/Build] Update docs and tests to use vllm serve ( #6431 )
2024-07-17 07:43:21 +00:00
Hongxia Yang
10383887e0
[ROCm] Cleanup Dockerfile and remove outdated patch ( #6482 )
2024-07-16 22:47:02 -07:00
Jiaxin Shan
94162beb9f
[Doc] Fix the lora adapter path in server startup script ( #6230 )
2024-07-16 10:11:04 -07:00
Woosuk Kwon
c467dff24f
[Hardware][TPU] Support MoE with Pallas GMM kernel ( #6457 )
2024-07-16 09:56:28 -07:00
youkaichao
9f4ccec761
[doc][misc] remind to cancel debugging environment variables ( #6481 )
...
[doc][misc] remind users to cancel debugging environment variables after debugging ( #6481 )
2024-07-16 09:45:30 -07:00
Woosuk Kwon
3dee97b05f
[Docs] Add Google Cloud to sponsor list ( #6450 )
2024-07-15 11:58:10 -07:00
youkaichao
94b82e8c18
[doc][distributed] add suggestion for distributed inference ( #6418 )
2024-07-15 09:45:51 -07:00
youkaichao
22e79ee8f3
[doc][misc] doc update ( #6439 )
2024-07-14 23:33:25 -07:00
Robert Cohn
61e85dbad8
[Doc] xpu backend requires running setvars.sh ( #6393 )
2024-07-14 17:10:11 -07:00
Ethan Xu
dbfe254eda
[Feature] vLLM CLI ( #5090 )
...
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-07-14 15:36:43 -07:00