Jiaxin Shan
|
db3bf7c991
|
[Core] Support load and unload LoRA in api server (#6566)
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-09-05 18:10:33 -07:00 |
|
sroy745
|
2febcf2777
|
[Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM (#7962)
|
2024-09-05 16:25:29 -04:00 |
|
Alex Brooks
|
9da25a88aa
|
[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-05 12:48:10 +00:00 |
|
Cyrus Leung
|
288a938872
|
[Doc] Indicate more information about supported modalities (#8181)
|
2024-09-05 10:51:53 +00:00 |
|
Kyle Mistele
|
e02ce498be
|
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
|
2024-09-04 13:18:13 -07:00 |
|
Woosuk Kwon
|
61f4a93d14
|
[TPU][Bugfix] Use XLA rank for persistent cache path (#8137)
|
2024-09-03 18:35:33 -07:00 |
|
Wenxiang
|
1248e8506a
|
[Model] Adding support for MSFT Phi-3.5-MoE (#7729)
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Zeqi Lin <zelin@microsoft.com>
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>
|
2024-08-30 13:42:57 -06:00 |
|
Kaunil Dhruv
|
058344f89a
|
[Frontend]-config-cli-args (#7737)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>
|
2024-08-30 08:21:02 -07:00 |
|
Yohan Na
|
dc13e99348
|
[MODEL] add Exaone model support (#7819)
|
2024-08-29 23:34:20 -07:00 |
|
Stas Bekman
|
8c56e57def
|
[Doc] fix 404 link (#7966)
|
2024-08-28 13:54:23 -07:00 |
|
Woosuk Kwon
|
eeffde1ac0
|
[TPU] Upgrade PyTorch XLA nightly (#7967)
|
2024-08-28 13:10:21 -07:00 |
|
Stas Bekman
|
98c12cffe5
|
[Doc] fix the autoAWQ example (#7937)
|
2024-08-28 12:12:32 +00:00 |
|
Peter Salas
|
fab5f53e2d
|
[Core][VLM] Stack multimodal tensors to represent multiple images within each prompt (#7902)
|
2024-08-28 01:53:56 +00:00 |
|
Peter Salas
|
57792ed469
|
[Doc] Fix incorrect docs from #7615 (#7788)
|
2024-08-22 10:02:06 -07:00 |
|
zifeitong
|
df1a21131d
|
[Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710)
|
2024-08-22 09:36:24 +08:00 |
|
Peter Salas
|
1ca0d4f86b
|
[Model] Add UltravoxModel and UltravoxConfig (#7615)
|
2024-08-21 22:49:39 +00:00 |
|
William Lin
|
dd53c4b023
|
[misc] Add Torch profiler support (#7451)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-08-21 15:39:26 -07:00 |
|
Roger Wang
|
4506641212
|
[Doc] Section for Multimodal Language Models (#7719)
|
2024-08-20 23:24:01 -07:00 |
|
Ilya Lavrenov
|
398521ad19
|
[OpenVINO] Updated documentation (#7687)
|
2024-08-20 07:33:56 -06:00 |
|
Michael Goin
|
d4f0f17b02
|
[Doc] Update quantization supported hardware table (#7595)
|
2024-08-16 13:59:27 -07:00 |
|
Michael Goin
|
b3f4e17935
|
[Doc] Add docs for llmcompressor INT8 and FP8 checkpoints (#7444)
|
2024-08-16 13:59:16 -07:00 |
|
Kameshwara Pavan Kumar Mantha
|
22b39e11f2
|
llama_index serving integration documentation (#6973)
Co-authored-by: pavanmantha <pavan.mantha@thevaslabs.io>
|
2024-08-14 15:38:37 -07:00 |
|
Cyrus Leung
|
3f674a49b5
|
[VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126)
|
2024-08-14 17:55:42 +00:00 |
|
youkaichao
|
199adbb7cf
|
[doc] update test script to include cudagraph (#7501)
|
2024-08-13 21:52:58 -07:00 |
|
Cyrus Leung
|
dd164d72f3
|
[Bugfix][Docs] Update list of mock imports (#7493)
|
2024-08-13 20:37:30 -07:00 |
|
Woosuk Kwon
|
a08df8322e
|
[TPU] Support multi-host inference (#7457)
|
2024-08-13 16:31:20 -07:00 |
|
Peter Salas
|
00c3d68e45
|
[Frontend][Core] Add plumbing to support audio language models (#7446)
|
2024-08-13 17:39:33 +00:00 |
|
Woosuk Kwon
|
e20233d361
|
Revert "[Doc] Update supported_hardware.rst (#7276)" (#7467)
|
2024-08-13 01:37:08 -07:00 |
|
jon-chuang
|
a046f86397
|
[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-08-12 22:47:41 +00:00 |
|
Roger Wang
|
e6e42e4b17
|
[Core][VLM] Support image embeddings as input (#6613)
|
2024-08-12 16:16:06 +08:00 |
|
Simon Mo
|
f020a6297e
|
[Docs] Update readme (#7316)
|
2024-08-11 17:13:37 -07:00 |
|
tomeras91
|
02b1988b9f
|
[Doc] building vLLM with VLLM_TARGET_DEVICE=empty (#7403)
|
2024-08-11 14:38:17 -07:00 |
|
Woosuk Kwon
|
90bab18f24
|
[TPU] Use mark_dynamic to reduce compilation time (#7340)
|
2024-08-10 18:12:22 -07:00 |
|
Simon Mo
|
5923532e15
|
Add Skywork AI as Sponsor (#7314)
|
2024-08-08 13:59:57 -07:00 |
|
Jee Jee Li
|
757ac70a64
|
[Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 (#7273)
|
2024-08-08 14:02:41 +00:00 |
|
Michael Goin
|
6d94420246
|
[Doc] Update supported_hardware.rst (#7276)
|
2024-08-07 14:21:50 -07:00 |
|
Stas Bekman
|
0e12cd67a8
|
[Doc] add online speculative decoding example (#7243)
|
2024-08-07 09:58:02 -07:00 |
|
Ilya Lavrenov
|
80cbe10c59
|
[OpenVINO] migrate to latest dependencies versions (#7251)
|
2024-08-07 09:49:10 -07:00 |
|
Roger Wang
|
2385c8f374
|
[Doc] Mock new dependencies for documentation (#7245)
|
2024-08-07 06:43:03 +00:00 |
|
Thomas Parnell
|
789937af2e
|
[Doc] [SpecDecode] Update MLPSpeculator documentation (#7100)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2024-08-05 23:29:43 +00:00 |
|
Simon Mo
|
4db5176d97
|
bump version to v0.5.4 (#7139)
|
2024-08-05 14:39:48 -07:00 |
|
Jee Jee Li
|
179a6a36f2
|
[Model]Refactor MiniCPMV (#7020)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-08-04 08:12:41 +00:00 |
|
Yihuan Bu
|
654bc5ca49
|
Support for guided decoding for offline LLM (#6878)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-08-04 03:12:09 +00:00 |
|
Michael Goin
|
b482b9a5b1
|
[CI/Build] Add support for Python 3.12 (#7035)
|
2024-08-02 13:51:22 -07:00 |
|
Murali Andoorveedu
|
fc912e0886
|
[Models] Support Qwen model with PP (#6974)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-08-01 12:40:43 -07:00 |
|
Jee Jee Li
|
7ecee34321
|
[Kernel][RFC] Refactor the punica kernel based on Triton (#5036)
|
2024-07-31 17:12:24 -07:00 |
|
Alphi
|
2f4e108f75
|
[Bugfix] Clean up MiniCPM-V (#6939)
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-07-31 14:39:19 +00:00 |
|
Cyrus Leung
|
f230cc2ca6
|
[Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836)
|
2024-07-31 10:38:45 +08:00 |
|
Ilya Lavrenov
|
5895b24677
|
[OpenVINO] Updated OpenVINO requirements and build docs (#6948)
|
2024-07-30 11:33:01 -07:00 |
|
Isotr0py
|
7cbd9ec7a9
|
[Model] Initialize support for InternVL2 series models (#6514)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-07-29 10:16:30 +00:00 |
|
Woosuk Kwon
|
fad5576c58
|
[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856)
|
2024-07-27 10:28:33 -07:00 |
|
Chenggang Wu
|
f954d0715c
|
[Docs] Add RunLLM chat widget (#6857)
|
2024-07-27 09:24:46 -07:00 |
|
Cyrus Leung
|
1ad86acf17
|
[Model] Initial support for BLIP-2 (#5920)
Co-authored-by: ywang96 <ywang@roblox.com>
|
2024-07-27 11:53:07 +00:00 |
|
Roger Wang
|
ecb33a28cb
|
[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860)
|
2024-07-27 09:54:14 +00:00 |
|
Harry Mellor
|
c53041ae3b
|
[Doc] Add missing mock import to docs conf.py (#6834)
|
2024-07-27 04:47:33 +00:00 |
|
omrishiv
|
3c3012398e
|
[Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron (#6844)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
|
2024-07-26 20:20:16 -07:00 |
|
Woosuk Kwon
|
ced36cd89b
|
[ROCm] Upgrade PyTorch nightly version (#6845)
|
2024-07-26 20:16:13 -07:00 |
|
Zhanghao Wu
|
150a1ffbfd
|
[Doc] Update SkyPilot doc for wrong indents and instructions for update service (#4283)
|
2024-07-26 14:39:10 -07:00 |
|
Michael Goin
|
281977bd6e
|
[Doc] Add Nemotron to supported model docs (#6843)
|
2024-07-26 17:32:44 -04:00 |
|
Li, Jiang
|
3bbb4936dc
|
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)
|
2024-07-26 13:50:10 -07:00 |
|
youkaichao
|
85ad7e2d01
|
[doc][debugging] add known issues for hangs (#6816)
|
2024-07-25 21:48:05 -07:00 |
|
Woosuk Kwon
|
b7215de2c5
|
[Docs] Publish 5th meetup slides (#6799)
|
2024-07-25 16:47:55 -07:00 |
|
youkaichao
|
f3ff63c3f4
|
[doc][distributed] improve multinode serving doc (#6804)
|
2024-07-25 15:38:32 -07:00 |
|
Kuntai Du
|
6a1e25b151
|
[Doc] Add documentations for nightly benchmarks (#6412)
|
2024-07-25 11:57:16 -07:00 |
|
Alphi
|
9e169a4c61
|
[Model] Adding support for MiniCPM-V (#4087)
|
2024-07-24 20:59:30 -07:00 |
|
Hongxia Yang
|
d88c458f44
|
[Doc][AMD][ROCm]Added tips to refer to mi300x tuning guide for mi300x users (#6754)
|
2024-07-24 14:32:57 -07:00 |
|
Woosuk Kwon
|
ccc4a73257
|
[Docs][ROCm] Detailed instructions to build from source (#6680)
|
2024-07-24 01:07:23 -07:00 |
|
dongmao zhang
|
87525fab92
|
[bitsandbytes]: support read bnb pre-quantized model (#5753)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-07-23 23:45:09 +00:00 |
|
youkaichao
|
71950af726
|
[doc][distributed] fix doc argument order (#6691)
|
2024-07-23 08:55:33 -07:00 |
|
Woosuk Kwon
|
cb1362a889
|
[Docs] Announce llama3.1 support (#6688)
|
2024-07-23 08:18:15 -07:00 |
|
Roger Wang
|
22fa2e35cb
|
[VLM][Model] Support image input for Chameleon (#6633)
|
2024-07-22 23:50:48 -07:00 |
|
youkaichao
|
c051bfe4eb
|
[doc][distributed] doc for setting up multi-node environment (#6529)
[doc][distributed] add more doc for setting up multi-node environment (#6529)
|
2024-07-22 21:22:09 -07:00 |
|
Cyrus Leung
|
739b61a348
|
[Frontend] Refactor prompt processing (#4028)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-07-22 10:13:53 -07:00 |
|
Matt Wong
|
06d6c5fe9f
|
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543)
|
2024-07-20 09:39:07 -07:00 |
|
Murali Andoorveedu
|
45ceb85a0c
|
[Docs] Update PP docs (#6598)
|
2024-07-19 16:38:21 -07:00 |
|
Simon Mo
|
30efe41532
|
[Docs] Update docs for wheel location (#6580)
|
2024-07-19 12:14:11 -07:00 |
|
milo157
|
a38524f338
|
[DOC] - Add docker image to Cerebrium Integration (#6510)
|
2024-07-17 10:22:53 -07:00 |
|
Cyrus Leung
|
5bf35a91e4
|
[Doc][CI/Build] Update docs and tests to use vllm serve (#6431)
|
2024-07-17 07:43:21 +00:00 |
|
Hongxia Yang
|
10383887e0
|
[ROCm] Cleanup Dockerfile and remove outdated patch (#6482)
|
2024-07-16 22:47:02 -07:00 |
|
Jiaxin Shan
|
94162beb9f
|
[Doc] Fix the lora adapter path in server startup script (#6230)
|
2024-07-16 10:11:04 -07:00 |
|
Woosuk Kwon
|
c467dff24f
|
[Hardware][TPU] Support MoE with Pallas GMM kernel (#6457)
|
2024-07-16 09:56:28 -07:00 |
|
youkaichao
|
9f4ccec761
|
[doc][misc] remind to cancel debugging environment variables (#6481)
[doc][misc] remind users to cancel debugging environment variables after debugging (#6481)
|
2024-07-16 09:45:30 -07:00 |
|
Woosuk Kwon
|
3dee97b05f
|
[Docs] Add Google Cloud to sponsor list (#6450)
|
2024-07-15 11:58:10 -07:00 |
|
youkaichao
|
94b82e8c18
|
[doc][distributed] add suggestion for distributed inference (#6418)
|
2024-07-15 09:45:51 -07:00 |
|
youkaichao
|
22e79ee8f3
|
[doc][misc] doc update (#6439)
|
2024-07-14 23:33:25 -07:00 |
|
Robert Cohn
|
61e85dbad8
|
[Doc] xpu backend requires running setvars.sh (#6393)
|
2024-07-14 17:10:11 -07:00 |
|
Ethan Xu
|
dbfe254eda
|
[Feature] vLLM CLI (#5090)
Co-authored-by: simon-mo <simon.mo@hey.com>
|
2024-07-14 15:36:43 -07:00 |
|
Yuan Tang
|
6ef3bf912c
|
Remove unnecessary trailing period in spec_decode.rst (#6405)
|
2024-07-14 07:58:09 +00:00 |
|
Isotr0py
|
540c0368b1
|
[Model] Initialize Fuyu-8B support (#3924)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-07-14 05:27:14 +00:00 |
|
Saliya Ekanayake
|
a27f87da34
|
[Doc] Fix Typo in Doc (#6392)
Co-authored-by: Saliya Ekanayake <esaliya@d-matrix.ai>
|
2024-07-13 00:48:23 +00:00 |
|
Simon Mo
|
d719ba24c5
|
Build some nightly wheels by default (#6380)
|
2024-07-12 13:56:59 -07:00 |
|
youkaichao
|
2d23b42d92
|
[doc] update pipeline parallel in readme (#6347)
|
2024-07-11 11:38:40 -07:00 |
|
Jie Fu (傅杰)
|
439c84581a
|
[Doc] Update description of vLLM support for CPUs (#6003)
|
2024-07-10 21:15:29 -07:00 |
|
Cyrus Leung
|
8a924d2248
|
[Doc] Guide for adding multi-modal plugins (#6205)
|
2024-07-10 14:55:34 +08:00 |
|
Murali Andoorveedu
|
673dd4cae9
|
[Docs] Docs update for Pipeline Parallel (#6222)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-07-09 16:24:58 -07:00 |
|
Roger Wang
|
6206dcb29e
|
[Model] Add PaliGemma (#5189)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-07-07 09:25:50 +08:00 |
|
Cyrus Leung
|
9389380015
|
[Doc] Move guide for multimodal model and other improvements (#6168)
|
2024-07-06 17:18:59 +08:00 |
|
Roger Wang
|
175c43eca4
|
[Doc] Reorganize Supported Models by Type (#6167)
|
2024-07-06 05:59:36 +00:00 |
|
Simon Mo
|
79d406e918
|
[Docs] Fix readthedocs for tag build (#6158)
|
2024-07-05 12:44:40 -07:00 |
|
Cyrus Leung
|
ae96ef8fbd
|
[VLM] Calculate maximum number of multi-modal tokens by model (#6121)
|
2024-07-04 16:37:23 -07:00 |
|