Jee Jee Li
179a6a36f2
[Model]Refactor MiniCPMV ( #7020 )
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 08:12:41 +00:00
Yihuan Bu
654bc5ca49
Support for guided decoding for offline LLM ( #6878 )
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 03:12:09 +00:00
Michael Goin
b482b9a5b1
[CI/Build] Add support for Python 3.12 ( #7035 )
2024-08-02 13:51:22 -07:00
Murali Andoorveedu
fc912e0886
[Models] Support Qwen model with PP ( #6974 )
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-08-01 12:40:43 -07:00
Jee Jee Li
7ecee34321
[Kernel][RFC] Refactor the punica kernel based on Triton ( #5036 )
2024-07-31 17:12:24 -07:00
Alphi
2f4e108f75
[Bugfix] Clean up MiniCPM-V ( #6939 )
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-31 14:39:19 +00:00
Cyrus Leung
f230cc2ca6
[Bugfix] Fix broadcasting logic for multi_modal_kwargs ( #6836 )
2024-07-31 10:38:45 +08:00
Ilya Lavrenov
5895b24677
[OpenVINO] Updated OpenVINO requirements and build docs ( #6948 )
2024-07-30 11:33:01 -07:00
Isotr0py
7cbd9ec7a9
[Model] Initialize support for InternVL2 series models ( #6514 )
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-29 10:16:30 +00:00
Woosuk Kwon
fad5576c58
[TPU] Reduce compilation time & Upgrade PyTorch XLA version ( #6856 )
2024-07-27 10:28:33 -07:00
Chenggang Wu
f954d0715c
[Docs] Add RunLLM chat widget ( #6857 )
2024-07-27 09:24:46 -07:00
Cyrus Leung
1ad86acf17
[Model] Initial support for BLIP-2 ( #5920 )
Co-authored-by: ywang96 <ywang@roblox.com>
2024-07-27 11:53:07 +00:00
Roger Wang
ecb33a28cb
[CI/Build][Doc] Update CI and Doc for VLM example changes ( #6860 )
2024-07-27 09:54:14 +00:00
Harry Mellor
c53041ae3b
[Doc] Add missing mock import to docs conf.py ( #6834 )
2024-07-27 04:47:33 +00:00
omrishiv
3c3012398e
[Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron ( #6844 )
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-07-26 20:20:16 -07:00
Woosuk Kwon
ced36cd89b
[ROCm] Upgrade PyTorch nightly version ( #6845 )
2024-07-26 20:16:13 -07:00
Zhanghao Wu
150a1ffbfd
[Doc] Update SkyPilot doc for wrong indents and instructions for update service ( #4283 )
2024-07-26 14:39:10 -07:00
Michael Goin
281977bd6e
[Doc] Add Nemotron to supported model docs ( #6843 )
2024-07-26 17:32:44 -04:00
Li, Jiang
3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation ( #6125 )
2024-07-26 13:50:10 -07:00
youkaichao
85ad7e2d01
[doc][debugging] add known issues for hangs ( #6816 )
2024-07-25 21:48:05 -07:00
Woosuk Kwon
b7215de2c5
[Docs] Publish 5th meetup slides ( #6799 )
2024-07-25 16:47:55 -07:00
youkaichao
f3ff63c3f4
[doc][distributed] improve multinode serving doc ( #6804 )
2024-07-25 15:38:32 -07:00
Kuntai Du
6a1e25b151
[Doc] Add documentations for nightly benchmarks ( #6412 )
2024-07-25 11:57:16 -07:00
Alphi
9e169a4c61
[Model] Adding support for MiniCPM-V ( #4087 )
2024-07-24 20:59:30 -07:00
Hongxia Yang
d88c458f44
[Doc][AMD][ROCm]Added tips to refer to mi300x tuning guide for mi300x users ( #6754 )
2024-07-24 14:32:57 -07:00
Woosuk Kwon
ccc4a73257
[Docs][ROCm] Detailed instructions to build from source ( #6680 )
2024-07-24 01:07:23 -07:00
dongmao zhang
87525fab92
[bitsandbytes]: support read bnb pre-quantized model ( #5753 )
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-23 23:45:09 +00:00
youkaichao
71950af726
[doc][distributed] fix doc argument order ( #6691 )
2024-07-23 08:55:33 -07:00
Woosuk Kwon
cb1362a889
[Docs] Announce llama3.1 support ( #6688 )
2024-07-23 08:18:15 -07:00
Roger Wang
22fa2e35cb
[VLM][Model] Support image input for Chameleon ( #6633 )
2024-07-22 23:50:48 -07:00
youkaichao
c051bfe4eb
[doc][distributed] doc for setting up multi-node environment ( #6529 )
[doc][distributed] add more doc for setting up multi-node environment (#6529 )
2024-07-22 21:22:09 -07:00
Cyrus Leung
739b61a348
[Frontend] Refactor prompt processing ( #4028 )
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-22 10:13:53 -07:00
Matt Wong
06d6c5fe9f
[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes ( #6543 )
2024-07-20 09:39:07 -07:00
Murali Andoorveedu
45ceb85a0c
[Docs] Update PP docs ( #6598 )
2024-07-19 16:38:21 -07:00
Simon Mo
30efe41532
[Docs] Update docs for wheel location ( #6580 )
2024-07-19 12:14:11 -07:00
milo157
a38524f338
[DOC] - Add docker image to Cerebrium Integration ( #6510 )
2024-07-17 10:22:53 -07:00
Cyrus Leung
5bf35a91e4
[Doc][CI/Build] Update docs and tests to use vllm serve ( #6431 )
2024-07-17 07:43:21 +00:00
Hongxia Yang
10383887e0
[ROCm] Cleanup Dockerfile and remove outdated patch ( #6482 )
2024-07-16 22:47:02 -07:00
Jiaxin Shan
94162beb9f
[Doc] Fix the lora adapter path in server startup script ( #6230 )
2024-07-16 10:11:04 -07:00
Woosuk Kwon
c467dff24f
[Hardware][TPU] Support MoE with Pallas GMM kernel ( #6457 )
2024-07-16 09:56:28 -07:00
youkaichao
9f4ccec761
[doc][misc] remind to cancel debugging environment variables ( #6481 )
[doc][misc] remind users to cancel debugging environment variables after debugging (#6481 )
2024-07-16 09:45:30 -07:00
Woosuk Kwon
3dee97b05f
[Docs] Add Google Cloud to sponsor list ( #6450 )
2024-07-15 11:58:10 -07:00
youkaichao
94b82e8c18
[doc][distributed] add suggestion for distributed inference ( #6418 )
2024-07-15 09:45:51 -07:00
youkaichao
22e79ee8f3
[doc][misc] doc update ( #6439 )
2024-07-14 23:33:25 -07:00
Robert Cohn
61e85dbad8
[Doc] xpu backend requires running setvars.sh ( #6393 )
2024-07-14 17:10:11 -07:00
Ethan Xu
dbfe254eda
[Feature] vLLM CLI ( #5090 )
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-07-14 15:36:43 -07:00
Yuan Tang
6ef3bf912c
Remove unnecessary trailing period in spec_decode.rst ( #6405 )
2024-07-14 07:58:09 +00:00
Isotr0py
540c0368b1
[Model] Initialize Fuyu-8B support ( #3924 )
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-14 05:27:14 +00:00
Saliya Ekanayake
a27f87da34
[Doc] Fix Typo in Doc ( #6392 )
Co-authored-by: Saliya Ekanayake <esaliya@d-matrix.ai>
2024-07-13 00:48:23 +00:00
Simon Mo
d719ba24c5
Build some nightly wheels by default ( #6380 )
2024-07-12 13:56:59 -07:00
youkaichao
2d23b42d92
[doc] update pipeline parallel in readme ( #6347 )
2024-07-11 11:38:40 -07:00
Jie Fu (傅杰)
439c84581a
[Doc] Update description of vLLM support for CPUs ( #6003 )
2024-07-10 21:15:29 -07:00
Cyrus Leung
8a924d2248
[Doc] Guide for adding multi-modal plugins ( #6205 )
2024-07-10 14:55:34 +08:00
Murali Andoorveedu
673dd4cae9
[Docs] Docs update for Pipeline Parallel ( #6222 )
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-07-09 16:24:58 -07:00
Roger Wang
6206dcb29e
[Model] Add PaliGemma ( #5189 )
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-07-07 09:25:50 +08:00
Cyrus Leung
9389380015
[Doc] Move guide for multimodal model and other improvements ( #6168 )
2024-07-06 17:18:59 +08:00
Roger Wang
175c43eca4
[Doc] Reorganize Supported Models by Type ( #6167 )
2024-07-06 05:59:36 +00:00
Simon Mo
79d406e918
[Docs] Fix readthedocs for tag build ( #6158 )
2024-07-05 12:44:40 -07:00
Cyrus Leung
ae96ef8fbd
[VLM] Calculate maximum number of multi-modal tokens by model ( #6121 )
2024-07-04 16:37:23 -07:00
youkaichao
27902d42be
[misc][doc] try to add warning for latest html ( #5979 )
2024-07-04 09:57:09 -07:00
youkaichao
966fe72141
[doc][misc] bump up py version in installation doc ( #6119 )
2024-07-03 15:52:04 -07:00
xwjiang2010
d9e98f42e4
[vlm] Remove vision language config. ( #6089 )
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
Michael Goin
47f0954af0
[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin ( #5975 )
2024-07-03 17:38:00 +00:00
Roger Wang
f1c78138aa
[Doc] Fix Mock Import ( #6094 )
2024-07-03 00:13:56 -07:00
Cyrus Leung
9831aec49f
[Core] Dynamic image size support for VLMs ( #5276 )
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-07-02 20:34:00 -07:00
Mor Zusman
9d6a8daa87
[Model] Jamba support ( #4115 )
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 23:11:29 +00:00
xwjiang2010
98d6682cd1
[VLM] Remove image_input_type from VLM config ( #5852 )
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-02 07:57:09 +00:00
Roger Wang
8e0817c262
[Bugfix][Doc] Fix Doc Formatting ( #6048 )
2024-07-01 15:09:11 -07:00
ning.zhang
83bdcb6ac3
add FAQ doc under 'serving' ( #5946 )
2024-07-01 14:11:36 -07:00
youkaichao
4050d646e5
[doc][misc] remove deprecated api server in doc ( #6037 )
2024-07-01 12:52:43 -04:00
Ilya Lavrenov
57f09a419c
[Hardware][Intel] OpenVINO vLLM backend ( #5379 )
2024-06-28 13:50:16 +00:00
Cyrus Leung
5cbe8d155c
[Core] Registry for processing model inputs ( #5214 )
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-28 12:09:56 +00:00
Woosuk Kwon
79c92c7c8a
[Model] Add Gemma 2 ( #5908 )
2024-06-27 13:33:56 -07:00
youkaichao
3fd02bda51
[doc][misc] add note for Kubernetes users ( #5916 )
2024-06-27 10:07:07 -07:00
Cyrus Leung
96354d6a29
[Model] Add base class for LoRA-supported models ( #5018 )
2024-06-27 16:03:04 +08:00
youkaichao
294104c3f9
[doc] update usage of env var to avoid conflict ( #5873 )
2024-06-26 17:57:12 -04:00
Roger Wang
3aa7b6cf66
[Misc][Doc] Add Example of using OpenAI Server with VLM ( #5832 )
2024-06-25 20:34:25 -07:00
Matt Wong
dd793d1de5
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes ( #5422 )
2024-06-25 15:56:15 -07:00
youkaichao
c18ebfdd71
[doc][distributed] add both gloo and nccl tests ( #5834 )
2024-06-25 15:10:28 -04:00
Cyrus Leung
f23871e9ee
[Doc] Add notice about breaking changes to VLMs ( #5818 )
2024-06-25 01:25:03 -07:00
Michael Goin
1744cc99ba
[Doc] Add Phi-3-medium to list of supported models ( #5788 )
2024-06-24 10:48:55 -07:00
Michael Goin
e72dc6cb35
[Doc] Add "Suggest edit" button to doc pages ( #5789 )
2024-06-24 10:26:17 -07:00
youkaichao
c246212952
[doc][faq] add warning to download models for every nodes ( #5783 )
2024-06-24 15:37:42 +08:00
Woosuk Kwon
8c00f9c15d
[Docs][TPU] Add installation tip for TPU ( #5761 )
2024-06-21 23:09:40 -07:00
Michael Goin
5b15bde539
[Doc] Documentation on supported hardware for quantization methods ( #5745 )
2024-06-21 12:44:29 -04:00
Roger Wang
1b2eaac316
[Bugfix][Doc] Fix Duplicate Explicit Target Name Errors ( #5703 )
2024-06-19 23:10:47 -07:00
Rafael Vasquez
e83db9e7e3
[Doc] Update docker references ( #5614 )
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-06-19 15:01:45 -07:00
milo157
2bd231a7b7
[Doc] Added cerebrium as Integration option ( #5553 )
2024-06-18 15:56:59 -07:00
Isotr0py
daef218b55
[Model] Initialize Phi-3-vision support ( #4986 )
2024-06-17 19:34:33 -07:00
Kunshang Ji
728c4c8a06
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend ( #3814 )
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
2024-06-17 11:01:25 -07:00
youkaichao
845a3f26f9
[Doc] add debugging tips for crash and multi-node debugging ( #5581 )
2024-06-17 10:08:01 +08:00
Sanger Steel
6e2527a7cb
[Doc] Update documentation on Tensorizer ( #5471 )
2024-06-14 11:27:57 -07:00
Simon Mo
cdab68dcdb
[Docs] Add ZhenFund as a Sponsor ( #5548 )
2024-06-14 11:17:21 -07:00
Cyrus Leung
0ce7b952f8
[Doc] Update LLaVA docs ( #5437 )
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-13 11:22:07 -07:00
Woosuk Kwon
a65634d3ae
[Docs] Add 4th meetup slides ( #5509 )
2024-06-13 10:18:26 -07:00
Li, Jiang
80aa7e91fc
[Hardware][Intel] Optimize CPU backend and add more performance tips ( #4971 )
Co-authored-by: Jianan Gu <jianan.gu@intel.com>
2024-06-13 09:33:14 -07:00
Cyrus Leung
b8d4dfff9c
[Doc] Update debug docs ( #5438 )
2024-06-12 14:49:31 -07:00
Woosuk Kwon
1a8bfd92d5
[Hardware] Initial TPU integration ( #5292 )
2024-06-12 11:53:03 -07:00
youkaichao
8f89d72090
[Doc] add common case for long waiting time ( #5430 )
2024-06-11 11:12:13 -07:00
Nick Hill
99dac099ab
[Core][Doc] Default to multiprocessing for single-node distributed case ( #5230 )
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-06-11 11:10:41 -07:00