squall/vllm - vllm

Author	SHA1	Message	Date
Jee Jee Li	179a6a36f2	[Model]Refactor MiniCPMV (#7020 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-04 08:12:41 +00:00
Yihuan Bu	654bc5ca49	Support for guided decoding for offline LLM (#6878 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-04 03:12:09 +00:00
Michael Goin	b482b9a5b1	[CI/Build] Add support for Python 3.12 (#7035 )	2024-08-02 13:51:22 -07:00
Murali Andoorveedu	fc912e0886	[Models] Support Qwen model with PP (#6974 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-08-01 12:40:43 -07:00
Jee Jee Li	7ecee34321	[Kernel][RFC] Refactor the punica kernel based on Triton (#5036 )	2024-07-31 17:12:24 -07:00
Alphi	2f4e108f75	[Bugfix] Clean up MiniCPM-V (#6939 ) Co-authored-by: hezhihui <hzh7269@modelbest.cn> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-07-31 14:39:19 +00:00
Cyrus Leung	f230cc2ca6	[Bugfix] Fix broadcasting logic for `multi_modal_kwargs` (#6836 )	2024-07-31 10:38:45 +08:00
Ilya Lavrenov	5895b24677	[OpenVINO] Updated OpenVINO requirements and build docs (#6948 )	2024-07-30 11:33:01 -07:00
Isotr0py	7cbd9ec7a9	[Model] Initialize support for InternVL2 series models (#6514 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-29 10:16:30 +00:00
Woosuk Kwon	fad5576c58	[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856 )	2024-07-27 10:28:33 -07:00
Chenggang Wu	f954d0715c	[Docs] Add RunLLM chat widget (#6857 )	2024-07-27 09:24:46 -07:00
Cyrus Leung	1ad86acf17	[Model] Initial support for BLIP-2 (#5920 ) Co-authored-by: ywang96 <ywang@roblox.com>	2024-07-27 11:53:07 +00:00
Roger Wang	ecb33a28cb	[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860 )	2024-07-27 09:54:14 +00:00
Harry Mellor	c53041ae3b	[Doc] Add missing mock import to docs `conf.py` (#6834 )	2024-07-27 04:47:33 +00:00
omrishiv	3c3012398e	[Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron (#6844 ) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-07-26 20:20:16 -07:00
Woosuk Kwon	ced36cd89b	[ROCm] Upgrade PyTorch nightly version (#6845 )	2024-07-26 20:16:13 -07:00
Zhanghao Wu	150a1ffbfd	[Doc] Update SkyPilot doc for wrong indents and instructions for update service (#4283 )	2024-07-26 14:39:10 -07:00
Michael Goin	281977bd6e	[Doc] Add Nemotron to supported model docs (#6843 )	2024-07-26 17:32:44 -04:00
Li, Jiang	3bbb4936dc	[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125 )	2024-07-26 13:50:10 -07:00
youkaichao	85ad7e2d01	[doc][debugging] add known issues for hangs (#6816 )	2024-07-25 21:48:05 -07:00
Woosuk Kwon	b7215de2c5	[Docs] Publish 5th meetup slides (#6799 )	2024-07-25 16:47:55 -07:00
youkaichao	f3ff63c3f4	[doc][distributed] improve multinode serving doc (#6804 )	2024-07-25 15:38:32 -07:00
Kuntai Du	6a1e25b151	[Doc] Add documentations for nightly benchmarks (#6412 )	2024-07-25 11:57:16 -07:00
Alphi	9e169a4c61	[Model] Adding support for MiniCPM-V (#4087 )	2024-07-24 20:59:30 -07:00
Hongxia Yang	d88c458f44	[Doc][AMD][ROCm]Added tips to refer to mi300x tuning guide for mi300x users (#6754 )	2024-07-24 14:32:57 -07:00
Woosuk Kwon	ccc4a73257	[Docs][ROCm] Detailed instructions to build from source (#6680 )	2024-07-24 01:07:23 -07:00
dongmao zhang	87525fab92	[bitsandbytes]: support read bnb pre-quantized model (#5753 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-07-23 23:45:09 +00:00
youkaichao	71950af726	[doc][distributed] fix doc argument order (#6691 )	2024-07-23 08:55:33 -07:00
Woosuk Kwon	cb1362a889	[Docs] Announce llama3.1 support (#6688 )	2024-07-23 08:18:15 -07:00
Roger Wang	22fa2e35cb	[VLM][Model] Support image input for Chameleon (#6633 )	2024-07-22 23:50:48 -07:00
youkaichao	c051bfe4eb	[doc][distributed] doc for setting up multi-node environment (#6529 ) [doc][distributed] add more doc for setting up multi-node environment (#6529)	2024-07-22 21:22:09 -07:00
Cyrus Leung	739b61a348	[Frontend] Refactor prompt processing (#4028 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-22 10:13:53 -07:00
Matt Wong	06d6c5fe9f	[Bugfix][CI/Build][Hardware][AMD] Fix AMD tests, add HF cache, update CK FA, add partially supported model notes (#6543 )	2024-07-20 09:39:07 -07:00
Murali Andoorveedu	45ceb85a0c	[Docs] Update PP docs (#6598 )	2024-07-19 16:38:21 -07:00
Simon Mo	30efe41532	[Docs] Update docs for wheel location (#6580 )	2024-07-19 12:14:11 -07:00
milo157	a38524f338	[DOC] - Add docker image to Cerebrium Integration (#6510 )	2024-07-17 10:22:53 -07:00
Cyrus Leung	5bf35a91e4	[Doc][CI/Build] Update docs and tests to use `vllm serve` (#6431 )	2024-07-17 07:43:21 +00:00
Hongxia Yang	10383887e0	[ROCm] Cleanup Dockerfile and remove outdated patch (#6482 )	2024-07-16 22:47:02 -07:00
Jiaxin Shan	94162beb9f	[Doc] Fix the lora adapter path in server startup script (#6230 )	2024-07-16 10:11:04 -07:00
Woosuk Kwon	c467dff24f	[Hardware][TPU] Support MoE with Pallas GMM kernel (#6457 )	2024-07-16 09:56:28 -07:00
youkaichao	9f4ccec761	[doc][misc] remind to cancel debugging environment variables (#6481 ) [doc][misc] remind users to cancel debugging environment variables after debugging (#6481)	2024-07-16 09:45:30 -07:00
Woosuk Kwon	3dee97b05f	[Docs] Add Google Cloud to sponsor list (#6450 )	2024-07-15 11:58:10 -07:00
youkaichao	94b82e8c18	[doc][distributed] add suggestion for distributed inference (#6418 )	2024-07-15 09:45:51 -07:00
youkaichao	22e79ee8f3	[doc][misc] doc update (#6439 )	2024-07-14 23:33:25 -07:00
Robert Cohn	61e85dbad8	[Doc] xpu backend requires running setvars.sh (#6393 )	2024-07-14 17:10:11 -07:00
Ethan Xu	dbfe254eda	[Feature] vLLM CLI (#5090 ) Co-authored-by: simon-mo <simon.mo@hey.com>	2024-07-14 15:36:43 -07:00
Yuan Tang	6ef3bf912c	Remove unnecessary trailing period in spec_decode.rst (#6405 )	2024-07-14 07:58:09 +00:00
Isotr0py	540c0368b1	[Model] Initialize Fuyu-8B support (#3924 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-14 05:27:14 +00:00
Saliya Ekanayake	a27f87da34	[Doc] Fix Typo in Doc (#6392 ) Co-authored-by: Saliya Ekanayake <esaliya@d-matrix.ai>	2024-07-13 00:48:23 +00:00
Simon Mo	d719ba24c5	Build some nightly wheels by default (#6380 )	2024-07-12 13:56:59 -07:00
youkaichao	2d23b42d92	[doc] update pipeline parallel in readme (#6347 )	2024-07-11 11:38:40 -07:00
Jie Fu (傅杰)	439c84581a	[Doc] Update description of vLLM support for CPUs (#6003 )	2024-07-10 21:15:29 -07:00
Cyrus Leung	8a924d2248	[Doc] Guide for adding multi-modal plugins (#6205 )	2024-07-10 14:55:34 +08:00
Murali Andoorveedu	673dd4cae9	[Docs] Docs update for Pipeline Parallel (#6222 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-07-09 16:24:58 -07:00
Roger Wang	6206dcb29e	[Model] Add PaliGemma (#5189 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-07-07 09:25:50 +08:00
Cyrus Leung	9389380015	[Doc] Move guide for multimodal model and other improvements (#6168 )	2024-07-06 17:18:59 +08:00
Roger Wang	175c43eca4	[Doc] Reorganize Supported Models by Type (#6167 )	2024-07-06 05:59:36 +00:00
Simon Mo	79d406e918	[Docs] Fix readthedocs for tag build (#6158 )	2024-07-05 12:44:40 -07:00
Cyrus Leung	ae96ef8fbd	[VLM] Calculate maximum number of multi-modal tokens by model (#6121 )	2024-07-04 16:37:23 -07:00
youkaichao	27902d42be	[misc][doc] try to add warning for latest html (#5979 )	2024-07-04 09:57:09 -07:00
youkaichao	966fe72141	[doc][misc] bump up py version in installation doc (#6119 )	2024-07-03 15:52:04 -07:00
xwjiang2010	d9e98f42e4	[vlm] Remove vision language config. (#6089 ) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-03 22:14:16 +00:00
Michael Goin	47f0954af0	[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975 )	2024-07-03 17:38:00 +00:00
Roger Wang	f1c78138aa	[Doc] Fix Mock Import (#6094 )	2024-07-03 00:13:56 -07:00
Cyrus Leung	9831aec49f	[Core] Dynamic image size support for VLMs (#5276 ) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: ywang96 <ywang@roblox.com> Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-07-02 20:34:00 -07:00
Mor Zusman	9d6a8daa87	[Model] Jamba support (#4115 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: Erez Schwartz <erezs@ai21.com> Co-authored-by: Mor Zusman <morz@ai21.com> Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com> Co-authored-by: Tomer Asida <tomera@ai21.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-02 23:11:29 +00:00
xwjiang2010	98d6682cd1	[VLM] Remove `image_input_type` from VLM config (#5852 ) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-02 07:57:09 +00:00
Roger Wang	8e0817c262	[Bugfix][Doc] Fix Doc Formatting (#6048 )	2024-07-01 15:09:11 -07:00
ning.zhang	83bdcb6ac3	add FAQ doc under 'serving' (#5946 )	2024-07-01 14:11:36 -07:00
youkaichao	4050d646e5	[doc][misc] remove deprecated api server in doc (#6037 )	2024-07-01 12:52:43 -04:00
Ilya Lavrenov	57f09a419c	[Hardware][Intel] OpenVINO vLLM backend (#5379 )	2024-06-28 13:50:16 +00:00
Cyrus Leung	5cbe8d155c	[Core] Registry for processing model inputs (#5214 ) Co-authored-by: ywang96 <ywang@roblox.com>	2024-06-28 12:09:56 +00:00
Woosuk Kwon	79c92c7c8a	[Model] Add Gemma 2 (#5908 )	2024-06-27 13:33:56 -07:00
youkaichao	3fd02bda51	[doc][misc] add note for Kubernetes users (#5916 )	2024-06-27 10:07:07 -07:00
Cyrus Leung	96354d6a29	[Model] Add base class for LoRA-supported models (#5018 )	2024-06-27 16:03:04 +08:00
youkaichao	294104c3f9	[doc] update usage of env var to avoid conflict (#5873 )	2024-06-26 17:57:12 -04:00
Roger Wang	3aa7b6cf66	[Misc][Doc] Add Example of using OpenAI Server with VLM (#5832 )	2024-06-25 20:34:25 -07:00
Matt Wong	dd793d1de5	[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422 )	2024-06-25 15:56:15 -07:00
youkaichao	c18ebfdd71	[doc][distributed] add both gloo and nccl tests (#5834 )	2024-06-25 15:10:28 -04:00
Cyrus Leung	f23871e9ee	[Doc] Add notice about breaking changes to VLMs (#5818 )	2024-06-25 01:25:03 -07:00
Michael Goin	1744cc99ba	[Doc] Add Phi-3-medium to list of supported models (#5788 )	2024-06-24 10:48:55 -07:00
Michael Goin	e72dc6cb35	[Doc] Add "Suggest edit" button to doc pages (#5789 )	2024-06-24 10:26:17 -07:00
youkaichao	c246212952	[doc][faq] add warning to download models for every nodes (#5783 )	2024-06-24 15:37:42 +08:00
Woosuk Kwon	8c00f9c15d	[Docs][TPU] Add installation tip for TPU (#5761 )	2024-06-21 23:09:40 -07:00
Michael Goin	5b15bde539	[Doc] Documentation on supported hardware for quantization methods (#5745 )	2024-06-21 12:44:29 -04:00
Roger Wang	1b2eaac316	[Bugfix][Doc] FIx Duplicate Explicit Target Name Errors (#5703 )	2024-06-19 23:10:47 -07:00
Rafael Vasquez	e83db9e7e3	[Doc] Update docker references (#5614 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-06-19 15:01:45 -07:00
milo157	2bd231a7b7	[Doc] Added cerebrium as Integration option (#5553 )	2024-06-18 15:56:59 -07:00
Isotr0py	daef218b55	[Model] Initialize Phi-3-vision support (#4986 )	2024-06-17 19:34:33 -07:00
Kunshang Ji	728c4c8a06	[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814 ) Co-authored-by: Jiang Li <jiang1.li@intel.com> Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com> Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-06-17 11:01:25 -07:00
youkaichao	845a3f26f9	[Doc] add debugging tips for crash and multi-node debugging (#5581 )	2024-06-17 10:08:01 +08:00
Sanger Steel	6e2527a7cb	[Doc] Update documentation on Tensorizer (#5471 )	2024-06-14 11:27:57 -07:00
Simon Mo	cdab68dcdb	[Docs] Add ZhenFund as a Sponsor (#5548 )	2024-06-14 11:17:21 -07:00
Cyrus Leung	0ce7b952f8	[Doc] Update LLaVA docs (#5437 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-06-13 11:22:07 -07:00
Woosuk Kwon	a65634d3ae	[Docs] Add 4th meetup slides (#5509 )	2024-06-13 10:18:26 -07:00
Li, Jiang	80aa7e91fc	[Hardware][Intel] Optimize CPU backend and add more performance tips (#4971 ) Co-authored-by: Jianan Gu <jianan.gu@intel.com>	2024-06-13 09:33:14 -07:00
Cyrus Leung	b8d4dfff9c	[Doc] Update debug docs (#5438 )	2024-06-12 14:49:31 -07:00
Woosuk Kwon	1a8bfd92d5	[Hardware] Initial TPU integration (#5292 )	2024-06-12 11:53:03 -07:00
youkaichao	8f89d72090	[Doc] add common case for long waiting time (#5430 )	2024-06-11 11:12:13 -07:00
Nick Hill	99dac099ab	[Core][Doc] Default to multiprocessing for single-node distributed case (#5230 ) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-06-11 11:10:41 -07:00

335 Commits