squall/vllm

Author	SHA1	Message	Date
youkaichao	845a3f26f9	[Doc] add debugging tips for crash and multi-node debugging (#5581 )	2024-06-17 10:08:01 +08:00
Sanger Steel	6e2527a7cb	[Doc] Update documentation on Tensorizer (#5471 )	2024-06-14 11:27:57 -07:00
Simon Mo	cdab68dcdb	[Docs] Add ZhenFund as a Sponsor (#5548 )	2024-06-14 11:17:21 -07:00
Cyrus Leung	0ce7b952f8	[Doc] Update LLaVA docs (#5437 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-06-13 11:22:07 -07:00
Woosuk Kwon	a65634d3ae	[Docs] Add 4th meetup slides (#5509 )	2024-06-13 10:18:26 -07:00
Li, Jiang	80aa7e91fc	[Hardware][Intel] Optimize CPU backend and add more performance tips (#4971 ) Co-authored-by: Jianan Gu <jianan.gu@intel.com>	2024-06-13 09:33:14 -07:00
Cyrus Leung	b8d4dfff9c	[Doc] Update debug docs (#5438 )	2024-06-12 14:49:31 -07:00
Woosuk Kwon	1a8bfd92d5	[Hardware] Initial TPU integration (#5292 )	2024-06-12 11:53:03 -07:00
youkaichao	8f89d72090	[Doc] add common case for long waiting time (#5430 )	2024-06-11 11:12:13 -07:00
Nick Hill	99dac099ab	[Core][Doc] Default to multiprocessing for single-node distributed case (#5230 ) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-06-11 11:10:41 -07:00
Cade Daniel	89ec06c33b	[Docs] [Spec decode] Fix docs error in code example (#5427 )	2024-06-11 10:31:56 -07:00
Kuntai Du	9fde251bf0	[Doc] Add an automatic prefix caching section in vllm documentation (#5324 ) Co-authored-by: simon-mo <simon.mo@hey.com>	2024-06-11 10:24:59 -07:00
Cade Daniel	4c2ffb28ff	[Speculative decoding] Initial spec decode docs (#5400 )	2024-06-11 10:15:40 -07:00
SangBin Cho	246598a6b1	[CI] docfix (#5410 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: ywang96 <ywang@roblox.com>	2024-06-11 01:28:50 -07:00
Roger Wang	3c4cebf751	[Doc][Typo] Fixing Missing Comma (#5403 )	2024-06-11 00:20:28 -07:00
youkaichao	d8f31f2f8b	[Doc] add debugging tips (#5409 )	2024-06-10 23:21:43 -07:00
Michael Goin	77c87beb06	[Doc] Add documentation for FP8 W8A8 (#5388 )	2024-06-10 18:55:12 -06:00
Woosuk Kwon	cb77ad836f	[Docs] Alphabetically sort sponsors (#5386 )	2024-06-10 15:17:19 -05:00
Roger Wang	856c990041	[Docs] Add Docs on Limitations of VLM Support (#5383 )	2024-06-10 09:53:50 -07:00
Cyrus Leung	6b29d6fe70	[Model] Initial support for LLaVA-NeXT (#4199 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-06-10 12:47:15 +00:00
Roger Wang	7a9cb294ae	[Frontend] Add OpenAI Vision API Support (#5237 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-06-07 11:23:32 -07:00
Simon Mo	f270a39537	[Docs] Add Sequoia as sponsors (#5287 )	2024-06-05 18:02:56 +00:00
Jie Fu (傅杰)	87d5abef75	[Bugfix] Fix a bug caused by pip install setuptools>=49.4.0 for CPU backend (#5249 )	2024-06-04 09:57:51 -07:00
Breno Faria	f775a07e30	[FRONTEND] OpenAI `tools` support named functions (#5032 )	2024-06-03 18:25:29 -05:00
Cyrus Leung	7a64d24aad	[Core] Support image processor (#4197 )	2024-06-02 22:56:41 -07:00
Nick Hill	657579113f	[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171 )	2024-05-31 17:20:19 -07:00
Chansung Park	429d89720e	add doc about serving option on dstack (#3074 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-05-30 10:11:07 -07:00
Cyrus Leung	a9bcc7afb2	[Doc] Use intersphinx and update entrypoints docs (#5125 )	2024-05-30 09:59:23 -07:00
youkaichao	4fbcb0f27e	[Doc][Build] update after removing vllm-nccl (#5103 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-05-29 23:51:18 +00:00
Cyrus Leung	5ae5ed1e60	[Core] Consolidate prompt arguments to LLM engines (#4328 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-05-28 13:29:31 -07:00
Simon Mo	290f4ada2b	[Docs] Add Dropbox as sponsors (#5089 )	2024-05-28 10:29:09 -07:00
Eric Xihui Lin	8e192ff967	[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799 ) Co-authored-by: beagleski <yunanzhang@microsoft.com> Co-authored-by: bapatra <bapatra@microsoft.com> Co-authored-by: Barun Patra <codedecde@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-05-24 22:00:52 -07:00
youkaichao	6a50f4cafa	[Doc] add ccache guide in doc (#5012 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-05-23 23:21:54 +00:00
Simon Mo	e941f88584	[Docs] Add acknowledgment for sponsors (#4925 )	2024-05-21 00:17:25 -07:00
Isotr0py	f12c3b5b3d	[Model] Add Phi-2 LoRA support (#4886 )	2024-05-21 14:24:17 +09:00
Kante Yin	8e7fb5d43a	Support to serve vLLM on Kubernetes with LWS (#4829 ) Signed-off-by: kerthcet <kerthcet@gmail.com>	2024-05-16 16:37:29 -07:00
Cyrus Leung	dc72402b57	[Bugfix][Doc] Fix CI failure in docs (#4804 ) This PR fixes the CI failure introduced by #4798. The failure originates from having duplicate target names in reST, and is fixed by changing the ref targets to anonymous ones. For more information, see this discussion. I have also changed the format of the links to be more distinct from each other.	2024-05-15 01:57:08 +09:00
Zhuohan Li	c579b750a0	[Doc] Add meetups to the doc (#4798 )	2024-05-13 18:48:00 -07:00
Cyrus Leung	4bfa7e7f75	[Doc] Add API reference for offline inference (#4710 )	2024-05-13 17:47:42 -07:00
Zhuohan Li	ac1fbf7fd2	[Doc] Shorten README by removing supported model list (#4796 )	2024-05-13 16:23:54 -07:00
SangBin Cho	e7c46b9527	[Scheduler] Warning upon preemption and Swapping (#4647 ) Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>	2024-05-13 23:50:44 +09:00
Allen.Dou	706588a77d	[Bugfix] Fix CLI arguments in OpenAI server docs (#4729 )	2024-05-11 00:00:56 +09:00
Simon Mo	51d4094fda	chunked-prefill-doc-syntax (#4603 ) Fix the docs: https://docs.vllm.ai/en/latest/models/performance.html Co-authored-by: sang <rkooo567@gmail.com>	2024-05-10 14:13:23 +09:00
Cyrus Leung	a3c124570a	[Bugfix] Fix CLI arguments in OpenAI server docs (#4709 )	2024-05-09 09:53:14 -07:00
SangBin Cho	36fb68f947	[Doc] Chunked Prefill Documentation (#4580 )	2024-05-04 00:18:00 -07:00
youkaichao	2d7bce9cd5	[Doc] add env vars to the doc (#4572 )	2024-05-03 05:13:49 +00:00
Frαnçois	e491c7e053	[Doc] update(example model): for OpenAI compatible serving (#4503 )	2024-05-01 10:14:16 -07:00
fuchen.ljl	ee37328da0	Unable to find Punica extension issue during source code installation (#4494 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-05-01 00:42:09 +00:00
Prashant Gupta	b31a1fb63c	[Doc] add visualization for multi-stage dockerfile (#4456 ) Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-04-30 17:41:59 +00:00
SangBin Cho	a88081bf76	[CI] Disable non-lazy string operation on logging (#4326 ) Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>	2024-04-26 00:16:58 -07:00
Hongxia Yang	cf29b7eda4	[ROCm][Hardware][AMD][Doc] Documentation update for ROCm (#4376 ) Co-authored-by: WoosukKwon <woosuk.kwon@berkeley.edu>	2024-04-25 18:12:25 -07:00
Isotr0py	fbf152d976	[Bugfix][Model] Refactor OLMo model to support new HF format in transformers 4.40.0 (#4324 ) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-04-25 09:35:56 -07:00
Caio Mendes	96e90fdeb3	[Model] Adds Phi-3 support (#4298 )	2024-04-25 03:06:57 +00:00
youkaichao	2768884ac4	[Doc] Add note for docker user (#4340 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-04-24 21:09:44 +00:00
Harry Mellor	34128a697e	Fix `autodoc` directives (#4272 ) Co-authored-by: Harry Mellor <hmellor@oxts.com>	2024-04-23 01:53:01 +00:00
Zhanghao Wu	ceaf4ed003	[Doc] Update the SkyPilot doc with serving and Llama-3 (#4276 )	2024-04-22 15:34:31 -07:00
Harry Mellor	3d925165f2	Add example scripts to documentation (#4225 ) Co-authored-by: Harry Mellor <hmellor@oxts.com>	2024-04-22 16:36:54 +00:00
xiaoji	7f2593b164	[Doc]: Update the doc of adding new models (#4236 )	2024-04-21 09:57:08 -07:00
Harry Mellor	fe7d648fe5	Don't show default value for flags in `EngineArgs` (#4223 ) Co-authored-by: Harry Mellor <hmellor@oxts.com>	2024-04-21 09:15:28 -07:00
Harry Mellor	682789d402	Fix missing docs and out of sync `EngineArgs` (#4219 ) Co-authored-by: Harry Mellor <hmellor@oxts.com>	2024-04-19 20:51:33 -07:00
Simon Mo	705578ae14	[Docs] document that Meta Llama 3 is supported (#4175 )	2024-04-18 10:55:48 -07:00
Sanger Steel	d619ae2d19	[Doc] Add better clarity for tensorizer usage (#4090 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-04-15 13:28:25 -07:00
Simon Mo	aceb17cf2d	[Docs] document that mixtral 8x22b is supported (#4073 )	2024-04-14 14:35:55 -07:00
Sanger Steel	711a000255	[Frontend] [Core] feat: Add model loading using `tensorizer` (#3476 )	2024-04-13 17:13:01 -07:00
Michael Feil	c2b4a1bce9	[Doc] Add typing hints / mypy types cleanup (#3816 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-04-11 17:17:21 -07:00
youkaichao	f3d0bf7589	[Doc][Installation] delete python setup.py develop (#3989 )	2024-04-11 03:33:02 +00:00
Frαnçois	92cd2e2f21	[Doc] Fix getting stared to use publicly available model (#3963 )	2024-04-10 18:05:52 +00:00
youkaichao	e35397468f	[Doc] Add doc to state our model support policy (#3948 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-04-10 17:03:02 +00:00
ywfang	b4543c8f6b	[Model] add minicpm (#3893 )	2024-04-08 18:28:36 +08:00
youkaichao	95baec828f	[Core] enable out-of-tree model register (#3871 )	2024-04-06 17:11:41 -07:00
youkaichao	d03d64fd2e	[CI/Build] refactor dockerfile & fix pip cache [CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels (#3859)	2024-04-04 21:53:16 -07:00
Sean Gallen	78107fa091	[Doc]Add asynchronous engine arguments to documentation. (#3810 ) Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-04-04 21:52:01 -07:00
Adrian Abeyta	2ff767b513	Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290 ) Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com> Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu> Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com> Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com> Co-authored-by: guofangze <guofangze@kuaishou.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-04-03 14:15:55 -07:00
Roger Wang	3bec41f41a	[Doc] Fix vLLMEngine Doc Page (#3791 )	2024-04-02 09:49:37 -07:00
bigPYJ1151	0e3f06fe9c	[Hardware][Intel] Add CPU inference backend (#3634 ) Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>	2024-04-01 22:07:30 -07:00
youkaichao	9c82a1bec3	[Doc] Update installation doc (#3746 ) [Doc] Update installation doc for build from source and explain the dependency on torch/cuda version (#3746) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-03-30 16:34:38 -07:00
yhu422	d8658c8cc1	Usage Stats Collection (#2852 )	2024-03-28 22:16:12 -07:00
wenyujin333	d6ea427f04	[Model] Add support for Qwen2MoeModel (#3346 )	2024-03-28 15:19:59 +00:00
Woosuk Kwon	6d9aa00fc4	[Docs] Add Command-R to supported models (#3669 )	2024-03-27 15:20:00 -07:00
Megha Agarwal	e24336b5a7	[Model] Add support for DBRX (#3660 )	2024-03-27 13:01:46 -07:00
Woosuk Kwon	e66b629c04	[Misc] Minor fix in KVCache type (#3652 )	2024-03-26 23:14:06 -07:00
Jee Li	76879342a3	[Doc]add lora support (#3649 )	2024-03-27 02:06:46 +00:00
SangBin Cho	01bfb22b41	[CI] Try introducing isort. (#3495 )	2024-03-25 07:59:47 -07:00
youkaichao	42bc386129	[CI/Build] respect the common environment variable MAX_JOBS (#3600 )	2024-03-24 17:04:00 -07:00
Lalit Pradhan	4c07dd28c0	[🚀 Ready to be merged] Added support for Jais models (#3183 )	2024-03-21 09:45:24 +00:00
Jim Burtoft	63e8b28a99	[Doc] minor fix of spelling in amd-installation.rst (#3506 )	2024-03-19 20:32:30 +00:00
Jim Burtoft	2a60c9bd17	[Doc] minor fix to neuron-installation.rst (#3505 )	2024-03-19 13:21:35 -07:00
Simon Mo	ef65dcfa6f	[Doc] Add docs about OpenAI compatible server (#3288 )	2024-03-18 22:05:34 -07:00
laneeee	8fa7357f2d	fix document error for value and v_vec illustration (#3421 )	2024-03-15 16:06:09 -07:00
Sherlock Xu	b0925b3878	docs: Add BentoML deployment doc (#3336 ) Signed-off-by: Sherlock113 <sherlockxu07@gmail.com>	2024-03-12 10:34:30 -07:00
Zhuohan Li	4c922709b6	Add distributed model executor abstraction (#3191 )	2024-03-11 11:03:45 -07:00
Philipp Moritz	657061fdce	[docs] Add LoRA support information for models (#3299 )	2024-03-11 00:54:51 -07:00
Roger Wang	99c3cfb83c	[Docs] Fix Unmocked Imports (#3275 )	2024-03-08 09:58:01 -08:00
Jialun Lyu	27a7b070db	Add document for vllm paged attention kernel. (#2978 )	2024-03-04 09:23:34 -08:00
Liangfu Chen	d0fae88114	[DOC] add setup document to support neuron backend (#2777 )	2024-03-04 01:03:51 +00:00
Sage Moore	ce4f5a29fb	Add Automatic Prefix Caching (#2762 ) Co-authored-by: ElizaWszola <eliza@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-03-02 00:50:01 -08:00
Yuan Tang	49d849b3ab	docs: Add tutorial on deploying vLLM model with KServe (#2586 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2024-03-01 11:04:14 -08:00
Ganesh Jagadeesan	a8683102cc	multi-lora documentation fix (#3064 )	2024-02-27 21:26:15 -08:00
Woosuk Kwon	8b430d7dea	[Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM (#3046 )	2024-02-26 20:23:50 -08:00
张大成	48a8f4a7fd	Support Orion model (#2539 ) Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-02-26 19:17:06 -08:00

248 Commits