squall/vllm - vllm

Author	SHA1	Message	Date
xwjiang2010	74d55c065b	[VLM][BugFix] Make sure that `multi_modal_kwargs` can broadcast properly with ring buffer. (#5905) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-06-28 07:29:13 +00:00
Woosuk Kwon	f136da15e1	[Hardware][TPU] Optimize KV cache swapping (#5878)	2024-06-27 21:12:13 -07:00
Divakar Verma	c3dde367f1	[Kernel][ROCm][AMD] fused_moe Triton configs v2 for mi300X (#5932)	2024-06-27 13:41:08 -07:00
youkaichao	64e8d2a783	[core][misc] remove logical block (#5882)	2024-06-27 13:34:55 -07:00
Woosuk Kwon	79c92c7c8a	[Model] Add Gemma 2 (#5908)	2024-06-27 13:33:56 -07:00
Roger Wang	736ed38849	[CI/Build] Fix Args for `_get_logits_warper` in Sampler Test (#5922)	2024-06-27 11:43:04 -07:00
Nick Hill	365791ff81	[BugFix] Fix `min_tokens` behaviour for multiple eos tokens (#5849)	2024-06-27 11:31:11 -07:00
Nick Hill	691e29ecf3	[BugFix] Fix `MLPSpeculator` handling of `num_speculative_tokens` (#5876)	2024-06-27 10:59:33 -07:00
youkaichao	3fd02bda51	[doc][misc] add note for Kubernetes users (#5916)	2024-06-27 10:07:07 -07:00
Cyrus Leung	98cf2ed678	[Model][Bugfix] Implicit model flags and reenable Phi-3-Vision (#5896)	2024-06-27 09:08:10 -07:00
Cyrus Leung	e9d32d077d	[CI/Build] [1/3] Reorganize entrypoints tests (#5526)	2024-06-27 12:43:17 +00:00
Roger Wang	2061f0b8a7	[Bugfix] Fix img_sizes Parsing in Phi3-Vision (#5888)	2024-06-27 08:29:24 +00:00
Cyrus Leung	96354d6a29	[Model] Add base class for LoRA-supported models (#5018)	2024-06-27 16:03:04 +08:00
xwjiang2010	d12af207d2	[VLM][Bugfix] Make sure that `multi_modal_kwargs` is broadcasted properly (#5880) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>	2024-06-27 15:15:24 +08:00
Cyrus Leung	6eabc6cb0e	[Doc] Add note about context length in Phi-3-Vision example (#5887)	2024-06-26 23:20:01 -07:00
Nick Hill	2110557dab	[BugFix] Fix cuda graph for MLPSpeculator (#5875) Co-authored-by: Abhinav Goyal <abhinav.goyal@flipkart.com>	2024-06-27 04:12:10 +00:00
Roger Wang	b9e84259e9	[Misc] Add example for LLaVA-NeXT (#5879)	2024-06-26 17:57:16 -07:00
youkaichao	294104c3f9	[doc] update usage of env var to avoid conflict (#5873)	2024-06-26 17:57:12 -04:00
Chip Kerchner	38a1674abb	Support CPU inference with VSX PowerPC ISA (#5652)	2024-06-26 21:53:04 +00:00
Woosuk Kwon	f5c8628fdc	[Bugfix][TPU] Fix CPU cache allocation (#5869)	2024-06-26 13:42:40 -07:00
Woosuk Kwon	cbc53b6b8d	[Hardware][TPU] Support parallel sampling & Swapping (#5855)	2024-06-26 11:07:49 -07:00
sasha0552	c54269d967	[Frontend] Add tokenize/detokenize endpoints (#5054)	2024-06-26 16:54:22 +00:00
Luka Govedič	5bfd1bbc98	[Kernel] Adding bias epilogue support for `cutlass_scaled_mm` (#5560) Co-authored-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2024-06-26 15:16:00 +00:00
Cyrus Leung	6984c02a27	[CI/Build] Refactor image test assets (#5821)	2024-06-26 01:02:34 -07:00
Woosuk Kwon	3439c5a8e3	[Bugfix][TPU] Fix KV cache size calculation (#5860)	2024-06-26 00:58:23 -07:00
Woosuk Kwon	6806998bf9	[Bugfix] Fix embedding to support 2D inputs (#5829)	2024-06-26 00:15:22 -07:00
youkaichao	515080ad2f	[bugfix][distributed] fix shm broadcast when the queue size is full (#5801)	2024-06-25 21:56:02 -07:00
Roger Wang	3aa7b6cf66	[Misc][Doc] Add Example of using OpenAI Server with VLM (#5832)	2024-06-25 20:34:25 -07:00
Stephanie Wang	dda4811591	[Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu> Signed-off-by: Stephanie <swang@anyscale.com> Co-authored-by: Stephanie <swang@anyscale.com>	2024-06-25 20:30:03 -07:00
aws-patlange	82079729cc	[Bugfix] Fix assertion in NeuronExecutor (#5841)	2024-06-25 19:52:10 -07:00
Thomas Parnell	c2a8ac75e0	[CI/Build] Add E2E tests for MLPSpeculator (#5791) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-06-26 00:04:08 +00:00
Woosuk Kwon	f178e56c68	[Hardware][TPU] Raise errors for unsupported sampling params (#5850)	2024-06-25 16:58:23 -07:00
Matt Wong	dd793d1de5	[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422)	2024-06-25 15:56:15 -07:00
Woosuk Kwon	bc34937d68	[Hardware][TPU] Refactor TPU backend (#5831)	2024-06-25 15:25:52 -07:00
Dipika Sikka	dd248f7675	[Misc] Update `w4a16` `compressed-tensors` support to include `w8a16` (#5794)	2024-06-25 19:23:35 +00:00
Michael Goin	d9b34baedd	[CI/Build] Add unit testing for FlexibleArgumentParser (#5798)	2024-06-25 12:18:03 -07:00
youkaichao	c18ebfdd71	[doc][distributed] add both gloo and nccl tests (#5834)	2024-06-25 15:10:28 -04:00
Antoni Baum	67882dbb44	[Core] Add fault tolerance for `RayTokenizerGroupPool` (#5748)	2024-06-25 10:15:10 -07:00
Jie Fu (傅杰)	7b99314301	[Misc] Remove useless code in cpu_worker (#5824)	2024-06-25 09:41:36 -07:00
Woo-Yeon Lee	2ce5d6688b	[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414)	2024-06-25 09:56:06 +00:00
Cyrus Leung	f23871e9ee	[Doc] Add notice about breaking changes to VLMs (#5818)	2024-06-25 01:25:03 -07:00
Kevin H. Luu	e9de9dd551	[ci] Remove aws template (#5757) Signed-off-by: kevin <kevin@anyscale.com>	2024-06-24 21:09:02 -07:00
Chang Su	ba991d5c84	[Bugfix] Fix FlexibleArgumentParser replaces _ with - for actual args (#5795)	2024-06-24 17:01:19 -06:00
Michael Goin	1744cc99ba	[Doc] Add Phi-3-medium to list of supported models (#5788)	2024-06-24 10:48:55 -07:00
Michael Goin	e72dc6cb35	[Doc] Add "Suggest edit" button to doc pages (#5789)	2024-06-24 10:26:17 -07:00
youkaichao	c246212952	[doc][faq] add warning to download models for every nodes (#5783)	2024-06-24 15:37:42 +08:00
Isotr0py	edd5fe5fa2	[Bugfix] Add phi3v resize for dynamic shape and fix torchvision requirement (#5772)	2024-06-24 12:11:53 +08:00
Murali Andoorveedu	5d4d90536f	[Distributed] Add send and recv helpers (#5719)	2024-06-23 14:42:28 -07:00
Varun Sundar Rabindranath	6c916ac8a8	[BugFix] [Kernel] Add Cutlass2x fallback kernels (#5744) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-06-23 21:07:11 +00:00
youkaichao	832ea88fcb	[core][distributed] improve shared memory broadcast (#5754)	2024-06-22 10:00:43 -07:00

1834 Commits