squall/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Cyrus Leung	051eaf6db3	[Model] Add user-configurable task for models that support both generation and embedding (#9424 )	2024-10-18 11:31:58 -07:00
Robert Shaw	343f8e0905	Support `BERTModel` (first `encoder-only` embedding model) (#9056 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: laishzh <laishengzhang@gmail.com> Co-authored-by: Max de Bayser <maxdebayser@gmail.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-10-17 23:21:01 +00:00
Mor Zusman	fb60ae9b91	[Kernel][Model] Improve continuous batching for Jamba and Mamba (#9189 )	2024-10-16 12:12:43 -04:00
Cyrus Leung	cee711fdbb	[Core] Rename input data types (#8688 )	2024-10-16 10:49:37 +00:00
Cyrus Leung	7abba39ee6	[Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303 )	2024-10-16 14:31:00 +08:00
Xiang Xu	f0fe4fe86d	[Model] Make llama3.2 support multiple and interleaved images (#9095 )	2024-10-14 15:24:26 -07:00
sixgod	6cf1167c1a	[Model] Add GLM-4v support and meet vllm==0.6.2 (#9242 )	2024-10-11 17:36:13 +00:00
Tyler Michael Smith	7342a7d7f8	[Model] Support Mamba (#6484 )	2024-10-11 15:40:06 +00:00
Isotr0py	f19da64871	[Core] Refactor GGUF parameters packing and forwarding (#8859 )	2024-10-07 10:01:46 +00:00
Isotr0py	4f95ffee6f	[Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089 )	2024-10-07 06:50:35 +00:00
Cyrus Leung	8c6de96ea1	[Model] Explicit interface for vLLM models and support OOT embedding models (#9108 )	2024-10-07 06:10:35 +00:00
Chen Zhang	cfadb9c687	[Bugfix] Deprecate registration of custom configs to huggingface (#9083 )	2024-10-05 21:56:40 +08:00
Xin Yang	15986f598c	[Model] Support Gemma2 embedding model (#9004 )	2024-10-05 06:57:05 +00:00
Roger Wang	26aa325f4f	[Core][VLM] Test registration for OOT multimodal models (#8717 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:38:25 -07:00
Cyrus Leung	0e36fd4909	[Misc] Move registry to its own file (#9064 )	2024-10-04 10:01:37 +00:00
Murali Andoorveedu	0f6d7a9a34	[Models] Add remaining model PP support (#7168 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:56:58 +08:00
Shawn Tan	19f0d25796	[Model] Adding Granite MoE. (#8206 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-03 09:33:57 +08:00
Mor Zusman	f13a07b1f8	[Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533 )	2024-09-29 17:35:58 -04:00
Cyrus Leung	26a68d5d7e	[CI/Build] Add test decorator for minimum GPU memory (#8925 )	2024-09-29 02:50:51 +00:00
Cyrus Leung	e1a3f5e831	[CI/Build] Update models tests & examples (#8874 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-28 09:54:35 -07:00
Isotr0py	6d792d2f31	[Bugfix][VLM] Fix Fuyu batching inference with `max_num_seqs>1` (#8892 )	2024-09-27 01:15:58 -07:00
Nick Hill	4b377d6feb	[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829 )	2024-09-26 16:46:43 -07:00
Chen Zhang	770ec6024f	[Model] Add support for the multi-modal Llama 3.2 model (#8811 ) Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-25 13:29:32 -07:00
Alex Brooks	8ff7ced996	[Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-24 07:36:46 +00:00
Alex Brooks	9b8c8ba119	[Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-09-23 07:44:48 +00:00
litianjian	5b59532760	[Model][VLM] Add LLaVA-Onevision model support (#8486 ) Co-authored-by: litianjian <litianjian@bytedance.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-22 10:51:44 -07:00
Patrick von Platen	b4e4eda92e	[Bugfix][Core] Fix tekken edge case for mistral tokenizer (#8640 )	2024-09-20 14:33:03 -07:00
afeldman-nm	a8c1d161a7	[Core] Prompt logprobs support in Multi-step (#8199 )	2024-09-18 08:38:43 -07:00
Cyrus Leung	6ffa3f314c	[CI/Build] Avoid CUDA initialization (#8534 )	2024-09-18 10:38:11 +00:00
Patrick von Platen	a54ed80249	[Model] Add mistral function calling format to all models loaded with "mistral" format (#8515 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-17 17:50:37 +00:00
ywfang	8a0cf1ddc3	[Model] support minicpm3 (#8297 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-14 14:50:26 +00:00
Cyrus Leung	a84e598e21	[CI/Build] Reorganize models tests (#7820 )	2024-09-13 10:20:06 -07:00
Cyrus Leung	8427550488	[CI/Build] Update pixtral tests to use JSON (#8436 )	2024-09-13 03:47:52 +00:00
Patrick von Platen	d31174a4e1	[Hotfix][Pixtral] Fix multiple images bugs (#8415 )	2024-09-12 15:21:51 -07:00
Alex Brooks	c6202daeed	[Model] Support multiple images for qwen-vl (#8247 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-12 10:10:54 -07:00
Isotr0py	e56bf27741	[Bugfix] Fix InternVL2 inference with various num_patches (#8375 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-12 10:10:35 -07:00
Patrick von Platen	d394787e52	Pixtral (#8377 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-09-11 14:41:55 -07:00
bnellnm	73202dbe77	[Kernel][Misc] register ops to prevent graph breaks (#6917 ) Co-authored-by: Sage Moore <sage@neuralmagic.com>	2024-09-11 12:52:19 -07:00
Yang Fan	3b7fea770f	[Model][VLM] Add Qwen2-VL model support (#7905 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-11 09:31:19 -07:00
Yangshen⚡Deng	6a512a00df	[model] Support for Llava-Next-Video model (#7559 ) Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-10 22:21:36 -07:00
Pavani Majety	efcf946a15	[Hardware][NV] Add support for ModelOpt static scaling checkpoints. (#6112 )	2024-09-11 00:38:40 -04:00
Isotr0py	e807125936	[Model][VLM] Support multi-images inputs for InternVL2 models (#8201 )	2024-09-07 16:38:23 +08:00
Cyrus Leung	2f707fcb35	[Model] Multi-input support for LLaVA (#8238 )	2024-09-07 02:57:24 +00:00
Patrick von Platen	29f49cd6e3	[Model] Allow loading from original Mistral format (#8168 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-06 17:02:05 -06:00
Alex Brooks	9da25a88aa	[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-09-05 12:48:10 +00:00
Cody Yu	2ad2e5608e	[MISC] Consolidate FP8 kv-cache tests (#8131 )	2024-09-04 18:53:25 +00:00
Peter Salas	2be8ec6e71	[Model] Add Ultravox support for multiple audio chunks (#7963 )	2024-09-04 04:38:21 +00:00
Shawn Tan	f8d60145b4	[Model] Add Granite model (#7436 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-09-01 18:37:18 -07:00
Roger Wang	5b86b19954	[Misc] Optional installation of audio related packages (#8063 )	2024-09-01 14:46:57 -07:00
Pavani Majety	622f8abff8	[Bugfix] bugfix and add model test for flashinfer fp8 kv cache. (#8013 )	2024-08-30 22:18:50 -07:00

1 2 3 4

160 Commits