squall/vllm - vllm

Author	SHA1	Message	Date
Will Eaton	882a1ad0de	[Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>	2024-10-29 15:07:37 -07:00
Joe Runde	67bdf8e523	[Bugfix][Frontend] Guard against bad token ids (#9634 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-29 14:13:20 -07:00
Michael Goin	ab6f981671	[CI][Bugfix] Skip chameleon for transformers 4.46.1 (#9808 )	2024-10-29 11:12:43 -07:00
wangshuai09	622b7ab955	[Hardware] using current_platform.seed_everything (#9785 ) Signed-off-by: wangshuai09 <391746016@qq.com>	2024-10-29 14:47:44 +00:00
Zhong Qishuai	ef7865b4f9	[Frontend] re-enable multi-modality input in the new beam search implementation (#9427 ) Signed-off-by: Qishuai Ferdinandzhong@gmail.com	2024-10-29 11:49:47 +00:00
litianjian	5f8d8075f9	[Model][VLM] Add multi-video support for LLaVA-Onevision (#8905 ) Co-authored-by: litianjian <litianjian@bytedance.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-28 18:04:10 +00:00
youkaichao	32176fee73	[torch.compile] support moe models (#9632 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-10-27 21:58:04 -07:00
wangshuai09	4e2d95e372	[Hardware][ROCM] using current_platform.is_rocm (#9642 ) Signed-off-by: wangshuai09 <391746016@qq.com>	2024-10-28 04:07:00 +00:00
madt2709	34a9941620	[Bugfix] Fix load config when using bools (#9533 )	2024-10-27 13:46:41 -04:00
bnellnm	3cb07a36a2	[Misc] Upgrade to pytorch 2.5 (#9588 ) Signed-off-by: Bill Nell <bill@neuralmagic.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-10-27 09:44:24 +00:00
kakao-kevin-us	6650e6a930	[Model] Add classification Task with Qwen2ForSequenceClassification (#9704 ) Signed-off-by: Kevin-Yang <ykcha9@gmail.com> Co-authored-by: Kevin-Yang <ykcha9@gmail.com>	2024-10-26 17:53:35 +00:00
Vasiliy Alekseev	07e981fdf4	[Frontend] Bad words sampling parameter (#9717 ) Signed-off-by: Vasily Alexeev <alvasian@yandex.ru>	2024-10-26 16:29:38 +00:00
Mengqing Cao	5cbdccd151	[Hardware][openvino] is_openvino --> current_platform.is_openvino (#9716 )	2024-10-26 10:59:06 +00:00
Kevin H. Luu	9f7b4ba865	[ci/Build] Skip Chameleon for transformers 4.46.0 on broadcast test #9675 (#9676 )	2024-10-24 20:59:00 -07:00
Charlie Fu	59449095ab	[Performance][Kernel] Fused_moe Performance Improvement (#9384 ) Signed-off-by: charlifu <charlifu@amd.com>	2024-10-24 15:37:52 -07:00
Alex Brooks	722d46edb9	[Model] Compute Llava Next Max Tokens / Dummy Data From Gridpoints (#9650 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-24 10:42:24 -07:00
Cyrus Leung	c866e0079d	[CI/Build] Fix VLM test failures when using transformers v4.46 (#9666 )	2024-10-25 01:40:40 +08:00
Yongzao	d27cfbf791	[torch.compile] Adding torch compile annotations to some models (#9641 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-10-24 09:31:42 -07:00
Jee Jee Li	295a061fb3	[Kernel] add kernel for FATReLU (#9610 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-10-24 16:18:27 +08:00
Yongzao	8a02cd045a	[torch.compile] Adding torch compile annotations to some models (#9639 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-10-24 00:54:57 -07:00
youkaichao	4fdc581f9e	[core] simplify seq group code (#9569 ) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-10-24 00:16:44 -07:00
Cyrus Leung	836e8ef6ee	[Bugfix] Fix PP for ChatGLM and Molmo (#9422 )	2024-10-24 06:12:05 +00:00
Vinay R Damodaran	33bab41060	[Bugfix]: Make chat content text allow type content (#9358 ) Signed-off-by: Vinay Damodaran <vrdn@hey.com>	2024-10-24 05:05:49 +00:00
Yunfei Chu	fc6c274626	[Model] Add Qwen2-Audio model support (#9248 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-23 17:54:22 +00:00
Alex Brooks	150b779081	[Frontend] Enable Online Multi-image Support for MLlama (#9393 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-10-23 17:28:57 +00:00
Alex Brooks	31a08f5bd2	[Model] Add min_pixels / max_pixels to Qwen2VL as mm_processor_kwargs (#9612 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-23 14:05:18 +00:00
Isotr0py	3ff57ebfca	[Model] Initialize Florence-2 language backbone support (#9555 )	2024-10-23 10:42:47 +00:00
Cyrus Leung	831540cf04	[Model] Support E5-V (#9576 )	2024-10-23 11:35:29 +08:00
yulei	b17046e298	[BugFix] Fix metrics error for --num-scheduler-steps > 1 (#8234 )	2024-10-22 15:43:03 -07:00
Ronen Schaffer	cd5601ac37	[BugFix] Prevent exporting duplicate OpenTelemetry spans (#9017 )	2024-10-22 11:11:53 -07:00
Isotr0py	bb392ea2d2	[Model][VLM] Initialize support for Mono-InternVL model (#9528 )	2024-10-22 16:01:46 +00:00
Jee Jee Li	a48e3ec052	[CI/Build][LoRA] Temporarily fix long context failure issue (#9579 )	2024-10-22 11:32:51 +00:00
wangshuai09	3ddbe25502	[Hardware][CPU] using current_platform.is_cpu (#9536 )	2024-10-22 00:50:43 -07:00
Wallas Henrique	c0292211ce	[CI/Build] Replaced some models on tests for smaller ones (#9570 ) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-10-22 04:52:14 +00:00
Cyrus Leung	f085995a7b	[CI/Build] Remove unnecessary `fork_new_process` (#9484 )	2024-10-21 19:47:29 -07:00
Travis Johnson	b729901139	[Bugfix]: serialize config by value for --trust-remote-code (#6751 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-10-21 19:46:24 -07:00
youkaichao	76a5e13270	[core] move parallel sampling out from vllm core (#9302 )	2024-10-22 00:31:44 +00:00
Joe Runde	ef7faad1b8	🐛 Fixup more test failures from memory profiling (#9563 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-21 17:10:56 -07:00
Wallas Henrique	711f3a7806	[Frontend] Don't log duplicate error stacktrace for every request in the batch (#9023 ) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-10-21 14:49:41 -07:00
Dhia Eddine Rhaiem	f6b97293aa	[Model] FalconMamba Support (#9325 )	2024-10-21 12:50:16 -04:00
Cyrus Leung	696b01af8f	[CI/Build] Split up decoder-only LM tests (#9488 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-20 21:27:50 -07:00
Chen Zhang	4fa3e33349	[Kernel] Support sliding window in flash attention backend (#9403 )	2024-10-20 10:57:52 -07:00
Chen Zhang	5b59fe0f08	[Bugfix] Pass json-schema to GuidedDecodingParams and make test stronger (#9530 )	2024-10-20 00:05:02 +00:00
Yue Zhang	c5eea3c8ba	[Frontend] Support simpler image input format (#9478 )	2024-10-18 23:17:07 -07:00
Joe Runde	380e18639f	🐛 fix torch memory profiling (#9516 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-18 21:25:19 -04:00
sasha0552	337ed76671	[Bugfix] Fix offline mode when using `mistral_common` (#9457 )	2024-10-18 18:12:32 -07:00
Cody Yu	d11bf435a0	[MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py (#9510 )	2024-10-18 14:30:55 -07:00
Cyrus Leung	051eaf6db3	[Model] Add user-configurable task for models that support both generation and embedding (#9424 )	2024-10-18 11:31:58 -07:00
tomeras91	d2b1bf55ec	[Frontend][Feature] Add jamba tool parser (#9154 )	2024-10-18 10:27:48 +00:00
Joe Runde	de4008e2ab	[Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage (#9352 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-17 22:47:27 -04:00
Robert Shaw	343f8e0905	Support `BERTModel` (first `encoder-only` embedding model) (#9056 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: laishzh <laishengzhang@gmail.com> Co-authored-by: Max de Bayser <maxdebayser@gmail.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-10-17 23:21:01 +00:00
bnellnm	eca2c5f7c0	[Bugfix] Fix support for dimension like integers and ScalarType (#9299 )	2024-10-17 19:08:34 +00:00
Luka Govedič	0f41fbe5a3	[torch.compile] Fine-grained CustomOp enabling mechanism (#9300 )	2024-10-17 18:36:37 +00:00
Kuntai Du	81ede99ca4	[Core] Deprecating block manager v1 and make block manager v2 default (#8704 ) Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).	2024-10-17 11:38:15 -05:00
Mor Zusman	fb60ae9b91	[Kernel][Model] Improve continuous batching for Jamba and Mamba (#9189 )	2024-10-16 12:12:43 -04:00
Cyrus Leung	cee711fdbb	[Core] Rename input data types (#8688 )	2024-10-16 10:49:37 +00:00
Cyrus Leung	7abba39ee6	[Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303 )	2024-10-16 14:31:00 +08:00
Cyrus Leung	7e7eae338d	[Misc] Standardize RoPE handling for Qwen2-VL (#9250 )	2024-10-16 13:56:17 +08:00
Chang Su	ba30942240	[Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-15 15:40:43 -07:00
Michael Goin	22f8a69549	[Misc] Directly use compressed-tensors for checkpoint definitions (#8909 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-15 15:40:25 -07:00
Nick Hill	e9d517f276	[BugFix] Fix chat API continuous usage stats (#9357 )	2024-10-14 23:19:48 -07:00
Xiang Xu	f0fe4fe86d	[Model] Make llama3.2 support multiple and interleaved images (#9095 )	2024-10-14 15:24:26 -07:00
Lily Liu	89feb4c84d	[SpecDec] Remove Batch Expansion (2/3) (#9298 )	2024-10-12 05:13:37 +00:00
sixgod	6cf1167c1a	[Model] Add GLM-4v support and meet vllm==0.6.2 (#9242 )	2024-10-11 17:36:13 +00:00
Tyler Michael Smith	7342a7d7f8	[Model] Support Mamba (#6484 )	2024-10-11 15:40:06 +00:00
Jee Jee Li	36ea79079b	[Misc][LoRA] Support loading LoRA weights for target_modules in reg format (#9275 )	2024-10-11 12:31:21 +00:00
youkaichao	cbc2ef5529	[misc] hide best_of from engine (#9261 ) Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>	2024-10-10 21:30:44 -07:00
youkaichao	e4d652ea3e	[torch.compile] integration with compilation control (#9058 )	2024-10-10 12:39:36 -07:00
sroy745	f3a507f1d3	[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149 )	2024-10-10 14:17:17 +08:00
Lucas Wilkinson	a64e7b9407	[Bugfix] Machete garbage results for some models (large K dim) (#9212 )	2024-10-10 14:16:17 +08:00
Michael Goin	ce00231a8b	[Bugfix] Fix Weight Loading Multiple GPU Test - Large Models (#9213 )	2024-10-10 14:15:40 +08:00
Li, Jiang	ca77dd7a44	[Hardware][CPU] Support AWQ for CPU backend (#7515 )	2024-10-09 10:28:08 -06:00
youkaichao	c8627cd41b	[ci][test] use load dummy for testing (#9165 )	2024-10-09 00:38:40 -07:00
chenqianfzh	2f4117c38e	support bitsandbytes quantization with more models (#9148 )	2024-10-08 19:52:19 -06:00
bnellnm	bd37b9fbe2	[Bugfix] Try to handle older versions of pytorch (#9086 )	2024-10-08 14:28:12 -07:00
Daniele	9a94ca4a5d	[Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537 )	2024-10-08 09:38:40 -07:00
Alex Brooks	069d3bd8d0	[Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-08 14:31:26 +00:00
Alex Brooks	a3691b6b5e	[Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-08 14:12:56 +00:00
Brendan Wong	8c746226c9	[Frontend] API support for beam search for MQLLMEngine (#9117 )	2024-10-08 05:51:43 +00:00
youkaichao	04c12f8157	[misc] update utils to support comparing multiple settings (#9140 )	2024-10-08 02:51:49 +00:00
Isotr0py	f19da64871	[Core] Refactor GGUF parameters packing and forwarding (#8859 )	2024-10-07 10:01:46 +00:00
Isotr0py	4f95ffee6f	[Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089 )	2024-10-07 06:50:35 +00:00
Cyrus Leung	8c6de96ea1	[Model] Explicit interface for vLLM models and support OOT embedding models (#9108 )	2024-10-07 06:10:35 +00:00
youkaichao	18b296fdb2	[core] remove beam search from the core (#9105 )	2024-10-07 05:47:04 +00:00
Varun Sundar Rabindranath	cb3b2b9ba4	[Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-10-06 12:48:11 -07:00
Cyrus Leung	b22b798471	[Model] PP support for embedding models and update docs (#9090 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-10-06 16:35:27 +08:00
Brendan Wong	168cab6bbf	[Frontend] API support for beam search (#9087 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-05 23:39:03 -07:00
Andy Dai	5df1834895	[Bugfix] Fix order of arguments matters in config.yaml (#8960 )	2024-10-05 17:35:11 +00:00
Chen Zhang	cfadb9c687	[Bugfix] Deprecate registration of custom configs to huggingface (#9083 )	2024-10-05 21:56:40 +08:00
Xin Yang	15986f598c	[Model] Support Gemma2 embedding model (#9004 )	2024-10-05 06:57:05 +00:00
ElizaWszola	05d686432f	[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973 ) Co-authored-by: Dipika <dipikasikka1@gmail.com> Co-authored-by: Dipika Sikka <ds3822@columbia.edu>	2024-10-04 12:34:44 -06:00
Flávia Béo	0dcc8cbe5a	Adds truncate_prompt_tokens param for embeddings creation (#8999 ) Signed-off-by: Flavia Beo <flavia.beo@ibm.com>	2024-10-04 18:31:40 +00:00
Roger Wang	26aa325f4f	[Core][VLM] Test registration for OOT multimodal models (#8717 ) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:38:25 -07:00
Prashant Gupta	9ade8bbc8d	[Model] add a bunch of supported lora modules for mixtral (#9008 ) Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>	2024-10-04 16:24:40 +00:00
Cyrus Leung	0e36fd4909	[Misc] Move registry to its own file (#9064 )	2024-10-04 10:01:37 +00:00
Murali Andoorveedu	0f6d7a9a34	[Models] Add remaining model PP support (#7168 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:56:58 +08:00
代君	3dbb215b38	[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405 )	2024-10-04 10:36:39 +08:00
sroy745	91add85ec4	Fix failing spec decode test (#9054 )	2024-10-03 23:07:29 +00:00
youkaichao	9aaf14c62e	[misc] add forward context for attention (#9029 )	2024-10-03 12:09:42 -07:00
xendo	63e39937f9	[Frontend] [Neuron] Parse literals out of override-neuron-config (#8959 ) Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>	2024-10-03 18:02:07 +00:00

988 Commits