squall/vllm - vllm

Author	SHA1	Message	Date
Russell Bryant	c5d7fb9ddc	[Doc] fix third-party model example (#9771) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-10-28 19:39:21 -07:00
youkaichao	76ed5340f0	[torch.compile] add deepseek v2 compile (#9775) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-10-28 14:35:17 -07:00
youkaichao	97b61bfae6	[misc] avoid circular import (#9765) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-10-28 20:51:23 +00:00
Yongzao	aa0addb397	Adding "torch compile" annotations to moe models (#9758)	2024-10-28 13:49:56 -07:00
litianjian	5f8d8075f9	[Model][VLM] Add multi-video support for LLaVA-Onevision (#8905) Co-authored-by: litianjian <litianjian@bytedance.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-28 18:04:10 +00:00
Russell Bryant	8b0e4f2ad7	[CI/Build] Adopt Mergify for auto-labeling PRs (#9259) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-10-28 09:38:09 -07:00
Yan Ma	2adb4409e0	[Bugfix] Fix ray instance detect issue (#9439)	2024-10-28 07:13:03 +00:00
Robert Shaw	feb92fbe4a	Fix beam search eos (#9627)	2024-10-28 06:59:37 +00:00
youkaichao	32176fee73	[torch.compile] support moe models (#9632) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-10-27 21:58:04 -07:00
wangshuai09	4e2d95e372	[Hardware][ROCM] using current_platform.is_rocm (#9642) Signed-off-by: wangshuai09 <391746016@qq.com>	2024-10-28 04:07:00 +00:00
madt2709	34a9941620	[Bugfix] Fix load config when using bools (#9533)	2024-10-27 13:46:41 -04:00
Harry Mellor	e130c40e4e	Fix cache management in "Close inactive issues and PRs" actions workflow (#9734)	2024-10-27 10:30:03 -07:00
bnellnm	3cb07a36a2	[Misc] Upgrade to pytorch 2.5 (#9588) Signed-off-by: Bill Nell <bill@neuralmagic.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-10-27 09:44:24 +00:00
youkaichao	8549c82660	[core] cudagraph output with tensor weak reference (#9724) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-10-27 00:19:28 -07:00
科英	67a6882da4	[Misc] SpecDecodeWorker supports profiling (#9719) Signed-off-by: Abatom <abatom@163.com>	2024-10-27 04:18:03 +00:00
kakao-kevin-us	6650e6a930	[Model] Add classification Task with Qwen2ForSequenceClassification (#9704) Signed-off-by: Kevin-Yang <ykcha9@gmail.com> Co-authored-by: Kevin-Yang <ykcha9@gmail.com>	2024-10-26 17:53:35 +00:00
Vasiliy Alekseev	07e981fdf4	[Frontend] Bad words sampling parameter (#9717) Signed-off-by: Vasily Alexeev <alvasian@yandex.ru>	2024-10-26 16:29:38 +00:00
ErkinSagiroglu	55137e8ee3	Fix: MI100 Support By Bypassing Custom Paged Attention (#9560)	2024-10-26 12:12:57 +00:00
Mengqing Cao	5cbdccd151	[Hardware][openvino] is_openvino --> current_platform.is_openvino (#9716)	2024-10-26 10:59:06 +00:00
Sam Stoelinga	067e77f9a8	[Bugfix] Steaming continuous_usage_stats default to False (#9709) Signed-off-by: Sam Stoelinga <sammiestoel@gmail.com>	2024-10-26 05:05:47 +00:00
Travis Johnson	6567e13724	[Bugfix] Fix crash with llama 3.2 vision models and guided decoding (#9631) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: pavlo-ruban <pavlo.ruban@servicenow.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-25 15:42:56 -07:00
Rafael Vasquez	228cfbd03f	[Doc] Improve quickstart documentation (#9256) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-10-25 14:32:10 -07:00
Michael Goin	ca0d92227e	[Bugfix] Fix compressed_tensors_moe bad config.strategy (#9677)	2024-10-25 12:40:33 -07:00
Woosuk Kwon	9645b9f646	[V1] Support sliding window attention (#9679) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-10-24 22:20:37 -07:00
Will Johnson	a6f3721861	[Model] add a lora module for granite 3.0 MoE models (#9673)	2024-10-24 22:00:17 -07:00
Kevin H. Luu	9f7b4ba865	[ci/Build] Skip Chameleon for transformers 4.46.0 on broadcast test #9675 (#9676)	2024-10-24 20:59:00 -07:00
Michael Goin	c91ed47c43	[Bugfix] Remove xformers requirement for Pixtral (#9597) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-10-24 15:38:05 -07:00
Charlie Fu	59449095ab	[Performance][Kernel] Fused_moe Performance Improvement (#9384) Signed-off-by: charlifu <charlifu@amd.com>	2024-10-24 15:37:52 -07:00
Michael Goin	e26d37a185	[Log][Bugfix] Fix default value check for `image_url.detail` (#9663)	2024-10-24 10:44:38 -07:00
Alex Brooks	722d46edb9	[Model] Compute Llava Next Max Tokens / Dummy Data From Gridpoints (#9650) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-24 10:42:24 -07:00
Cyrus Leung	c866e0079d	[CI/Build] Fix VLM test failures when using transformers v4.46 (#9666)	2024-10-25 01:40:40 +08:00
Yongzao	d27cfbf791	[torch.compile] Adding torch compile annotations to some models (#9641) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-10-24 09:31:42 -07:00
Harry Mellor	de662d32b5	Increase operation per run limit for "Close inactive issues and PRs" workflow (#9661) Signed-off-by: Harry Mellor <hej.mellor@gmail.com>	2024-10-24 12:17:45 -04:00
litianjian	f58454968f	[Bugfix]Disable the post_norm layer of the vision encoder for LLaVA models (#9653)	2024-10-24 07:52:07 -07:00
Cyrus Leung	b979143d5b	[Doc] Move additional tips/notes to the top (#9647)	2024-10-24 09:43:59 +00:00
Yongzao	ad6f78053e	[torch.compile] expanding support and fix allgather compilation (#9637) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-10-24 01:32:15 -07:00
Jee Jee Li	295a061fb3	[Kernel] add kernel for FATReLU (#9610) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-10-24 16:18:27 +08:00
Yongzao	8a02cd045a	[torch.compile] Adding torch compile annotations to some models (#9639) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-10-24 00:54:57 -07:00
youkaichao	4fdc581f9e	[core] simplify seq group code (#9569) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-10-24 00:16:44 -07:00
Woosuk Kwon	3770071eb4	[V1][Bugfix] Clean up requests when aborted (#9629) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-10-23 23:33:22 -07:00
Cyrus Leung	836e8ef6ee	[Bugfix] Fix PP for ChatGLM and Molmo (#9422)	2024-10-24 06:12:05 +00:00
Yan Ma	056a68c7db	[XPU] avoid triton import for xpu (#9440) Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-10-24 05:14:00 +00:00
Vinay R Damodaran	33bab41060	[Bugfix]: Make chat content text allow type content (#9358) Signed-off-by: Vinay Damodaran <vrdn@hey.com>	2024-10-24 05:05:49 +00:00
Michael Goin	b7df53cd42	[Bugfix] Use "vision_model" prefix for MllamaVisionModel (#9628) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-10-24 10:07:44 +08:00
Michael Goin	bb01f2915e	[Bugfix][Model] Fix Mllama SDPA illegal memory access for batched multi-image (#9626) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-10-24 10:03:44 +08:00
Russell Bryant	b548d7a5f4	[CI/Build] Add bot to close stale issues and PRs (#9436)	2024-10-23 15:45:26 -07:00
Yunfei Chu	fc6c274626	[Model] Add Qwen2-Audio model support (#9248) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-23 17:54:22 +00:00
Alex Brooks	150b779081	[Frontend] Enable Online Multi-image Support for MLlama (#9393) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-10-23 17:28:57 +00:00
Yongzao	9013e24f7b	[torch.compile] Adding torch compile annotations to some models (#9614)	2024-10-23 10:07:48 -07:00
Michael Goin	fd0e2cfdb2	[Misc] Separate total and output tokens in benchmark_throughput.py (#8914)	2024-10-23 16:47:20 +00:00

3355 Commits