squall/vllm - vllm

Author	SHA1	Message	Date
bigPYJ1151	0e3f06fe9c	[Hardware][Intel] Add CPU inference backend (#3634) Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>	2024-04-01 22:07:30 -07:00
Cade Daniel	eb69d68804	[Misc] [CI/Build] Speed up block manager CPU-only unit tests ~10x by opting-out of GPU cleanup (#3783)	2024-04-02 00:49:51 +00:00
Qubitium	7d4e1b85e7	[Misc] Add support for new autogptq checkpoint_format (#3689) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2024-04-01 19:32:01 -04:00
Cade Daniel	93deb0b38f	[Speculative decoding 4/9] Lookahead scheduling for speculative decoding (#3250)	2024-04-01 22:55:24 +00:00
Roger Wang	ccb58b23e6	[Misc] Fix Benchmark TTFT Calculation for Chat Completions (#3768)	2024-04-01 15:24:30 -07:00
Nick Hill	49782fcb76	[Misc] Some minor simplifications to detokenization logic (#3670) Some simplifications made for clarity. Also moves detokenization-related functions from tokenizer.py to detokenizer.py.	2024-04-01 13:22:06 -07:00
Woosuk Kwon	f03cc667a0	[Misc] Minor fixes in requirements.txt (#3769)	2024-04-01 10:15:48 +00:00
Robert Shaw	563c1d7ec5	[CI/Build] Make Marlin Tests Green (#3753)	2024-03-30 19:18:34 -07:00
youkaichao	9c82a1bec3	[Doc] Update installation doc (#3746) [Doc] Update installation doc for build from source and explain the dependency on torch/cuda version (#3746) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-03-30 16:34:38 -07:00
mawong-amd	b6d103542c	[Kernel] Layernorm performance optimization (#3662)	2024-03-30 14:26:38 -07:00
Simon Mo	51c31bc10c	CMake build elf without PTX (#3739)	2024-03-30 01:53:08 +00:00
bnellnm	3ad438c66f	Fix build when nvtools is missing (#3698)	2024-03-29 18:52:39 -07:00
youkaichao	203d4f82ac	[Core][Bugfix] cache len of tokenizer (#3741)	2024-03-29 18:46:39 -07:00
Nick Hill	991143cfcd	[BugFix] Use consistent logger everywhere (#3738)	2024-03-29 23:26:44 +00:00
Simon Mo	8b2d3cbc1b	usage lib get version another way (#3735)	2024-03-29 15:57:08 -07:00
Hongxia Yang	9765b5c406	[ROCm][Bugfix] Fixed several bugs related to rccl path and attention selector logic (#3699)	2024-03-29 14:52:36 -07:00
Simon Mo	430530fc18	bump version to v0.4.0 (#3712)	2024-03-29 12:28:33 -07:00
Roger Wang	97356f3c7e	[Bugfix] Command-R Max Model Length (#3727)	2024-03-29 12:27:51 -07:00
Roy	f510395bbf	[BugFix][Frontend] Fix completion logprobs=0 error (#3731)	2024-03-29 09:38:21 -07:00
Roy	6110c39dc8	[BugFix] Fix tokenizer out of vocab size (#3685)	2024-03-29 08:18:59 -07:00
yhu422	d8658c8cc1	Usage Stats Collection (#2852)	2024-03-28 22:16:12 -07:00
Simon Mo	7bc94a0fdd	add ccache to docker build image (#3704)	2024-03-28 22:14:24 -07:00
youkaichao	756b30a5f3	[Core][Test] move local_rank to the last arg with default value(#3711) [Core][Test] move local_rank to the last arg with default value to keep api compatible (#3711)	2024-03-28 21:19:45 -07:00
Woosuk Kwon	395aa823ea	[Misc] Minor type annotation fix (#3716)	2024-03-28 21:12:24 -07:00
SangBin Cho	26422e477b	[Test] Make model tests run again and remove --forked from pytest (#3631) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-03-28 21:06:40 -07:00
youkaichao	f342153b48	Revert "bump version to v0.4.0" (#3708)	2024-03-28 18:49:42 -07:00
Simon Mo	27a57cad52	bump version to v0.4.0 (#3705)	2024-03-28 18:26:51 -07:00
Yile (Michael) Gu	98a42e7078	[Benchmark] Change mii to use persistent deployment and support tensor parallel (#3628)	2024-03-28 17:33:52 -07:00
youkaichao	0267fef52a	[Core] fix del of communicator (#3702)	2024-03-29 00:24:58 +00:00
Simon Mo	4716a32dd4	fix logging msg for block manager (#3701)	2024-03-28 23:29:55 +00:00
Woosuk Kwon	c0935c96d3	[Bugfix] Set enable_prefix_caching=True in prefix caching example (#3703)	2024-03-28 16:26:30 -07:00
Woosuk Kwon	cb40b3ab6b	[Kernel] Add MoE Triton kernel configs for A100 40GB (#3700)	2024-03-28 15:26:24 -07:00
Roy	515386ef3c	[Core] Support multi-node inference(eager and cuda graph) (#3686)	2024-03-28 15:01:55 -07:00
Simon Mo	a4075cba4d	[CI] Add test case to run examples scripts (#3638)	2024-03-28 14:36:10 -07:00
Simon Mo	96aa014d1e	fix benchmark format reporting in buildkite (#3693)	2024-03-28 14:35:16 -07:00
Adam Boeglin	1715056fef	[Bugfix] Update neuron_executor.py to add optional vision_language_config (#3695)	2024-03-28 10:43:34 -07:00
SangBin Cho	b51c1cc9d2	[2/N] Chunked prefill data update (#3538)	2024-03-28 10:06:01 -07:00
Roger Wang	ce567a2926	[Kernel] DBRX Triton MoE kernel H100 (#3692)	2024-03-28 10:05:34 -07:00
wenyujin333	d6ea427f04	[Model] Add support for Qwen2MoeModel (#3346)	2024-03-28 15:19:59 +00:00
Cade Daniel	14ccd94c89	[Core][Bugfix]Refactor block manager for better testability (#3492)	2024-03-27 23:59:28 -07:00
Woosuk Kwon	8267b06c30	[Kernel] Add Triton MoE kernel configs for DBRX on A100 (#3679)	2024-03-27 22:22:25 -07:00
youkaichao	3492859b68	[CI/Build] update default number of jobs and nvcc threads to avoid overloading the system (#3675)	2024-03-28 00:18:54 -04:00
hxer7963	098e1776ba	[Model] Add support for xverse (#3610) Co-authored-by: willhe <hexin@xverse.cn> Co-authored-by: root <root@localhost.localdomain>	2024-03-27 18:12:54 -07:00
Roy	10e6322283	[Model] Fix and clean commandr (#3671)	2024-03-28 00:20:00 +00:00
Woosuk Kwon	6d9aa00fc4	[Docs] Add Command-R to supported models (#3669)	2024-03-27 15:20:00 -07:00
zeppombal	1182607e18	Add support for Cohere's Command-R model (#3433) Co-authored-by: José Maria Pombal <jose.pombal@unbabel.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-03-27 14:19:32 -07:00
Roger Wang	45b6ef6513	feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)	2024-03-27 13:39:26 -07:00
AmadeusChan	1956931436	[Misc] add the "download-dir" option to the latency/throughput benchmarks (#3621)	2024-03-27 13:39:05 -07:00
Megha Agarwal	e24336b5a7	[Model] Add support for DBRX (#3660)	2024-03-27 13:01:46 -07:00
youkaichao	d18f4e73f3	[Bugfix] [Hotfix] fix nccl library name (#3661)	2024-03-27 17:23:54 +00:00

1013 Commits