squall/vllm - vllm - Gitea

Author	SHA1	Message	Date
Woosuk Kwon	395aa823ea	[Misc] Minor type annotation fix (#3716)	2024-03-28 21:12:24 -07:00
SangBin Cho	26422e477b	[Test] Make model tests run again and remove --forked from pytest (#3631) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-03-28 21:06:40 -07:00
youkaichao	f342153b48	Revert "bump version to v0.4.0" (#3708)	2024-03-28 18:49:42 -07:00
Simon Mo	27a57cad52	bump version to v0.4.0 (#3705)	2024-03-28 18:26:51 -07:00
Yile (Michael) Gu	98a42e7078	[Benchmark] Change mii to use persistent deployment and support tensor parallel (#3628)	2024-03-28 17:33:52 -07:00
youkaichao	0267fef52a	[Core] fix del of communicator (#3702)	2024-03-29 00:24:58 +00:00
Simon Mo	4716a32dd4	fix logging msg for block manager (#3701)	2024-03-28 23:29:55 +00:00
Woosuk Kwon	c0935c96d3	[Bugfix] Set enable_prefix_caching=True in prefix caching example (#3703)	2024-03-28 16:26:30 -07:00
Woosuk Kwon	cb40b3ab6b	[Kernel] Add MoE Triton kernel configs for A100 40GB (#3700)	2024-03-28 15:26:24 -07:00
Roy	515386ef3c	[Core] Support multi-node inference(eager and cuda graph) (#3686)	2024-03-28 15:01:55 -07:00
Simon Mo	a4075cba4d	[CI] Add test case to run examples scripts (#3638)	2024-03-28 14:36:10 -07:00
Simon Mo	96aa014d1e	fix benchmark format reporting in buildkite (#3693)	2024-03-28 14:35:16 -07:00
Adam Boeglin	1715056fef	[Bugfix] Update neuron_executor.py to add optional vision_language_config (#3695)	2024-03-28 10:43:34 -07:00
SangBin Cho	b51c1cc9d2	[2/N] Chunked prefill data update (#3538)	2024-03-28 10:06:01 -07:00
Roger Wang	ce567a2926	[Kernel] DBRX Triton MoE kernel H100 (#3692)	2024-03-28 10:05:34 -07:00
wenyujin333	d6ea427f04	[Model] Add support for Qwen2MoeModel (#3346)	2024-03-28 15:19:59 +00:00
Cade Daniel	14ccd94c89	[Core][Bugfix]Refactor block manager for better testability (#3492)	2024-03-27 23:59:28 -07:00
Woosuk Kwon	8267b06c30	[Kernel] Add Triton MoE kernel configs for DBRX on A100 (#3679)	2024-03-27 22:22:25 -07:00
youkaichao	3492859b68	[CI/Build] update default number of jobs and nvcc threads to avoid overloading the system (#3675)	2024-03-28 00:18:54 -04:00
hxer7963	098e1776ba	[Model] Add support for xverse (#3610) Co-authored-by: willhe <hexin@xverse.cn> Co-authored-by: root <root@localhost.localdomain>	2024-03-27 18:12:54 -07:00
Roy	10e6322283	[Model] Fix and clean commandr (#3671)	2024-03-28 00:20:00 +00:00
Woosuk Kwon	6d9aa00fc4	[Docs] Add Command-R to supported models (#3669)	2024-03-27 15:20:00 -07:00
zeppombal	1182607e18	Add support for Cohere's Command-R model (#3433) Co-authored-by: José Maria Pombal <jose.pombal@unbabel.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-03-27 14:19:32 -07:00
Roger Wang	45b6ef6513	feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)	2024-03-27 13:39:26 -07:00
AmadeusChan	1956931436	[Misc] add the "download-dir" option to the latency/throughput benchmarks (#3621)	2024-03-27 13:39:05 -07:00
Megha Agarwal	e24336b5a7	[Model] Add support for DBRX (#3660)	2024-03-27 13:01:46 -07:00
youkaichao	d18f4e73f3	[Bugfix] [Hotfix] fix nccl library name (#3661)	2024-03-27 17:23:54 +00:00
Woosuk Kwon	82c540bebf	[Bugfix] More faithful implementation of Gemma (#3653)	2024-03-27 09:37:18 -07:00
youkaichao	8f44facddd	[Core] remove cupy dependency (#3625)	2024-03-27 00:33:26 -07:00
Woosuk Kwon	e66b629c04	[Misc] Minor fix in KVCache type (#3652)	2024-03-26 23:14:06 -07:00
Jee Li	76879342a3	[Doc]add lora support (#3649)	2024-03-27 02:06:46 +00:00
Jee Li	566b57c5c4	[Kernel] support non-zero cuda devices in punica kernels (#3636)	2024-03-27 00:37:42 +00:00
Nick Hill	0dc72273b8	[BugFix] Fix ipv4 address parsing regression (#3645)	2024-03-26 14:39:44 -07:00
liiliiliil	a979d9771e	[Bugfix] Fix ipv6 address parsing bug (#3641)	2024-03-26 11:58:20 -07:00
Jee Li	8af890a865	Enable more models to inference based on LoRA (#3382) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-03-25 18:09:31 -07:00
Nick Hill	dfeb2ecc3a	[Misc] Include matched stop string/token in responses (#2976) Co-authored-by: Sahil Suneja <sahilsuneja@gmail.com>	2024-03-25 17:31:32 -07:00
Antoni Baum	3a243095e5	Optimize `_get_ranks` in Sampler (#3623)	2024-03-25 16:03:02 -07:00
xwjiang2010	64172a976c	[Feature] Add vision language model support. (#3042)	2024-03-25 14:16:30 -07:00
Simon Mo	f408d05c52	hotfix isort on logprobs ranks pr (#3622)	2024-03-25 11:55:46 -07:00
Dylan Hawk	0b4997e05c	[Bugfix] API stream returning two stops (#3450) Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>	2024-03-25 10:14:34 -07:00
Travis Johnson	c13ad1b7bd	feat: implement the min_tokens sampling parameter (#3124) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-03-25 10:14:26 -07:00
Swapnil Parekh	819924e749	[Core] Adding token ranks along with logprobs (#3516) Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>	2024-03-25 10:13:10 -07:00
SangBin Cho	01bfb22b41	[CI] Try introducing isort. (#3495)	2024-03-25 07:59:47 -07:00
TianYu GUO	e67c295b0c	[Bugfix] fix automatic prefix args and add log info (#3608)	2024-03-25 05:35:22 -07:00
Woosuk Kwon	925f3332ca	[Core] Refactor Attention Take 2 (#3462)	2024-03-25 04:39:33 +00:00
少年	b0dfa91dd7	[Model] Add starcoder2 awq support (#3569)	2024-03-24 21:07:36 -07:00
Woosuk Kwon	56a8652f33	[Bugfix] store lock file in tmp directory (#3578)" (#3599) Co-authored-by: youkaichao <youkaichao@126.com>	2024-03-24 20:06:50 -07:00
Kunshang Ji	6d93d35308	[BugFix] tensor.get_device() -> tensor.device (#3604)	2024-03-24 19:01:13 -07:00
youkaichao	837e185142	[CI/Build] fix flaky test (#3602)	2024-03-24 17:43:05 -07:00
youkaichao	42bc386129	[CI/Build] respect the common environment variable MAX_JOBS (#3600)	2024-03-24 17:04:00 -07:00

1090 Commits