squall/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Cade Daniel	11d652bd4f	[CI] Move CPU/AMD tests to after wait (#4123)	2024-04-16 22:53:26 -07:00
Cade Daniel	d150e4f89f	[Misc] [CI] Fix CI failure caught after merge (#4126)	2024-04-16 17:56:01 -07:00
Cade Daniel	e95cd87959	[Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894)	2024-04-16 13:09:21 -07:00
Antoni Baum	69e1d2fb69	[Core] Refactor model loading code (#4097)	2024-04-16 11:34:39 -07:00
Noam Gat	05434764cd	LM Format Enforcer Guided Decoding Support (#3868) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-04-16 05:54:57 +00:00
SangBin Cho	4e7ee664e2	[Core] Fix engine-use-ray broken (#4105)	2024-04-16 05:24:53 +00:00
SangBin Cho	37e84a403d	[Typing] Fix Sequence type GenericAlias only available after Python 3.9. (#4092)	2024-04-15 14:47:31 -07:00
Ricky Xu	4695397dcf	[Bugfix] Fix ray workers profiling with nsight (#4095)	2024-04-15 14:24:45 -07:00
Sanger Steel	d619ae2d19	[Doc] Add better clarity for tensorizer usage (#4090) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-04-15 13:28:25 -07:00
Nick Hill	eb46fbfda2	[Core] Simplifications to executor classes (#4071)	2024-04-15 13:05:09 -07:00
Li, Jiang	0003e9154b	[Misc][Minor] Fix CPU block num log in CPUExecutor. (#4088)	2024-04-15 08:35:55 -07:00
Zhuohan Li	e11e200736	[Bugfix] Fix filelock version requirement (#4075)	2024-04-14 21:50:08 -07:00
Roy	8db1bf32f8	[Misc] Upgrade triton to 2.2.0 (#4061)	2024-04-14 17:43:54 -07:00
Simon Mo	aceb17cf2d	[Docs] document that mixtral 8x22b is supported (#4073)	2024-04-14 14:35:55 -07:00
Nick Hill	563c54f760	[BugFix] Fix tensorizer extra in setup.py (#4072)	2024-04-14 14:12:42 -07:00
youkaichao	2cd6b4f362	[Core] avoid too many cuda context by caching p2p test (#4021)	2024-04-13 23:40:21 -07:00
Sanger Steel	711a000255	[Frontend] [Core] feat: Add model loading using `tensorizer` (#3476)	2024-04-13 17:13:01 -07:00
Jee Li	989ae2538d	[Kernel] Add punica dimension for Baichuan-13B (#4053)	2024-04-13 07:55:05 -07:00
zspo	0a430b4ae2	[Bugfix] fix_small_bug_in_neuron_executor (#4051)	2024-04-13 07:54:03 -07:00
zspo	ec8e3c695f	[Bugfix] fix_log_time_in_metrics (#4050)	2024-04-13 07:52:36 -07:00
youkaichao	98afde19fc	[Core][Distributed] improve logging for init dist (#4042)	2024-04-13 07:12:53 -07:00
Dylan Hawk	5c2e66e487	[Bugfix] More type hint fixes for py 3.8 (#4039)	2024-04-12 21:07:04 -07:00
youkaichao	546e721168	[CI/Test] expand ruff and yapf for all supported python version (#4037)	2024-04-13 01:43:37 +00:00
Jee Li	b8aacac31a	[Bugfix] Fix LoRA bug (#4032)	2024-04-12 16:56:37 -07:00
Bellk17	d04973ad54	Fix triton compilation issue (#3984) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-04-12 16:41:26 -07:00
youkaichao	fbb9d9eef4	[Core] fix custom allreduce default value (#4040)	2024-04-12 16:40:39 -07:00
SangBin Cho	09473ee41c	[mypy] Add mypy type annotation part 1 (#4006)	2024-04-12 14:35:50 -07:00
Zhuohan Li	d4ec9ffb95	[Misc] Fix typo in scheduler.py (#4022)	2024-04-12 13:56:04 -07:00
youkaichao	96b6a6d790	[Bugfix] fix type hint for py 3.8 (#4036)	2024-04-12 19:35:44 +00:00
SangBin Cho	36729bac13	[Test] Test multiple attn backend for chunked prefill. (#4023)	2024-04-12 09:56:57 -07:00
Cyrus Leung	7fd3949a0b	[Frontend][Core] Move `merge_async_iterators` to utils (#4026)	2024-04-12 05:30:54 +00:00
Jee Li	1096717ae9	[Core] Support LoRA on quantized models (#4012)	2024-04-11 21:02:44 -07:00
Michael Feil	c2b4a1bce9	[Doc] Add typing hints / mypy types cleanup (#3816) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-04-11 17:17:21 -07:00
Nick Hill	e46a60aa4c	[BugFix] Fix handling of stop strings and stop token ids (#3672)	2024-04-11 15:34:12 -07:00
Antoni Baum	1e96c3341a	Add extra punica sizes to support bigger vocabs (#4015)	2024-04-11 22:18:57 +00:00
Dylan Hawk	95e7d4a97c	Fix echo/logprob OpenAI completion bug (#3441) Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>	2024-04-11 22:15:50 +00:00
youkaichao	559eb852f8	[Core] init_distributed_environment align with init_process_group(#4014) [Core][Distributed] make init_distributed_environment compatible with init_process_group (#4014)	2024-04-11 14:00:48 -07:00
Antoni Baum	a10d3056da	[Core] Set `linear_weights` directly on the layer (#3977)	2024-04-11 16:35:51 -04:00
bigPYJ1151	8afca50889	[Hardware][Intel] Isolate CPUModelRunner and ModelRunner for better maintenance (#3824)	2024-04-11 11:56:49 -07:00
fuchen.ljl	08ccee1e83	punica fix-bgmv-kernel-640 (#4007)	2024-04-11 08:59:26 -07:00
Roger Wang	c1dc547129	[Kernel] Fused MoE Config for Mixtral 8x22 (#4002)	2024-04-11 07:50:00 -07:00
youkaichao	f3d0bf7589	[Doc][Installation] delete python setup.py develop (#3989)	2024-04-11 03:33:02 +00:00
Kunshang Ji	e9da5a40c6	[Misc] Add indirection layer for custom ops (#3913)	2024-04-10 20:26:07 -07:00
SangBin Cho	e42df7227d	[Test] Add xformer and flash attn tests (#3961) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-04-11 03:09:50 +00:00
youkaichao	caada5e50a	[Core][Model] torch.compile for layernorm in commandr (#3985) [Core][Model] Use torch.compile to accelerate layernorm in commandr (#3985)	2024-04-11 01:48:26 +00:00
SangBin Cho	67b4221a61	[Core][5/N] Fully working chunked prefill e2e (#3884)	2024-04-10 17:56:48 -07:00
youkaichao	63e7176f26	[Core][Refactor] move parallel_utils into vllm/distributed (#3950) [WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)	2024-04-10 15:33:30 -07:00
Travis Johnson	934d3662f7	[Bugfix] handle hf_config with architectures == None (#3982) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-04-10 22:28:25 +00:00
Frαnçois	92cd2e2f21	[Doc] Fix getting stared to use publicly available model (#3963)	2024-04-10 18:05:52 +00:00
Daniel E Marasco	e4c4072c94	[Bugfix] Remove key sorting for `guided_json` parameter in OpenAi compatible Server (#3945)	2024-04-10 10:15:51 -07:00

1267 Commits