squall/vllm - vllm

Author	SHA1	Message	Date
Lily Liu	89feb4c84d	[SpecDec] Remove Batch Expansion (2/3) (#9298)	2024-10-12 05:13:37 +00:00
sixgod	6cf1167c1a	[Model] Add GLM-4v support and meet vllm==0.6.2 (#9242)	2024-10-11 17:36:13 +00:00
Tyler Michael Smith	7342a7d7f8	[Model] Support Mamba (#6484)	2024-10-11 15:40:06 +00:00
Jee Jee Li	36ea79079b	[Misc][LoRA] Support loading LoRA weights for target_modules in reg format (#9275)	2024-10-11 12:31:21 +00:00
youkaichao	cbc2ef5529	[misc] hide best_of from engine (#9261) Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>	2024-10-10 21:30:44 -07:00
youkaichao	e4d652ea3e	[torch.compile] integration with compilation control (#9058)	2024-10-10 12:39:36 -07:00
sroy745	f3a507f1d3	[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149)	2024-10-10 14:17:17 +08:00
Lucas Wilkinson	a64e7b9407	[Bugfix] Machete garbage results for some models (large K dim) (#9212)	2024-10-10 14:16:17 +08:00
Michael Goin	ce00231a8b	[Bugfix] Fix Weight Loading Multiple GPU Test - Large Models (#9213)	2024-10-10 14:15:40 +08:00
Li, Jiang	ca77dd7a44	[Hardware][CPU] Support AWQ for CPU backend (#7515)	2024-10-09 10:28:08 -06:00
youkaichao	c8627cd41b	[ci][test] use load dummy for testing (#9165)	2024-10-09 00:38:40 -07:00
chenqianfzh	2f4117c38e	support bitsandbytes quantization with more models (#9148)	2024-10-08 19:52:19 -06:00
bnellnm	bd37b9fbe2	[Bugfix] Try to handle older versions of pytorch (#9086)	2024-10-08 14:28:12 -07:00
Daniele	9a94ca4a5d	[Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537)	2024-10-08 09:38:40 -07:00
Alex Brooks	069d3bd8d0	[Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-08 14:31:26 +00:00
Alex Brooks	a3691b6b5e	[Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-08 14:12:56 +00:00
Brendan Wong	8c746226c9	[Frontend] API support for beam search for MQLLMEngine (#9117)	2024-10-08 05:51:43 +00:00
youkaichao	04c12f8157	[misc] update utils to support comparing multiple settings (#9140)	2024-10-08 02:51:49 +00:00
Isotr0py	f19da64871	[Core] Refactor GGUF parameters packing and forwarding (#8859)	2024-10-07 10:01:46 +00:00
Isotr0py	4f95ffee6f	[Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089)	2024-10-07 06:50:35 +00:00
Cyrus Leung	8c6de96ea1	[Model] Explicit interface for vLLM models and support OOT embedding models (#9108)	2024-10-07 06:10:35 +00:00
youkaichao	18b296fdb2	[core] remove beam search from the core (#9105)	2024-10-07 05:47:04 +00:00
Varun Sundar Rabindranath	cb3b2b9ba4	[Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-10-06 12:48:11 -07:00
Cyrus Leung	b22b798471	[Model] PP support for embedding models and update docs (#9090) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-10-06 16:35:27 +08:00
Brendan Wong	168cab6bbf	[Frontend] API support for beam search (#9087) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-05 23:39:03 -07:00
Andy Dai	5df1834895	[Bugfix] Fix order of arguments matters in config.yaml (#8960)	2024-10-05 17:35:11 +00:00
Chen Zhang	cfadb9c687	[Bugfix] Deprecate registration of custom configs to huggingface (#9083)	2024-10-05 21:56:40 +08:00
Xin Yang	15986f598c	[Model] Support Gemma2 embedding model (#9004)	2024-10-05 06:57:05 +00:00
ElizaWszola	05d686432f	[Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973) Co-authored-by: Dipika <dipikasikka1@gmail.com> Co-authored-by: Dipika Sikka <ds3822@columbia.edu>	2024-10-04 12:34:44 -06:00
Flávia Béo	0dcc8cbe5a	Adds truncate_prompt_tokens param for embeddings creation (#8999) Signed-off-by: Flavia Beo <flavia.beo@ibm.com>	2024-10-04 18:31:40 +00:00
Roger Wang	26aa325f4f	[Core][VLM] Test registration for OOT multimodal models (#8717) Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:38:25 -07:00
Prashant Gupta	9ade8bbc8d	[Model] add a bunch of supported lora modules for mixtral (#9008) Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>	2024-10-04 16:24:40 +00:00
Cyrus Leung	0e36fd4909	[Misc] Move registry to its own file (#9064)	2024-10-04 10:01:37 +00:00
Murali Andoorveedu	0f6d7a9a34	[Models] Add remaining model PP support (#7168) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:56:58 +08:00
代君	3dbb215b38	[Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405)	2024-10-04 10:36:39 +08:00
sroy745	91add85ec4	Fix failing spec decode test (#9054)	2024-10-03 23:07:29 +00:00
youkaichao	9aaf14c62e	[misc] add forward context for attention (#9029)	2024-10-03 12:09:42 -07:00
xendo	63e39937f9	[Frontend] [Neuron] Parse literals out of override-neuron-config (#8959) Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>	2024-10-03 18:02:07 +00:00
Guillaume Calmettes	83caf35e08	[BugFix] Enforce Mistral ToolCall id constraint when using the Mistral tool call parser (#9020)	2024-10-03 16:44:52 +08:00
Shawn Tan	19f0d25796	[Model] Adding Granite MoE. (#8206) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-03 09:33:57 +08:00
afeldman-nm	563649aafe	[Core] Combined support for multi-step scheduling, chunked prefill & prefix caching (#8804) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Andrew Feldman <afeld2012@gmail.com>	2024-10-02 07:52:20 +00:00
Lily Liu	1570203864	[Spec Decode] (1/2) Remove batch expansion (#8839)	2024-10-01 16:04:42 -07:00
Isotr0py	bc4eb65b54	[Bugfix] Fix Fuyu tensor parallel inference (#8986)	2024-10-01 17:51:41 +08:00
Joe Runde	062c89e7c9	[Frontend][Core] Move guided decoding params into sampling params (#8252) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-10-01 09:34:25 +08:00
Lily Liu	bce324487a	[CI][SpecDecode] Fix spec decode tests, use flash attention backend for spec decode CI tests. (#8975)	2024-10-01 00:51:40 +00:00
Mor Zusman	f13a07b1f8	[Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533)	2024-09-29 17:35:58 -04:00
danieljannai21	6c9ba48fde	[Frontend] Added support for HF's new `continue_final_message` parameter (#8942)	2024-09-29 17:59:47 +00:00
Jee Jee Li	3d49776bbb	[Model][LoRA]LoRA support added for MiniCPMV2.5 (#7199)	2024-09-29 06:59:45 +00:00
Cyrus Leung	26a68d5d7e	[CI/Build] Add test decorator for minimum GPU memory (#8925)	2024-09-29 02:50:51 +00:00
ElizaWszola	d081da0064	[Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741) Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-09-28 18:19:40 -07:00

876 Commits