| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Mor Zusman | fb60ae9b91 | [Kernel][Model] Improve continuous batching for Jamba and Mamba (#9189) | 2024-10-16 12:12:43 -04:00 |
| Cyrus Leung | cee711fdbb | [Core] Rename input data types (#8688) | 2024-10-16 10:49:37 +00:00 |
| Cyrus Leung | 7abba39ee6 | [Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303) | 2024-10-16 14:31:00 +08:00 |
| Cyrus Leung | 7e7eae338d | [Misc] Standardize RoPE handling for Qwen2-VL (#9250) | 2024-10-16 13:56:17 +08:00 |
| Chang Su | ba30942240 | [Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034) (Co-authored-by: Nick Hill <nickhill@us.ibm.com>) | 2024-10-15 15:40:43 -07:00 |
| Michael Goin | 22f8a69549 | [Misc] Directly use compressed-tensors for checkpoint definitions (#8909) (Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>) | 2024-10-15 15:40:25 -07:00 |
| Nick Hill | e9d517f276 | [BugFix] Fix chat API continuous usage stats (#9357) | 2024-10-14 23:19:48 -07:00 |
| Xiang Xu | f0fe4fe86d | [Model] Make llama3.2 support multiple and interleaved images (#9095) | 2024-10-14 15:24:26 -07:00 |
| Lily Liu | 89feb4c84d | [SpecDec] Remove Batch Expansion (2/3) (#9298) | 2024-10-12 05:13:37 +00:00 |
| sixgod | 6cf1167c1a | [Model] Add GLM-4v support and meet vllm==0.6.2 (#9242) | 2024-10-11 17:36:13 +00:00 |
| Tyler Michael Smith | 7342a7d7f8 | [Model] Support Mamba (#6484) | 2024-10-11 15:40:06 +00:00 |
| Jee Jee Li | 36ea79079b | [Misc][LoRA] Support loading LoRA weights for target_modules in reg format (#9275) | 2024-10-11 12:31:21 +00:00 |
| youkaichao | cbc2ef5529 | [misc] hide best_of from engine (#9261) (Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>) | 2024-10-10 21:30:44 -07:00 |
| youkaichao | e4d652ea3e | [torch.compile] integration with compilation control (#9058) | 2024-10-10 12:39:36 -07:00 |
| sroy745 | f3a507f1d3 | [Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149) | 2024-10-10 14:17:17 +08:00 |
| Lucas Wilkinson | a64e7b9407 | [Bugfix] Machete garbage results for some models (large K dim) (#9212) | 2024-10-10 14:16:17 +08:00 |
| Michael Goin | ce00231a8b | [Bugfix] Fix Weight Loading Multiple GPU Test - Large Models (#9213) | 2024-10-10 14:15:40 +08:00 |
| Li, Jiang | ca77dd7a44 | [Hardware][CPU] Support AWQ for CPU backend (#7515) | 2024-10-09 10:28:08 -06:00 |
| youkaichao | c8627cd41b | [ci][test] use load dummy for testing (#9165) | 2024-10-09 00:38:40 -07:00 |
| chenqianfzh | 2f4117c38e | support bitsandbytes quantization with more models (#9148) | 2024-10-08 19:52:19 -06:00 |
| bnellnm | bd37b9fbe2 | [Bugfix] Try to handle older versions of pytorch (#9086) | 2024-10-08 14:28:12 -07:00 |
| Daniele | 9a94ca4a5d | [Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537) | 2024-10-08 09:38:40 -07:00 |
| Alex Brooks | 069d3bd8d0 | [Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151) | 2024-10-08 14:31:26 +00:00 |
| Alex Brooks | a3691b6b5e | [Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131) | 2024-10-08 14:12:56 +00:00 |
| Brendan Wong | 8c746226c9 | [Frontend] API support for beam search for MQLLMEngine (#9117) | 2024-10-08 05:51:43 +00:00 |
| youkaichao | 04c12f8157 | [misc] update utils to support comparing multiple settings (#9140) | 2024-10-08 02:51:49 +00:00 |
| Isotr0py | f19da64871 | [Core] Refactor GGUF parameters packing and forwarding (#8859) | 2024-10-07 10:01:46 +00:00 |
| Isotr0py | 4f95ffee6f | [Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089) | 2024-10-07 06:50:35 +00:00 |
| Cyrus Leung | 8c6de96ea1 | [Model] Explicit interface for vLLM models and support OOT embedding models (#9108) | 2024-10-07 06:10:35 +00:00 |
| youkaichao | 18b296fdb2 | [core] remove beam search from the core (#9105) | 2024-10-07 05:47:04 +00:00 |
| Varun Sundar Rabindranath | cb3b2b9ba4 | [Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038) (Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>) | 2024-10-06 12:48:11 -07:00 |
| Cyrus Leung | b22b798471 | [Model] PP support for embedding models and update docs (#9090) (Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>) | 2024-10-06 16:35:27 +08:00 |
| Brendan Wong | 168cab6bbf | [Frontend] API support for beam search (#9087) (Co-authored-by: youkaichao <youkaichao@126.com>) | 2024-10-05 23:39:03 -07:00 |
| Andy Dai | 5df1834895 | [Bugfix] Fix order of arguments matters in config.yaml (#8960) | 2024-10-05 17:35:11 +00:00 |
| Chen Zhang | cfadb9c687 | [Bugfix] Deprecate registration of custom configs to huggingface (#9083) | 2024-10-05 21:56:40 +08:00 |
| Xin Yang | 15986f598c | [Model] Support Gemma2 embedding model (#9004) | 2024-10-05 06:57:05 +00:00 |
| ElizaWszola | 05d686432f | [Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973) (Co-authored-by: Dipika <dipikasikka1@gmail.com>; Dipika Sikka <ds3822@columbia.edu>) | 2024-10-04 12:34:44 -06:00 |
| Flávia Béo | 0dcc8cbe5a | Adds truncate_prompt_tokens param for embeddings creation (#8999) | 2024-10-04 18:31:40 +00:00 |
| Roger Wang | 26aa325f4f | [Core][VLM] Test registration for OOT multimodal models (#8717) (Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>) | 2024-10-04 10:38:25 -07:00 |
| Prashant Gupta | 9ade8bbc8d | [Model] add a bunch of supported lora modules for mixtral (#9008) | 2024-10-04 16:24:40 +00:00 |
| Cyrus Leung | 0e36fd4909 | [Misc] Move registry to its own file (#9064) | 2024-10-04 10:01:37 +00:00 |
| Murali Andoorveedu | 0f6d7a9a34 | [Models] Add remaining model PP support (#7168) (Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>) | 2024-10-04 10:56:58 +08:00 |
| 代君 | 3dbb215b38 | [Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405) | 2024-10-04 10:36:39 +08:00 |
| sroy745 | 91add85ec4 | Fix failing spec decode test (#9054) | 2024-10-03 23:07:29 +00:00 |
| youkaichao | 9aaf14c62e | [misc] add forward context for attention (#9029) | 2024-10-03 12:09:42 -07:00 |
| xendo | 63e39937f9 | [Frontend] [Neuron] Parse literals out of override-neuron-config (#8959) (Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>) | 2024-10-03 18:02:07 +00:00 |
| Guillaume Calmettes | 83caf35e08 | [BugFix] Enforce Mistral ToolCall id constraint when using the Mistral tool call parser (#9020) | 2024-10-03 16:44:52 +08:00 |
| Shawn Tan | 19f0d25796 | [Model] Adding Granite MoE. (#8206) (Co-authored-by: Nick Hill <nickhill@us.ibm.com>) | 2024-10-03 09:33:57 +08:00 |
| afeldman-nm | 563649aafe | [Core] Combined support for multi-step scheduling, chunked prefill & prefix caching (#8804) (Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>; Andrew Feldman <afeld2012@gmail.com>) | 2024-10-02 07:52:20 +00:00 |
| Lily Liu | 1570203864 | [Spec Decode] (1/2) Remove batch expansion (#8839) | 2024-10-01 16:04:42 -07:00 |