Commit Graph

23 Commits

Author SHA1 Message Date
afeldman-nm
428dd1445e
[Core] Logprobs support in Multi-step (#7652) 2024-08-29 19:19:08 -07:00
Roger Wang
bbf55c4805
[VLM] Refactor MultiModalConfig initialization and profiling (#7530) 2024-08-17 13:30:55 -07:00
Cyrus Leung
4ddc4743d7
[Core] Consolidate GB constant and enable float GB arguments (#7416) 2024-08-12 14:14:14 -07:00
Li, Jiang
3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) 2024-07-26 13:50:10 -07:00
Antoni Baum
7bd82002ae
[Core] Allow specifying custom Executor (#6557) 2024-07-20 01:25:06 +00:00
Swapnil Parekh
4d6ada947c
[CORE] Adding support for insertion of soft-tuned prompts (#4645)
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-07-09 13:26:36 -07:00
xwjiang2010
d9e98f42e4
[vlm] Remove vision language config. (#6089)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
rohithkrn
f5dda63eb5
[LoRA] Add support for pinning lora adapters in the LRU cache (#5603) 2024-06-21 15:42:46 -07:00
Cody Yu
bc8ad68455
[Misc][Refactor] Introduce ExecuteModelData (#4540) 2024-05-03 17:47:07 -07:00
youkaichao
5b8a7c1cb0
[Misc] centralize all usage of environment variables (#4548) 2024-05-02 11:13:25 -07:00
leiwen83
4bb53e2dde
[BugFix] fix num_lookahead_slots missing in async executor (#4165)
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
2024-04-30 10:12:59 -07:00
Nick Hill
ba4be44c32
[BugFix] Fix return type of executor execute_model methods (#4402) 2024-04-27 11:17:45 -07:00
SangBin Cho
a88081bf76
[CI] Disable non-lazy string operation on logging (#4326)
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00
Tao He
077f0a2e8a
[Frontend] Enable support for CPU backend in AsyncLLMEngine. (#3993)
Signed-off-by: Tao He <sighingnow@gmail.com>
2024-04-22 09:19:51 +00:00
Isotr0py
296cdf8ac7
[Misc] Add vision language model support to CPU backend (#3968) 2024-04-22 00:44:16 -07:00
Cade Daniel
e95cd87959
[Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) 2024-04-16 13:09:21 -07:00
Antoni Baum
69e1d2fb69
[Core] Refactor model loading code (#4097) 2024-04-16 11:34:39 -07:00
Nick Hill
eb46fbfda2
[Core] Simplifications to executor classes (#4071) 2024-04-15 13:05:09 -07:00
Li, Jiang
0003e9154b
[Misc][Minor] Fix CPU block num log in CPUExecutor. (#4088) 2024-04-15 08:35:55 -07:00
SangBin Cho
09473ee41c
[mypy] Add mypy type annotation part 1 (#4006) 2024-04-12 14:35:50 -07:00
bigPYJ1151
8afca50889
[Hardware][Intel] Isolate CPUModelRunner and ModelRunner for better maintenance (#3824) 2024-04-11 11:56:49 -07:00
Cade Daniel
e7c7067b45
[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837) 2024-04-09 11:44:15 -07:00
bigPYJ1151
0e3f06fe9c
[Hardware][Intel] Add CPU inference backend (#3634)
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
2024-04-01 22:07:30 -07:00