Cody Yu
|
973617ae02
|
[Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840)
Co-authored-by: Cade Daniel <edacih@gmail.com>
Co-authored-by: Cade Daniel <cade@anyscale.com>
|
2024-05-16 00:53:51 -07:00 |
|
Aurick Qiao
|
30e754390c
|
[Core] Implement sharded state loader (#4690)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-05-15 22:11:54 -07:00 |
|
Nick Hill
|
676a99982f
|
[Core] Add MultiprocessingGPUExecutor (#4539)
Co-authored-by: SAHIL SUNEJA <suneja@us.ibm.com>
|
2024-05-14 10:38:59 -07:00 |
|
Chang Su
|
e254497b66
|
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)
|
2024-05-11 11:30:37 -07:00 |
|
Cody Yu
|
f942efb5a3
|
[Dynamic Spec Decoding] Auto-disable by the running queue size (#4592)
Co-authored-by: Cade Daniel <edacih@gmail.com>
|
2024-05-08 21:44:00 +00:00 |
|
leiwen83
|
8344f7742b
|
[Bug fix][Core] fixup ngram not setup correctly (#4551)
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
Co-authored-by: Cade Daniel <edacih@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-05-07 11:40:18 -07:00 |
|
Cody Yu
|
bc8ad68455
|
[Misc][Refactor] Introduce ExecuteModelData (#4540)
|
2024-05-03 17:47:07 -07:00 |
|
youkaichao
|
5b8a7c1cb0
|
[Misc] centralize all usage of environment variables (#4548)
|
2024-05-02 11:13:25 -07:00 |
|
Nick Hill
|
a657bfc48a
|
[Core] Add multiproc_worker_utils for multiprocessing-based workers (#4357)
|
2024-05-01 18:41:59 +00:00 |
|
leiwen83
|
b38e42fbca
|
[Speculative decoding] Add ngram prompt lookup decoding (#4237)
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
|
2024-05-01 11:13:03 -07:00 |
|
Nick Hill
|
2e240c69a9
|
[Core] Centralize GPU Worker construction (#4419)
|
2024-05-01 01:06:34 +00:00 |
|
leiwen83
|
4bb53e2dde
|
[BugFix] fix num_lookahead_slots missing in async executor (#4165)
Co-authored-by: Lei Wen <wenlei03@qiyi.com>
|
2024-04-30 10:12:59 -07:00 |
|
Nick Hill
|
ba4be44c32
|
[BugFix] Fix return type of executor execute_model methods (#4402)
|
2024-04-27 11:17:45 -07:00 |
|
Nick Hill
|
258a2c58d0
|
[Core] Introduce DistributedGPUExecutor abstract class (#4348)
|
2024-04-27 04:14:26 +00:00 |
|
SangBin Cho
|
a88081bf76
|
[CI] Disable non-lazy string operation on logging (#4326)
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
|
2024-04-26 00:16:58 -07:00 |
|
Nick Hill
|
15e7c675b0
|
[Core] Add shutdown() method to ExecutorBase (#4349)
|
2024-04-25 16:32:48 -07:00 |
|
Nick Hill
|
479d69fad0
|
[Core] Move ray_utils.py from engine to executor package (#4347)
|
2024-04-25 06:52:22 +00:00 |
|
DefTruth
|
d87f39e9a9
|
[Bugfix] Add init_cached_hf_modules to RayWorkerWrapper (#4286)
|
2024-04-23 09:28:35 -07:00 |
|
Cade Daniel
|
62b8aebc6f
|
[Speculative decoding 7/9] Speculative decoding end-to-end correctness tests. (#3951)
|
2024-04-23 08:02:36 +00:00 |
|
Nick Hill
|
8f2ea22bde
|
[Core] Some simplification of WorkerWrapper changes (#4183)
|
2024-04-23 07:49:08 +00:00 |
|
Tao He
|
077f0a2e8a
|
[Frontend] Enable support for CPU backend in AsyncLLMEngine. (#3993)
Signed-off-by: Tao He <sighingnow@gmail.com>
|
2024-04-22 09:19:51 +00:00 |
|
Isotr0py
|
296cdf8ac7
|
[Misc] Add vision language model support to CPU backend (#3968)
|
2024-04-22 00:44:16 -07:00 |
|
youkaichao
|
8a7a3e4436
|
[Core] add an option to log every function call for debugging hang/crash in distributed inference (#4079)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-04-18 16:15:12 -07:00 |
|
Liangfu Chen
|
cd2f63fb36
|
[CI/CD] add neuron docker and ci test scripts (#3571)
|
2024-04-18 15:26:01 -07:00 |
|
youkaichao
|
8438e0569e
|
[Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024)
[Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication (#4024)
|
2024-04-17 08:34:33 +00:00 |
|
Cade Daniel
|
d150e4f89f
|
[Misc] [CI] Fix CI failure caught after merge (#4126)
|
2024-04-16 17:56:01 -07:00 |
|
Cade Daniel
|
e95cd87959
|
[Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894)
|
2024-04-16 13:09:21 -07:00 |
|
Antoni Baum
|
69e1d2fb69
|
[Core] Refactor model loading code (#4097)
|
2024-04-16 11:34:39 -07:00 |
|
Ricky Xu
|
4695397dcf
|
[Bugfix] Fix ray workers profiling with nsight (#4095)
|
2024-04-15 14:24:45 -07:00 |
|
Nick Hill
|
eb46fbfda2
|
[Core] Simplifications to executor classes (#4071)
|
2024-04-15 13:05:09 -07:00 |
|
Li, Jiang
|
0003e9154b
|
[Misc][Minor] Fix CPU block num log in CPUExecutor. (#4088)
|
2024-04-15 08:35:55 -07:00 |
|
Sanger Steel
|
711a000255
|
[Frontend] [Core] feat: Add model loading using tensorizer (#3476)
|
2024-04-13 17:13:01 -07:00 |
|
zspo
|
0a430b4ae2
|
[Bugfix] fix_small_bug_in_neuron_executor (#4051)
|
2024-04-13 07:54:03 -07:00 |
|
Dylan Hawk
|
5c2e66e487
|
[Bugfix] More type hint fixes for py 3.8 (#4039)
|
2024-04-12 21:07:04 -07:00 |
|
SangBin Cho
|
09473ee41c
|
[mypy] Add mypy type annotation part 1 (#4006)
|
2024-04-12 14:35:50 -07:00 |
|
youkaichao
|
96b6a6d790
|
[Bugfix] fix type hint for py 3.8 (#4036)
|
2024-04-12 19:35:44 +00:00 |
|
bigPYJ1151
|
8afca50889
|
[Hardware][Intel] Isolate CPUModelRunner and ModelRunner for better maintenance (#3824)
|
2024-04-11 11:56:49 -07:00 |
|
Cade Daniel
|
e7c7067b45
|
[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)
|
2024-04-09 11:44:15 -07:00 |
|
Isotr0py
|
0ce0539d47
|
[Bugfix] Fix Llava inference with Tensor Parallelism. (#3883)
|
2024-04-07 22:54:13 +08:00 |
|
Cade Daniel
|
5757d90e26
|
[Speculative decoding] Adding configuration object for speculative decoding (#3706)
Co-authored-by: Lily Liu <lilyliupku@gmail.com>
|
2024-04-03 00:40:57 +00:00 |
|
bigPYJ1151
|
0e3f06fe9c
|
[Hardware][Intel] Add CPU inference backend (#3634)
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
|
2024-04-01 22:07:30 -07:00 |
|
Roy
|
515386ef3c
|
[Core] Support multi-node inference(eager and cuda graph) (#3686)
|
2024-03-28 15:01:55 -07:00 |
|
Adam Boeglin
|
1715056fef
|
[Bugfix] Update neuron_executor.py to add optional vision_language_config (#3695)
|
2024-03-28 10:43:34 -07:00 |
|
Cade Daniel
|
14ccd94c89
|
[Core][Bugfix]Refactor block manager for better testability (#3492)
|
2024-03-27 23:59:28 -07:00 |
|
youkaichao
|
8f44facddd
|
[Core] remove cupy dependency (#3625)
|
2024-03-27 00:33:26 -07:00 |
|
xwjiang2010
|
64172a976c
|
[Feature] Add vision language model support. (#3042)
|
2024-03-25 14:16:30 -07:00 |
|
SangBin Cho
|
01bfb22b41
|
[CI] Try introducing isort. (#3495)
|
2024-03-25 07:59:47 -07:00 |
|
Zhuohan Li
|
e90fc21f2e
|
[Hardware][Neuron] Refactor neuron support (#3471)
|
2024-03-22 01:22:17 +00:00 |
|
Zhuohan Li
|
eeab52a4ff
|
[FIX] Simpler fix for async engine running on ray (#3371)
|
2024-03-13 14:18:40 -07:00 |
|
Zhuohan Li
|
4c922709b6
|
Add distributed model executor abstraction (#3191)
|
2024-03-11 11:03:45 -07:00 |
|