Harsha vardhan manoj Bikki | 008cf886c9 | [Neuron] Adding support for adding/ overriding neuron configuration a… (#8062) | 2024-09-04 16:33:43 -07:00
    Co-authored-by: Harsha Bikki <harbikh@amazon.com>
Antoni Baum | 652c83b697 | [Misc] Raise a more informative exception in add/remove_logger (#7750) | 2024-09-03 12:28:25 -07:00
Alexander Matveev | 6d646d08a2 | [Core] Optimize Async + Multi-step (#8050) | 2024-09-03 18:50:29 +00:00
Woosuk Kwon | 0fbc6696c2 | [Bugfix] Fix single output condition in output processor (#7881) | 2024-09-02 20:35:42 -07:00
Isotr0py | 4ca65a9763 | [Core][Bugfix] Accept GGUF model without .gguf extension (#8056) | 2024-09-02 08:43:26 -04:00
Robert Shaw | 8423aef4c8 | [BugFix][Core] Multistep Fix Crash on Request Cancellation (#8059) | 2024-08-31 19:44:03 +00:00
Cyrus Leung | 98cef6a227 | [Core] Increase default max_num_batched_tokens for multimodal models (#8028) | 2024-08-30 08:20:34 -07:00
afeldman-nm | 428dd1445e | [Core] Logprobs support in Multi-step (#7652) | 2024-08-29 19:19:08 -07:00
Cyrus Leung | 4abed65c58 | [VLM] Disallow overflowing max_model_len for multimodal models (#7998) | 2024-08-29 17:49:04 -07:00
Alexander Matveev | 3f60f2244e | [Core] Combine async postprocessor and multi-step (#7921) | 2024-08-29 11:18:26 -07:00
Alexander Matveev | f508e03e7f | [Core] Async_output_proc: Add virtual engine support (towards pipeline parallel) (#7911) | 2024-08-28 00:02:30 -07:00
Kunshang Ji | 076169f603 | [Hardware][Intel GPU] Add intel GPU pipeline parallel support. (#7810) | 2024-08-27 10:07:02 -07:00
Patrick von Platen | 6fc4e6e07a | [Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739) | 2024-08-27 12:40:02 +00:00
Megha Agarwal | 2eedede875 | [Core] Asynchronous Output Processor (#7049) | 2024-08-26 20:53:20 -07:00
    Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>
omrishiv | 760e9f71a8 | [Bugfix] neuron: enable tensor parallelism (#7562) | 2024-08-26 15:13:13 -07:00
    Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
Cyrus Leung | 029c71de11 | [CI/Build] Avoid downloading all HF files in RemoteOpenAIServer (#7836) | 2024-08-26 05:31:10 +00:00
Alexander Matveev | 9db93de20c | [Core] Add multi-step support to LLMEngine (#7789) | 2024-08-23 12:45:53 -07:00
Maximilien de Bayser | e25fee57c2 | [BugFix] Fix server crash on empty prompt (#7746) | 2024-08-23 13:12:44 +00:00
    Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Michael Goin | 15310b5101 | [Bugfix] Use LoadFormat values for vllm serve --load-format (#7784) | 2024-08-22 11:37:08 -07:00
William Lin | dd53c4b023 | [misc] Add Torch profiler support (#7451) | 2024-08-21 15:39:26 -07:00
    Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
William Lin | 91f4522cbf | [multi-step] Raise error if not using async engine (#7703) | 2024-08-21 11:49:19 -07:00
Robert Shaw | f7e3b0c5aa | [Bugfix][Frontend] Fix Issues Under High Load With zeromq Frontend (#7394) | 2024-08-21 13:34:14 -04:00
    Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Nick Hill | c75363fbc0 | [BugFix] Avoid premature async generator exit and raise all exception variations (#7698) | 2024-08-21 11:45:55 -04:00
Cyrus Leung | baaedfdb2d | [mypy] Enable following imports for entrypoints (#7248) | 2024-08-20 23:28:21 -07:00
    Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
    Co-authored-by: Fei <dfdfcai4@gmail.com>
Travis Johnson | 67e02fa8a4 | [Bugfix] use StoreBoolean instead of type=bool for --disable-logprobs-during-spec-decoding (#7665) | 2024-08-20 00:43:09 +00:00
    Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
William Lin | 47b65a5508 | [core] Multi Step Scheduling (#7000) | 2024-08-19 13:52:13 -07:00
    Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Ali Panahi | dad961ef5c | [Bugfix] fix lora_dtype value type in arg_utils.py - part 2 (#5428) | 2024-08-19 20:47:00 +00:00
Cody Yu | 3ac50b47d0 | [MISC] Add prefix cache hit rate to metrics (#7606) | 2024-08-19 11:52:07 -07:00
SangBin Cho | ff7ec82c4d | [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) | 2024-08-18 17:57:20 -07:00
Robert Shaw | e3b318216d | [ Bugfix ] Fix Prometheus Metrics With zeromq Frontend (#7279) | 2024-08-18 20:19:48 +00:00
    Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Roger Wang | bbf55c4805 | [VLM] Refactor MultiModalConfig initialization and profiling (#7530) | 2024-08-17 13:30:55 -07:00
Mahesh Keralapura | 93478b63d2 | [Core] Fix tracking of model forward time in case of PP>1 (#7440) | 2024-08-16 13:46:01 -07:00
shangmingc | b67ae00cdb | [Misc] Add quantization config support for speculative model. (#7343) | 2024-08-15 19:34:28 -07:00
omrishiv | 9c1f78d5d6 | [Bugfix] update neuron for version > 0.5.0 (#7175) | 2024-08-15 09:44:14 -07:00
    Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
William Lin | 2ecf7b1757 | [core] [3/N] multi-step args and sequence.py (#7452) | 2024-08-14 12:32:45 -07:00
Cyrus Leung | 3f674a49b5 | [VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126) | 2024-08-14 17:55:42 +00:00
Wallas Henrique | 70b746efcf | [Misc] Deprecation Warning when setting --engine-use-ray (#7424) | 2024-08-14 09:44:27 -07:00
    Signed-off-by: Wallas Santos <wallashss@ibm.com>
    Co-authored-by: youkaichao <youkaichao@gmail.com>
    Co-authored-by: Nick Hill <nickhill@us.ibm.com>
    Co-authored-by: youkaichao <youkaichao@126.com>
youkaichao | 16422ea76f | [misc][plugin] add plugin system implementation (#7426) | 2024-08-13 16:24:17 -07:00
Rui Qiao | 198d6a2898 | [Core] Shut down aDAG workers with clean async llm engine exit (#7224) | 2024-08-12 17:57:16 -07:00
    Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Cyrus Leung | 4ddc4743d7 | [Core] Consolidate GB constant and enable float GB arguments (#7416) | 2024-08-12 14:14:14 -07:00
Cyrus Leung | 24154f8618 | [Frontend] Disallow passing model as both argument and option (#7347) | 2024-08-12 12:58:34 +00:00
Mahesh Keralapura | 933790c209 | [Core] Add span metrics for model_forward, scheduler and sampler time (#7089) | 2024-08-09 13:55:13 -07:00
Nick Hill | b4e9528f95 | [Core] Streamline stream termination in AsyncLLMEngine (#7336) | 2024-08-09 07:06:36 +00:00
Cyrus Leung | 7eb4a51c5f | [Core] Support serving encoder/decoder models (#7258) | 2024-08-09 10:39:41 +08:00
Joe Runde | 21b9c49aa3 | [Frontend] Kill the server on engine death (#6594) | 2024-08-08 09:47:48 -07:00
    Signed-off-by: Joe Runde <joe@joerun.de>
    Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Nick Hill | 9a3f49ae07 | [BugFix] Overhaul async request cancellation (#7111) | 2024-08-07 13:21:41 +08:00
afeldman-nm | fd95e026e0 | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00
    Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
    Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Isotr0py | 360bd67cf0 | [Core] Support loading GGUF model (#5191) | 2024-08-05 17:54:23 -06:00
    Co-authored-by: Michael Goin <michael@neuralmagic.com>
Cade Daniel | 82a1b1a82b | [Speculative decoding] Add periodic log with time spent in proposal/scoring/verification (#6963) | 2024-08-05 08:46:44 +00:00
youkaichao | 83c644fe7e | [core][misc] simply output processing with shortcut code path (#7117) | 2024-08-04 00:22:19 -07:00