James Whedbee
|
e1bb2fd52d
|
[Bugfix] Support logprobs when using guided_json and other constrained decoding fields (#4149)
|
2024-04-18 21:12:55 +00:00 |
|
Simon Mo
|
705578ae14
|
[Docs] document that Meta Llama 3 is supported (#4175)
|
2024-04-18 10:55:48 -07:00 |
|
Michał Moskal
|
e8cc7967ff
|
[Bugfix][Kernel] allow non-power-of-two head sizes in prefix prefill (#4128)
|
2024-04-18 00:51:28 -07:00 |
|
Michael Goin
|
53b018edcb
|
[Bugfix] Get available quantization methods from quantization registry (#4098)
|
2024-04-18 00:21:55 -07:00 |
|
Harry Mellor
|
66ded03067
|
Allow model to be served under multiple names (#2894)
Co-authored-by: Alexandre Payot <alexandrep@graphcore.ai>
|
2024-04-18 00:16:26 -07:00 |
|
youkaichao
|
6dc1fc9cfe
|
[Core] nccl integrity check and test (#4155)
[Core] Add integrity check during initialization; add test for it (#4155)
|
2024-04-17 22:28:52 -07:00 |
|
SangBin Cho
|
533d2a1f39
|
[Typing] Mypy typing part 2 (#4043)
Co-authored-by: SangBin Cho <sangcho@sangcho-LT93GQWG9C.local>
|
2024-04-17 17:28:43 -07:00 |
|
Shoichi Uchinami
|
a53222544c
|
[Kernel] Add punica dimension for Swallow-MS-7B LoRA (#4134)
|
2024-04-17 10:02:45 -07:00 |
|
Elinx
|
fe3b5bbc23
|
[Bugfix] fix output parsing error for trtllm backend (#4137)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-04-17 11:07:23 +00:00 |
|
youkaichao
|
8438e0569e
|
[Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024)
[Core] replace narrow-usage RayWorkerVllm to general WorkerWrapper to reduce code duplication (#4024)
|
2024-04-17 08:34:33 +00:00 |
|
Cade Daniel
|
11d652bd4f
|
[CI] Move CPU/AMD tests to after wait (#4123)
|
2024-04-16 22:53:26 -07:00 |
|
Cade Daniel
|
d150e4f89f
|
[Misc] [CI] Fix CI failure caught after merge (#4126)
|
2024-04-16 17:56:01 -07:00 |
|
Cade Daniel
|
e95cd87959
|
[Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894)
|
2024-04-16 13:09:21 -07:00 |
|
Antoni Baum
|
69e1d2fb69
|
[Core] Refactor model loading code (#4097)
|
2024-04-16 11:34:39 -07:00 |
|
Noam Gat
|
05434764cd
|
LM Format Enforcer Guided Decoding Support (#3868)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-04-16 05:54:57 +00:00 |
|
SangBin Cho
|
4e7ee664e2
|
[Core] Fix engine-use-ray broken (#4105)
|
2024-04-16 05:24:53 +00:00 |
|
SangBin Cho
|
37e84a403d
|
[Typing] Fix Sequence type GenericAlias only available after Python 3.9. (#4092)
|
2024-04-15 14:47:31 -07:00 |
|
Ricky Xu
|
4695397dcf
|
[Bugfix] Fix ray workers profiling with nsight (#4095)
|
2024-04-15 14:24:45 -07:00 |
|
Sanger Steel
|
d619ae2d19
|
[Doc] Add better clarity for tensorizer usage (#4090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-04-15 13:28:25 -07:00 |
|
Nick Hill
|
eb46fbfda2
|
[Core] Simplifications to executor classes (#4071)
|
2024-04-15 13:05:09 -07:00 |
|
Li, Jiang
|
0003e9154b
|
[Misc][Minor] Fix CPU block num log in CPUExecutor. (#4088)
|
2024-04-15 08:35:55 -07:00 |
|
Zhuohan Li
|
e11e200736
|
[Bugfix] Fix filelock version requirement (#4075)
|
2024-04-14 21:50:08 -07:00 |
|
Roy
|
8db1bf32f8
|
[Misc] Upgrade triton to 2.2.0 (#4061)
|
2024-04-14 17:43:54 -07:00 |
|
Simon Mo
|
aceb17cf2d
|
[Docs] document that mixtral 8x22b is supported (#4073)
|
2024-04-14 14:35:55 -07:00 |
|
Nick Hill
|
563c54f760
|
[BugFix] Fix tensorizer extra in setup.py (#4072)
|
2024-04-14 14:12:42 -07:00 |
|
youkaichao
|
2cd6b4f362
|
[Core] avoid too many cuda context by caching p2p test (#4021)
|
2024-04-13 23:40:21 -07:00 |
|
Sanger Steel
|
711a000255
|
[Frontend] [Core] feat: Add model loading using tensorizer (#3476)
|
2024-04-13 17:13:01 -07:00 |
|
Jee Li
|
989ae2538d
|
[Kernel] Add punica dimension for Baichuan-13B (#4053)
|
2024-04-13 07:55:05 -07:00 |
|
zspo
|
0a430b4ae2
|
[Bugfix] fix_small_bug_in_neuron_executor (#4051)
|
2024-04-13 07:54:03 -07:00 |
|
zspo
|
ec8e3c695f
|
[Bugfix] fix_log_time_in_metrics (#4050)
|
2024-04-13 07:52:36 -07:00 |
|
youkaichao
|
98afde19fc
|
[Core][Distributed] improve logging for init dist (#4042)
|
2024-04-13 07:12:53 -07:00 |
|
Dylan Hawk
|
5c2e66e487
|
[Bugfix] More type hint fixes for py 3.8 (#4039)
|
2024-04-12 21:07:04 -07:00 |
|
youkaichao
|
546e721168
|
[CI/Test] expand ruff and yapf for all supported python version (#4037)
|
2024-04-13 01:43:37 +00:00 |
|
Jee Li
|
b8aacac31a
|
[Bugfix] Fix LoRA bug (#4032)
|
2024-04-12 16:56:37 -07:00 |
|
Bellk17
|
d04973ad54
|
Fix triton compilation issue (#3984)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-04-12 16:41:26 -07:00 |
|
youkaichao
|
fbb9d9eef4
|
[Core] fix custom allreduce default value (#4040)
|
2024-04-12 16:40:39 -07:00 |
|
SangBin Cho
|
09473ee41c
|
[mypy] Add mypy type annotation part 1 (#4006)
|
2024-04-12 14:35:50 -07:00 |
|
Zhuohan Li
|
d4ec9ffb95
|
[Misc] Fix typo in scheduler.py (#4022)
|
2024-04-12 13:56:04 -07:00 |
|
youkaichao
|
96b6a6d790
|
[Bugfix] fix type hint for py 3.8 (#4036)
|
2024-04-12 19:35:44 +00:00 |
|
SangBin Cho
|
36729bac13
|
[Test] Test multiple attn backend for chunked prefill. (#4023)
|
2024-04-12 09:56:57 -07:00 |
|
Cyrus Leung
|
7fd3949a0b
|
[Frontend][Core] Move merge_async_iterators to utils (#4026)
|
2024-04-12 05:30:54 +00:00 |
|
Jee Li
|
1096717ae9
|
[Core] Support LoRA on quantized models (#4012)
|
2024-04-11 21:02:44 -07:00 |
|
Michael Feil
|
c2b4a1bce9
|
[Doc] Add typing hints / mypy types cleanup (#3816)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-04-11 17:17:21 -07:00 |
|
Nick Hill
|
e46a60aa4c
|
[BugFix] Fix handling of stop strings and stop token ids (#3672)
|
2024-04-11 15:34:12 -07:00 |
|
Antoni Baum
|
1e96c3341a
|
Add extra punica sizes to support bigger vocabs (#4015)
|
2024-04-11 22:18:57 +00:00 |
|
Dylan Hawk
|
95e7d4a97c
|
Fix echo/logprob OpenAI completion bug (#3441)
Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>
|
2024-04-11 22:15:50 +00:00 |
|
youkaichao
|
559eb852f8
|
[Core] init_distributed_environment align with init_process_group(#4014)
[Core][Distributed] make init_distributed_environment compatible with init_process_group (#4014)
|
2024-04-11 14:00:48 -07:00 |
|
Antoni Baum
|
a10d3056da
|
[Core] Set linear_weights directly on the layer (#3977)
|
2024-04-11 16:35:51 -04:00 |
|
bigPYJ1151
|
8afca50889
|
[Hardware][Intel] Isolate CPUModelRunner and ModelRunner for better maintenance (#3824)
|
2024-04-11 11:56:49 -07:00 |
|
fuchen.ljl
|
08ccee1e83
|
punica fix-bgmv-kernel-640 (#4007)
|
2024-04-11 08:59:26 -07:00 |
|