Swapnil Parekh
4d6ada947c
[CORE] Adding support for insertion of soft-tuned prompts ( #4645 )
...
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
Co-authored-by: Joe G <joseph.granados@h2o.ai>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-07-09 13:26:36 -07:00
Kevin H. Luu
a0550cbc80
Add support for multi-node on CI ( #5955 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-09 12:56:56 -07:00
Woosuk Kwon
08c5bdecae
[Bugfix][TPU] Fix outlines installation in TPU Dockerfile ( #6256 )
2024-07-09 02:56:06 -07:00
Woosuk Kwon
5d5b4c5fe5
[Bugfix][TPU] Add missing None to model input ( #6245 )
2024-07-09 00:21:37 -07:00
youkaichao
70c232f85a
[core][distributed] fix ray worker rank assignment ( #6235 )
2024-07-08 21:31:44 -07:00
youkaichao
a3c9435d93
[hardware][cuda] use device id under CUDA_VISIBLE_DEVICES for get_device_capability ( #6216 )
2024-07-08 20:02:15 -07:00
Simon Mo
4f0e0ea131
Add FlashInfer to default Dockerfile ( #6172 )
2024-07-08 13:38:03 -07:00
tomeras91
ddc369fba1
[Bugfix] Mamba cache Cuda Graph padding ( #6214 )
2024-07-08 11:25:51 -07:00
Eric
185ad31f37
[Bugfix] use diskcache in outlines _get_guide #5436 ( #6203 )
2024-07-08 11:23:24 -07:00
afeldman-nm
543aa48573
[Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) ( #4888 )
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-07-08 17:12:15 +00:00
Avshalom Manevich
f7a8fa39d8
[Kernel] reloading fused_moe config on the last chunk ( #6210 )
2024-07-08 08:00:38 -07:00
Haichuan
717f4bcea0
Feature/add benchmark testing ( #5947 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-08 07:52:06 +00:00
kczimm
16620f439d
do not exclude object field in CompletionStreamResponse ( #6196 )
2024-07-08 10:32:57 +08:00
youkaichao
3b08fe2b13
[misc][frontend] log all available endpoints ( #6195 )
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-07-07 15:11:12 -07:00
Robert Shaw
abfe705a02
[ Misc ] Support Fp8 via llm-compressor ( #6110 )
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
2024-07-07 20:42:11 +00:00
Haichuan
333306a252
add benchmark for fix length input and output ( #5857 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-07 07:42:13 +00:00
Roger Wang
6206dcb29e
[Model] Add PaliGemma ( #5189 )
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-07-07 09:25:50 +08:00
Cyrus Leung
9389380015
[Doc] Move guide for multimodal model and other improvements ( #6168 )
2024-07-06 17:18:59 +08:00
Roger Wang
175c43eca4
[Doc] Reorganize Supported Models by Type ( #6167 )
2024-07-06 05:59:36 +00:00
Simon Mo
bc96d5c330
Move release wheel env var to Dockerfile instead ( #6163 )
2024-07-05 17:19:53 -07:00
Simon Mo
f0250620dd
Fix release wheel build env var ( #6162 )
2024-07-05 16:24:31 -07:00
Simon Mo
2de490d60f
Update wheel builds to strip debug ( #6161 )
2024-07-05 14:51:25 -07:00
Simon Mo
79d406e918
[Docs] Fix readthedocs for tag build ( #6158 )
2024-07-05 12:44:40 -07:00
Simon Mo
abad5746a7
bump version to v0.5.1 ( #6157 )
2024-07-05 12:04:51 -07:00
JGSweets
e58294ddf2
[Bugfix] Add verbose error if scipy is missing for blocksparse attention ( #5695 )
2024-07-05 10:41:01 -07:00
jvlunteren
f1e15da6fe
[Frontend] Continuous usage stats in OpenAI completion API ( #5742 )
2024-07-05 10:37:09 -07:00
Christian Rohmann
0097bb1829
[Bugfix] Use templated datasource in grafana.json to allow automatic imports ( #6136 )
...
Signed-off-by: Christian Rohmann <christian.rohmann@inovex.de>
2024-07-05 09:49:47 -07:00
Cyrus Leung
ea4b570483
[VLM] Cleanup validation and update docs ( #6149 )
2024-07-05 05:49:38 +00:00
Roger Wang
a41357e941
[VLM] Improve consistency between feature size calculation and dummy data for profiling ( #6146 )
2024-07-05 09:29:47 +08:00
Cyrus Leung
ae96ef8fbd
[VLM] Calculate maximum number of multi-modal tokens by model ( #6121 )
2024-07-04 16:37:23 -07:00
Lily Liu
69ec3ca14c
[Kernel][Model] logits_soft_cap for Gemma2 with flashinfer ( #6051 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-07-04 16:35:51 -07:00
Yuan
81d7a50f24
[Hardware][Intel CPU] Adding intel openmp tunings in Docker file ( #6008 )
...
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
2024-07-04 15:22:12 -07:00
youkaichao
27902d42be
[misc][doc] try to add warning for latest html ( #5979 )
2024-07-04 09:57:09 -07:00
Gregory Shtrasberg
56b325e977
[ROCm][AMD][Model]Adding alibi slopes support in ROCm triton flash attention and naive flash attention ( #6043 )
...
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
2024-07-03 22:19:38 -07:00
Cyrus Leung
3dd507083f
[CI/Build] Cleanup VLM tests ( #6107 )
2024-07-03 18:58:18 -07:00
Murali Andoorveedu
0ed646b7aa
[Distributed][Core] Support Py39 and Py38 for PP ( #6120 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-03 17:52:29 -07:00
Travis Johnson
1dab9bc8a9
[Bugfix] set OMP_NUM_THREADS to 1 by default for multiprocessing ( #6109 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-07-03 16:56:59 -07:00
youkaichao
3de6e6a30e
[core][distributed] support n layers % pp size != 0 ( #6115 )
2024-07-03 16:40:31 -07:00
youkaichao
966fe72141
[doc][misc] bump up py version in installation doc ( #6119 )
2024-07-03 15:52:04 -07:00
Robert Shaw
62963d129e
[ Misc ] Clean Up CompressedTensorsW8A8 ( #6113 )
2024-07-03 22:50:08 +00:00
xwjiang2010
d9e98f42e4
[vlm] Remove vision language config. ( #6089 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
youkaichao
3c6325f0fc
[core][distributed] custom allreduce when pp size > 1 ( #6117 )
2024-07-03 14:41:32 -07:00
Michael Goin
47f0954af0
[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin ( #5975 )
2024-07-03 17:38:00 +00:00
Roger Wang
7cd2ebb025
[Bugfix] Fix compute_logits in Jamba ( #6093 )
2024-07-03 00:32:35 -07:00
Roger Wang
f1c78138aa
[Doc] Fix Mock Import ( #6094 )
2024-07-03 00:13:56 -07:00
Roger Wang
3a86b54fb0
[VLM][Frontend] Proper Image Prompt Formatting from OpenAI API ( #6091 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-02 23:41:23 -07:00
youkaichao
f666207161
[misc][distributed] error on invalid state ( #6092 )
2024-07-02 23:37:29 -07:00
Nick Hill
d830656a97
[BugFix] Avoid unnecessary Ray import warnings ( #6079 )
2024-07-03 14:09:40 +08:00
SangBin Cho
d18bab3587
[CI] Fix base url doesn't strip "/" ( #6087 )
2024-07-02 21:31:25 -07:00
Cyrus Leung
9831aec49f
[Core] Dynamic image size support for VLMs ( #5276 )
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-07-02 20:34:00 -07:00