Lily Liu
5c60c8c423
[SpecDecode] [Minor] Fix spec decode sampler tests ( #7183 )
2024-08-06 10:40:32 -07:00
Katarzyna Papis
00afc78590
[Bugfix] add gguf dependency ( #7198 )
Co-authored-by: katarzyna.papis <kpapis@kpapis-u20.sclab.intel.com>
2024-08-06 10:08:35 -07:00
Robert Shaw
541c1852d3
[ BugFix ] Fix ZMQ when VLLM_PORT is set ( #7205 )
2024-08-06 09:26:26 -07:00
Dipika Sikka
a3bbbfa1d8
[BugFix] Fix DeepSeek remote code ( #7178 )
2024-08-06 08:16:53 -07:00
Cyrus Leung
1f26efbb3a
[Model] Support SigLIP encoder and alternative decoders for LLaVA models ( #7153 )
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-08-06 16:55:31 +08:00
Jee Jee Li
9118217f58
[LoRA] Relax LoRA condition ( #7146 )
2024-08-06 01:57:25 +00:00
Simon Mo
e3c664bfcb
[Build] Add initial conditional testing spec ( #6841 )
2024-08-05 17:39:22 -07:00
Isotr0py
360bd67cf0
[Core] Support loading GGUF model ( #5191 )
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-05 17:54:23 -06:00
Cody Yu
ef527be06c
[MISC] Use non-blocking transfer in prepare_input ( #7172 )
2024-08-05 23:41:27 +00:00
Jacob Schein
89b8db6bb2
[Bugfix] Specify device when loading LoRA and embedding tensors ( #7129 )
Co-authored-by: Jacob Schein <jacobschein@Jacobs-MacBook-Pro-2.local>
2024-08-05 16:35:47 -07:00
Thomas Parnell
789937af2e
[Doc] [SpecDecode] Update MLPSpeculator documentation ( #7100 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-08-05 23:29:43 +00:00
youkaichao
dfb1a15dcb
[ci][frontend] deduplicate tests ( #7101 )
2024-08-05 15:59:22 -07:00
Simon Mo
4db5176d97
bump version to v0.5.4 ( #7139 )
2024-08-05 14:39:48 -07:00
Tyler Michael Smith
4cf1dc39be
[Bugfix][CI/Build] Fix CUTLASS FetchContent ( #7171 )
2024-08-05 14:22:57 -07:00
Tyler Michael Smith
6e4852ce28
[CI/Build] Suppress divide-by-zero and missing return statement warnings ( #7001 )
2024-08-05 16:00:01 -04:00
Tyler Michael Smith
8571ac4672
[Kernel] Update CUTLASS to 3.5.1 ( #7085 )
2024-08-05 15:13:43 -04:00
Rui Qiao
997cf78308
[Misc] Fix typo in GroupCoordinator.recv() ( #7167 )
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-05 11:10:16 -07:00
Aditya Paliwal
57f560aa23
[BugFix] Use args.trust_remote_code ( #7121 )
2024-08-05 09:26:14 -07:00
Nick Hill
003f8ee128
[BugFix] Use IP4 localhost form for zmq bind ( #7163 )
2024-08-05 08:41:03 -07:00
Bongwon Jang
e9630458c7
[SpecDecode] Support FlashInfer in DraftModelRunner ( #6926 )
2024-08-05 08:05:05 -07:00
Cade Daniel
82a1b1a82b
[Speculative decoding] Add periodic log with time spent in proposal/scoring/verification ( #6963 )
2024-08-05 08:46:44 +00:00
Jungho Christopher Cho
c0d8f1636c
[Model] SiglipVisionModel ported from transformers ( #6942 )
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-08-05 06:22:12 +00:00
Cyrus Leung
cc08fc7225
[Frontend] Reapply "Factor out code for running uvicorn" ( #7095 )
2024-08-04 20:40:51 -07:00
Alphi
7b86e7c9cd
[Model] Add multi-image support for minicpmv ( #7122 )
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-05 09:23:17 +08:00
Jee Jee Li
f80ab3521c
Clean up remaining Punica C information ( #7027 )
2024-08-04 15:37:08 -07:00
youkaichao
16a1cc9bb2
[misc][distributed] improve libcudart.so finding ( #7127 )
2024-08-04 11:31:51 -07:00
Thomas Parnell
b1c9aa3daa
[Bugfix] [SpecDecode] Default speculative_draft_tensor_parallel_size to 1 when using MLPSpeculator ( #7105 )
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-08-04 07:13:18 -07:00
Jee Jee Li
179a6a36f2
[Model]Refactor MiniCPMV ( #7020 )
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 08:12:41 +00:00
youkaichao
83c644fe7e
[core][misc] simply output processing with shortcut code path ( #7117 )
2024-08-04 00:22:19 -07:00
youkaichao
9fadc7b7a0
[misc] add zmq in collect env ( #7119 )
2024-08-03 22:03:46 -07:00
Yihuan Bu
654bc5ca49
Support for guided decoding for offline LLM ( #6878 )
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 03:12:09 +00:00
Jeff Fialho
825b044863
[Frontend] Warn if user max_model_len is greater than derived max_model_len ( #7080 )
Signed-off-by: Jefferson Fialho <jfialho@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-03 16:01:38 -07:00
youkaichao
44dcb52e39
[ci][test] finalize fork_new_process_for_each_test ( #7114 )
2024-08-03 10:44:53 -07:00
Kuntai Du
67d745cc68
[CI] Temporarily turn off H100 performance benchmark ( #7104 )
2024-08-02 23:52:44 -07:00
Jee Jee Li
99d7cabd7b
[LoRA] ReplicatedLinear support LoRA ( #7081 )
2024-08-02 22:40:19 -07:00
Zach Zheng
fb2c1c86c1
[Bugfix] Fix block table for seqs that have prefix cache hits ( #7018 )
2024-08-02 22:38:15 -07:00
Isotr0py
0c25435daa
[Model] Refactor and decouple weight loading logic for InternVL2 model ( #7067 )
2024-08-02 22:36:14 -07:00
youkaichao
a0d164567c
[ci][distributed] disable ray dag tests ( #7099 )
2024-08-02 22:32:04 -07:00
youkaichao
04e5583425
[ci][distributed] merge distributed test commands ( #7097 )
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-02 21:33:53 -07:00
Cyrus Leung
8c025fa703
[Frontend] Factor out chat message parsing ( #7055 )
2024-08-02 21:31:27 -07:00
youkaichao
69ea15e5cc
[ci][distributed] shorten wait time if server hangs ( #7098 )
2024-08-02 21:05:16 -07:00
Robert Shaw
ed812a73fa
[ Frontend ] Multiprocessing for OpenAI Server with zeromq ( #6883 )
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-02 18:27:28 -07:00
youkaichao
708989341e
[misc] add a flag to enable compile ( #7092 )
2024-08-02 16:18:45 -07:00
Rui Qiao
22e718ff1a
[Misc] Revive to use loopback address for driver IP ( #7091 )
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-02 15:50:00 -07:00
Rui Qiao
05308891e2
[Core] Pipeline parallel with Ray ADAG ( #6837 )
Support pipeline-parallelism with Ray accelerated DAG.
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-02 13:55:40 -07:00
Lucas Wilkinson
a8d604ca2a
[Misc] Disambiguate quantized types via a new ScalarType ( #6396 )
2024-08-02 13:51:58 -07:00
Michael Goin
b482b9a5b1
[CI/Build] Add support for Python 3.12 ( #7035 )
2024-08-02 13:51:22 -07:00
youkaichao
806949514a
[ci] set timeout for test_oot_registration.py ( #7082 )
2024-08-02 10:03:24 -07:00
Jie Fu (傅杰)
c16eaac500
[Hardware][Intel CPU] Update torch 2.4.0 for CPU backend ( #6931 )
2024-08-02 08:55:58 -07:00
Peng Guanwen
db35186391
[Core] Comment out unused code in sampler ( #7023 )
2024-08-02 00:58:26 -07:00