Jungho Christopher Cho
c0d8f1636c
[Model] SiglipVisionModel ported from transformers ( #6942 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-08-05 06:22:12 +00:00
Cyrus Leung
cc08fc7225
[Frontend] Reapply "Factor out code for running uvicorn" ( #7095 )
2024-08-04 20:40:51 -07:00
Alphi
7b86e7c9cd
[Model] Add multi-image support for minicpmv ( #7122 )
...
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-05 09:23:17 +08:00
Jee Jee Li
f80ab3521c
Clean up remaining Punica C information ( #7027 )
2024-08-04 15:37:08 -07:00
youkaichao
16a1cc9bb2
[misc][distributed] improve libcudart.so finding ( #7127 )
2024-08-04 11:31:51 -07:00
Thomas Parnell
b1c9aa3daa
[Bugfix] [SpecDecode] Default speculative_draft_tensor_parallel_size to 1 when using MLPSpeculator ( #7105 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-08-04 07:13:18 -07:00
Jee Jee Li
179a6a36f2
[Model]Refactor MiniCPMV ( #7020 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 08:12:41 +00:00
youkaichao
83c644fe7e
[core][misc] simply output processing with shortcut code path ( #7117 )
2024-08-04 00:22:19 -07:00
youkaichao
9fadc7b7a0
[misc] add zmq in collect env ( #7119 )
2024-08-03 22:03:46 -07:00
Yihuan Bu
654bc5ca49
Support for guided decoding for offline LLM ( #6878 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 03:12:09 +00:00
Jeff Fialho
825b044863
[Frontend] Warn if user max_model_len is greater than derived max_model_len ( #7080 )
...
Signed-off-by: Jefferson Fialho <jfialho@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-03 16:01:38 -07:00
youkaichao
44dcb52e39
[ci][test] finalize fork_new_process_for_each_test ( #7114 )
2024-08-03 10:44:53 -07:00
Kuntai Du
67d745cc68
[CI] Temporarily turn off H100 performance benchmark ( #7104 )
2024-08-02 23:52:44 -07:00
Jee Jee Li
99d7cabd7b
[LoRA] ReplicatedLinear support LoRA ( #7081 )
2024-08-02 22:40:19 -07:00
Zach Zheng
fb2c1c86c1
[Bugfix] Fix block table for seqs that have prefix cache hits ( #7018 )
2024-08-02 22:38:15 -07:00
Isotr0py
0c25435daa
[Model] Refactor and decouple weight loading logic for InternVL2 model ( #7067 )
2024-08-02 22:36:14 -07:00
youkaichao
a0d164567c
[ci][distributed] disable ray dag tests ( #7099 )
2024-08-02 22:32:04 -07:00
youkaichao
04e5583425
[ci][distributed] merge distributed test commands ( #7097 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-02 21:33:53 -07:00
Cyrus Leung
8c025fa703
[Frontend] Factor out chat message parsing ( #7055 )
2024-08-02 21:31:27 -07:00
youkaichao
69ea15e5cc
[ci][distributed] shorten wait time if server hangs ( #7098 )
2024-08-02 21:05:16 -07:00
Robert Shaw
ed812a73fa
[ Frontend ] Multiprocessing for OpenAI Server with zeromq ( #6883 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-02 18:27:28 -07:00
youkaichao
708989341e
[misc] add a flag to enable compile ( #7092 )
2024-08-02 16:18:45 -07:00
Rui Qiao
22e718ff1a
[Misc] Revive to use loopback address for driver IP ( #7091 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-02 15:50:00 -07:00
Rui Qiao
05308891e2
[Core] Pipeline parallel with Ray ADAG ( #6837 )
...
Support pipeline-parallelism with Ray accelerated DAG.
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2024-08-02 13:55:40 -07:00
Lucas Wilkinson
a8d604ca2a
[Misc] Disambiguate quantized types via a new ScalarType ( #6396 )
2024-08-02 13:51:58 -07:00
Michael Goin
b482b9a5b1
[CI/Build] Add support for Python 3.12 ( #7035 )
2024-08-02 13:51:22 -07:00
youkaichao
806949514a
[ci] set timeout for test_oot_registration.py ( #7082 )
2024-08-02 10:03:24 -07:00
Jie Fu (傅杰)
c16eaac500
[Hardware][Intel CPU] Update torch 2.4.0 for CPU backend ( #6931 )
2024-08-02 08:55:58 -07:00
Peng Guanwen
db35186391
[Core] Comment out unused code in sampler ( #7023 )
2024-08-02 00:58:26 -07:00
youkaichao
660dea1235
[cuda][misc] remove error_on_invalid_device_count_status ( #7069 )
2024-08-02 00:14:21 -07:00
Bongwon Jang
cf2a1a4d9d
Fix tracing.py ( #7065 )
2024-08-01 23:28:00 -07:00
youkaichao
252357793d
[ci][distributed] try to fix pp test ( #7054 )
2024-08-01 22:03:12 -07:00
Cyrus Leung
3bb4b1e4cd
[mypy] Speed up mypy checking ( #7056 )
2024-08-01 19:49:43 -07:00
Lily Liu
954f7305a1
[Kernel] Fix input for flashinfer prefill wrapper. ( #7008 )
2024-08-01 18:44:16 -07:00
Woosuk Kwon
6ce01f3066
[Performance] Optimize get_seqs ( #7051 )
2024-08-01 18:29:52 -07:00
Tyler Michael Smith
6a11fdfbb8
[CI/Build][Bugfix] Fix CUTLASS header-only line ( #7034 )
2024-08-01 13:51:15 -07:00
Woosuk Kwon
805a8a75f2
[Misc] Support attention logits soft-capping with flash-attn ( #7022 )
2024-08-01 13:14:37 -07:00
omkar kakarparthi
562e580abc
Update run-amd-test.sh ( #7044 )
2024-08-01 13:12:37 -07:00
Murali Andoorveedu
fc912e0886
[Models] Support Qwen model with PP ( #6974 )
...
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-08-01 12:40:43 -07:00
Michael Goin
f4fd390f5d
[Bugfix] Lower gemma's unloaded_params exception to warning ( #7002 )
2024-08-01 12:01:07 -07:00
Michael Goin
fb3db61688
[CI/Build] Remove sparseml requirement from testing ( #7037 )
2024-08-01 12:00:51 -07:00
Isotr0py
2dd34371a6
[Bugfix] Fix RMSNorm forward in InternViT attention qk_layernorm ( #6992 )
2024-08-01 12:00:28 -07:00
Sage Moore
7e0861bd0b
[CI/Build] Update PyTorch to 2.4.0 ( #6951 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-01 11:11:24 -07:00
Alexei-V-Ivanov-AMD
a72a424b3e
[Build/CI] Fixing Docker Hub quota issue. ( #7043 )
2024-08-01 11:07:37 -07:00
youkaichao
c8a7e93273
[core][scheduler] simplify and improve scheduler ( #6867 )
2024-07-31 23:51:09 -07:00
zifeitong
3c10591ef2
[Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user ( #6954 )
2024-07-31 21:13:34 -07:00
Aurick Qiao
0437492ea9
PP comm optimization: replace send with partial send + allgather ( #6695 )
...
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
2024-07-31 20:15:42 -07:00
Travis Johnson
630dd9e0ae
[Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings ( #6758 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-07-31 19:49:11 -07:00
Woosuk Kwon
23993a7997
[Bugfix][TPU] Do not use torch.Generator for TPUs ( #6981 )
2024-07-31 18:50:28 -07:00
xuyi
1d2e7fb73f
[Model] Pipeline parallel support for Qwen2 ( #6924 )
2024-07-31 18:49:51 -07:00