Commit Graph

266 Commits

Author SHA1 Message Date
Li, Jiang
ca77dd7a44
[Hardware][CPU] Support AWQ for CPU backend (#7515) 2024-10-09 10:28:08 -06:00
youkaichao
c8627cd41b
[ci][test] use load dummy for testing (#9165) 2024-10-09 00:38:40 -07:00
Michael Goin
9ba0bd6aa6
Add lm-eval directly to requirements-test.txt (#9161) 2024-10-08 18:22:31 -07:00
Isotr0py
4f95ffee6f
[Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089) 2024-10-07 06:50:35 +00:00
Kuntai Du
fbb74420e7
[CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412) 2024-10-04 14:01:44 -07:00
Murali Andoorveedu
0f6d7a9a34
[Models] Add remaining model PP support (#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:56:58 +08:00
Lily Liu
1570203864
[Spec Decode] (1/2) Remove batch expansion (#8839) 2024-10-01 16:04:42 -07:00
Lily Liu
bce324487a
[CI][SpecDecode] Fix spec decode tests, use flash attention backend for spec decode CI tests. (#8975) 2024-10-01 00:51:40 +00:00
Kevin H. Luu
1425a1bcf9
[ci] Add CODEOWNERS for test directories (#8795)
Signed-off-by: kevin <kevin@anyscale.com>
2024-10-01 00:47:08 +00:00
Cyrus Leung
e1a3f5e831
[CI/Build] Update models tests & examples (#8874)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-28 09:54:35 -07:00
Tyler Titsworth
260024a374
[Bugfix][Intel] Fix XPU Dockerfile Build (#7824)
Signed-off-by: tylertitsworth <tyler.titsworth@intel.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-27 23:45:50 -07:00
youkaichao
d86f6b2afb
[misc] fix wheel name (#8919) 2024-09-27 22:10:44 -07:00
Luka Govedič
172d1cd276
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271) 2024-09-27 14:25:10 -04:00
Nick Hill
4b377d6feb
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829) 2024-09-26 16:46:43 -07:00
Kevin H. Luu
d9cfbc891e
[ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM (#8872)
Signed-off-by: kevin <kevin@anyscale.com>
2024-09-26 15:02:16 -07:00
bnellnm
300da09177
[Kernel] Fullgraph and opcheck tests (#8479) 2024-09-25 08:35:52 -06:00
Hongxia Yang
1c046447a6
[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777) 2024-09-25 22:26:37 +08:00
Alexey Kondratiev(AMD)
2940afa04e
[CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (#8670) 2024-09-20 10:27:44 -07:00
Alexey Kondratiev(AMD)
6cb748e190
[CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (#8551) 2024-09-19 13:06:32 -07:00
Alexander Matveev
7c7714d856
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
Alexey Kondratiev(AMD)
09deb4721f
[CI/Build] Excluding kernels/test_gguf.py from ROCm (#8520) 2024-09-17 16:40:29 -07:00
sroy745
1009e93c5d
[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (#7631) 2024-09-17 07:35:01 -07:00
youkaichao
99aa4eddaf
[torch.compile] register allreduce operations as custom ops (#8526) 2024-09-16 22:57:57 -07:00
Simon Mo
5478c4b41f
[perf bench] set timeout to debug hanging (#8516) 2024-09-16 14:30:02 -07:00
ywfang
8a0cf1ddc3
[Model] support minicpm3 (#8297)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
Cyrus Leung
a84e598e21
[CI/Build] Reorganize models tests (#7820) 2024-09-13 10:20:06 -07:00
Nick Hill
551ce01078
[Core] Add engine option to return only deltas or final output (#7381) 2024-09-12 12:02:00 -07:00
Joe Runde
f2e263b801
[Bugfix] Offline mode fix (#8376)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-09-12 11:11:57 -07:00
Lily Liu
775f00f81e
[Speculative Decoding] Test refactor (#8317)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-11 14:07:34 -07:00
Alexey Kondratiev(AMD)
aea02f30de
[CI/Build] Excluding test_moe.py from AMD Kernels tests for investigation (#8373) 2024-09-11 18:31:41 +00:00
Li, Jiang
0b952af458
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257) 2024-09-11 09:46:46 -07:00
sumitd2
02751a7a42
Fix ppc64le buildkite job (#8309) 2024-09-10 12:58:34 -07:00
Alexey Kondratiev(AMD)
f421f3cefb
[CI/Build] Enabling kernels tests for AMD, ignoring some of then that fail (#8130) 2024-09-10 11:51:15 -07:00
Dipika Sikka
6cd5e5b07e
[Misc] Fused MoE Marlin support for GPTQ (#8217) 2024-09-09 23:02:52 -04:00
sumitd2
b962ee1470
ppc64le: Dockerfile fixed, and a script for buildkite (#8026) 2024-09-07 11:18:40 -07:00
Cyrus Leung
288a938872
[Doc] Indicate more information about supported modalities (#8181) 2024-09-05 10:51:53 +00:00
Kevin H. Luu
ba262c4e5a
[ci] Mark LoRA test as soft-fail (#8160)
Signed-off-by: kevin <kevin@anyscale.com>
2024-09-04 20:33:12 -07:00
Kyle Mistele
e02ce498be
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-04 13:18:13 -07:00
alexeykondrat
d1dec64243
[CI/Build][ROCm] Enabling LoRA tests on ROCm (#7369)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-04 11:57:54 -07:00
Cody Yu
2ad2e5608e
[MISC] Consolidate FP8 kv-cache tests (#8131) 2024-09-04 18:53:25 +00:00
TimWang
ccd7207191
chore: Update check-wheel-size.py to read MAX_SIZE_MB from env (#8103) 2024-09-03 23:17:05 -07:00
Roger Wang
5231f0898e
[Frontend][VLM] Add support for multiple multi-modal items (#8049) 2024-08-31 16:35:53 -07:00
Michael Goin
af59df0a10
Remove faulty Meta-Llama-3-8B-Instruct-FP8.yaml lm-eval test (#7961) 2024-08-28 19:19:17 -04:00
youkaichao
ce6bf3a2cf
[torch.compile] avoid Dynamo guard evaluation overhead (#7898)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-08-28 16:10:12 -07:00
alexeykondrat
42e932c7d4
[CI/Build][ROCm] Enabling tensorizer tests for ROCm (#7237) 2024-08-27 10:09:13 -07:00
youkaichao
64cc644425
[core][torch.compile] discard the compile for profiling (#7796) 2024-08-26 21:33:58 -07:00
youkaichao
7d9ffa2ae1
[misc][core] lazy import outlines (#7831) 2024-08-24 00:51:38 -07:00
Alexander Matveev
9db93de20c
[Core] Add multi-step support to LLMEngine (#7789) 2024-08-23 12:45:53 -07:00
SangBin Cho
c01a6cb231
[Ray backend] Better error when pg topology is bad. (#7584)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-08-22 17:44:25 -07:00
youkaichao
8c6f694a79
[ci] refine dependency for distributed tests (#7776) 2024-08-22 00:54:15 -07:00