youkaichao
d86f6b2afb
[misc] fix wheel name ( #8919 )
2024-09-27 22:10:44 -07:00
Luka Govedič
172d1cd276
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method ( #7271 )
2024-09-27 14:25:10 -04:00
Nick Hill
4b377d6feb
[BugFix] Fix test breakages from transformers 4.45 upgrade ( #8829 )
2024-09-26 16:46:43 -07:00
Kevin H. Luu
d9cfbc891e
[ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM ( #8872 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-09-26 15:02:16 -07:00
bnellnm
300da09177
[Kernel] Fullgraph and opcheck tests ( #8479 )
2024-09-25 08:35:52 -06:00
Hongxia Yang
1c046447a6
[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade ( #8777 )
2024-09-25 22:26:37 +08:00
Alexey Kondratiev(AMD)
2940afa04e
[CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build ( #8670 )
2024-09-20 10:27:44 -07:00
Alexey Kondratiev(AMD)
6cb748e190
[CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail ( #8551 )
2024-09-19 13:06:32 -07:00
Alexander Matveev
7c7714d856
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH ( #8157 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
Alexey Kondratiev(AMD)
09deb4721f
[CI/Build] Excluding kernels/test_gguf.py from ROCm ( #8520 )
2024-09-17 16:40:29 -07:00
sroy745
1009e93c5d
[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models ( #7631 )
2024-09-17 07:35:01 -07:00
youkaichao
99aa4eddaf
[torch.compile] register allreduce operations as custom ops ( #8526 )
2024-09-16 22:57:57 -07:00
Simon Mo
5478c4b41f
[perf bench] set timeout to debug hanging ( #8516 )
2024-09-16 14:30:02 -07:00
ywfang
8a0cf1ddc3
[Model] support minicpm3 ( #8297 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
Cyrus Leung
a84e598e21
[CI/Build] Reorganize models tests ( #7820 )
2024-09-13 10:20:06 -07:00
Nick Hill
551ce01078
[Core] Add engine option to return only deltas or final output ( #7381 )
2024-09-12 12:02:00 -07:00
Joe Runde
f2e263b801
[Bugfix] Offline mode fix ( #8376 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-09-12 11:11:57 -07:00
Lily Liu
775f00f81e
[Speculative Decoding] Test refactor ( #8317 )
...
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-11 14:07:34 -07:00
Alexey Kondratiev(AMD)
aea02f30de
[CI/Build] Excluding test_moe.py from AMD Kernels tests for investigation ( #8373 )
2024-09-11 18:31:41 +00:00
Li, Jiang
0b952af458
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend ( #7257 )
2024-09-11 09:46:46 -07:00
sumitd2
02751a7a42
Fix ppc64le buildkite job ( #8309 )
2024-09-10 12:58:34 -07:00
Alexey Kondratiev(AMD)
f421f3cefb
[CI/Build] Enabling kernels tests for AMD, ignoring some of then that fail ( #8130 )
2024-09-10 11:51:15 -07:00
Dipika Sikka
6cd5e5b07e
[Misc] Fused MoE Marlin support for GPTQ ( #8217 )
2024-09-09 23:02:52 -04:00
sumitd2
b962ee1470
ppc64le: Dockerfile fixed, and a script for buildkite ( #8026 )
2024-09-07 11:18:40 -07:00
Cyrus Leung
288a938872
[Doc] Indicate more information about supported modalities ( #8181 )
2024-09-05 10:51:53 +00:00
Kevin H. Luu
ba262c4e5a
[ci] Mark LoRA test as soft-fail ( #8160 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-09-04 20:33:12 -07:00
Kyle Mistele
e02ce498be
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models ( #5649 )
...
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-04 13:18:13 -07:00
alexeykondrat
d1dec64243
[CI/Build][ROCm] Enabling LoRA tests on ROCm ( #7369 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-04 11:57:54 -07:00
Cody Yu
2ad2e5608e
[MISC] Consolidate FP8 kv-cache tests ( #8131 )
2024-09-04 18:53:25 +00:00
TimWang
ccd7207191
chore: Update check-wheel-size.py to read MAX_SIZE_MB from env ( #8103 )
2024-09-03 23:17:05 -07:00
Roger Wang
5231f0898e
[Frontend][VLM] Add support for multiple multi-modal items ( #8049 )
2024-08-31 16:35:53 -07:00
Michael Goin
af59df0a10
Remove faulty Meta-Llama-3-8B-Instruct-FP8.yaml lm-eval test ( #7961 )
2024-08-28 19:19:17 -04:00
youkaichao
ce6bf3a2cf
[torch.compile] avoid Dynamo guard evaluation overhead ( #7898 )
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-08-28 16:10:12 -07:00
alexeykondrat
42e932c7d4
[CI/Build][ROCm] Enabling tensorizer tests for ROCm ( #7237 )
2024-08-27 10:09:13 -07:00
youkaichao
64cc644425
[core][torch.compile] discard the compile for profiling ( #7796 )
2024-08-26 21:33:58 -07:00
youkaichao
7d9ffa2ae1
[misc][core] lazy import outlines ( #7831 )
2024-08-24 00:51:38 -07:00
Alexander Matveev
9db93de20c
[Core] Add multi-step support to LLMEngine ( #7789 )
2024-08-23 12:45:53 -07:00
SangBin Cho
c01a6cb231
[Ray backend] Better error when pg topology is bad. ( #7584 )
...
Co-authored-by: youkaichao <youkaichao@126.com>
2024-08-22 17:44:25 -07:00
youkaichao
8c6f694a79
[ci] refine dependency for distributed tests ( #7776 )
2024-08-22 00:54:15 -07:00
Luka Govedič
7937009a7e
[Kernel] Replaced blockReduce[...] functions with cub::BlockReduce ( #7233 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-21 20:18:00 -04:00
William Lin
5844017285
[ci] [multi-step] narrow multi-step test dependency paths ( #7760 )
2024-08-21 15:52:40 -07:00
Robert Shaw
f7e3b0c5aa
[Bugfix][Frontend] Fix Issues Under High Load With zeromq Frontend ( #7394 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-21 13:34:14 -04:00
Ronen Schaffer
2aa00d59ad
[CI/Build] Pin OpenTelemetry versions and make errors clearer ( #7266 )
...
[CI/Build] Pin OpenTelemetry versions and make a availability errors clearer (#7266 )
2024-08-20 10:02:21 -07:00
Kuntai Du
3d8a5f063d
[CI] Organizing performance benchmark files ( #7616 )
2024-08-19 22:43:54 -07:00
William Lin
47b65a5508
[core] Multi Step Scheduling ( #7000 )
...
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
2024-08-19 13:52:13 -07:00
Peng Guanwen
f710fb5265
[Core] Use flashinfer sampling kernel when available ( #7137 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-19 03:24:03 +00:00
Alex Brooks
40e1360bb6
[CI/Build] Add text-only test for Qwen models ( #7475 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-08-19 07:43:46 +08:00
SangBin Cho
4706eb628e
[aDAG] Unflake aDAG + PP tests ( #7600 )
2024-08-16 20:49:30 -07:00
Alexei-V-Ivanov-AMD
6bd19551b0
.[Build/CI] Enabling passing AMD tests. ( #7610 )
2024-08-16 20:25:32 -07:00
Michael Goin
44f26a9466
[Model] Align nemotron config with final HF state and fix lm-eval-small ( #7611 )
2024-08-16 15:56:34 -07:00