Kunshang Ji
|
851725202a
|
[Hardware][intel GPU] bump up ipex version to 2.3 (#8365)
Co-authored-by: Yan Ma <yan.ma@intel.com>
|
2024-09-13 16:54:34 -07:00 |
|
Simon Mo
|
9ba0817ff1
|
bump version to v0.6.1.post2 (#8473)
|
2024-09-13 11:35:00 -07:00 |
|
Nick Hill
|
18e9e1f7b3
|
[HotFix] Fix final output truncation with stop string + streaming (#8468)
|
2024-09-13 11:31:12 -07:00 |
|
Isotr0py
|
f57092c00b
|
[Doc] Add oneDNN installation to CPU backend documentation (#8467)
|
2024-09-13 18:06:30 +00:00 |
|
Cyrus Leung
|
a84e598e21
|
[CI/Build] Reorganize models tests (#7820)
|
2024-09-13 10:20:06 -07:00 |
|
youkaichao
|
0a4806f0a9
|
[plugin][torch.compile] allow to add custom compile backend (#8445)
|
2024-09-13 09:32:42 -07:00 |
|
Cyrus Leung
|
ecd7a1d5b6
|
[Installation] Gate FastAPI version for Python 3.8 (#8456)
|
2024-09-13 09:02:26 -07:00 |
|
youkaichao
|
a2469127db
|
[misc][ci] fix quant test (#8449)
|
2024-09-13 17:20:14 +08:00 |
|
Jee Jee Li
|
06311e2956
|
[Misc] Skip loading extra bias for Qwen2-VL GPTQ-Int8 (#8442)
|
2024-09-13 07:58:28 +00:00 |
|
youkaichao
|
cab69a15e4
|
[doc] recommend pip instead of conda (#8446)
|
2024-09-12 23:52:41 -07:00 |
|
Isotr0py
|
9b4a3b235e
|
[CI/Build] Enable InternVL2 PP test only on single node (#8437)
|
2024-09-13 06:35:20 +00:00 |
|
Simon Mo
|
acda0b35d0
|
bump version to v0.6.1.post1 (#8440)
|
2024-09-12 21:39:49 -07:00 |
|
William Lin
|
ba77527955
|
[bugfix] torch profiler bug for single gpu with GPUExecutor (#8354)
|
2024-09-12 21:30:00 -07:00 |
|
Alexander Matveev
|
6821020109
|
[Bugfix] Fix async log stats (#8417)
|
2024-09-12 20:48:59 -07:00 |
|
Cyrus Leung
|
8427550488
|
[CI/Build] Update pixtral tests to use JSON (#8436)
|
2024-09-13 03:47:52 +00:00 |
|
Cyrus Leung
|
3f79bc3d1a
|
[Bugfix] Bump fastapi and pydantic version (#8435)
|
2024-09-13 03:21:42 +00:00 |
|
shangmingc
|
40c396533d
|
[Bugfix] Mapping physical device indices for e2e test utils (#8290)
|
2024-09-13 11:06:28 +08:00 |
|
Cyrus Leung
|
5ec9c0fb3c
|
[Core] Factor out input preprocessing to a separate class (#7329)
|
2024-09-13 02:56:13 +00:00 |
|
Dipika Sikka
|
8f44a92d85
|
[BugFix] fix group_topk (#8430)
|
2024-09-13 09:23:42 +08:00 |
|
Roger Wang
|
360ddbd37e
|
[Misc] Update Pixtral example (#8431)
|
2024-09-12 17:31:18 -07:00 |
|
Wenxiang
|
a480939e8e
|
[Bugfix] Fix weight loading issue by rename variable. (#8293)
|
2024-09-12 19:25:00 -04:00 |
|
Patrick von Platen
|
d31174a4e1
|
[Hotfix][Pixtral] Fix multiple images bugs (#8415)
|
2024-09-12 15:21:51 -07:00 |
|
Roger Wang
|
b61bd98f90
|
[CI/Build] Disable multi-node test for InternVL2 (#8428)
|
2024-09-12 15:05:35 -07:00 |
|
Roger Wang
|
c16369455f
|
[Hotfix][Core][VLM] Disable chunked prefill by default and prefix caching for multimodal models (#8425)
|
2024-09-12 14:06:51 -07:00 |
|
Alexander Matveev
|
019877253b
|
[Bugfix] multi-step + flashinfer: ensure cuda graph compatible (#8427)
|
2024-09-12 21:01:50 +00:00 |
|
Nick Hill
|
551ce01078
|
[Core] Add engine option to return only deltas or final output (#7381)
|
2024-09-12 12:02:00 -07:00 |
|
William Lin
|
a6c0f3658d
|
[multi-step] add flashinfer backend (#7928)
|
2024-09-12 11:16:22 -07:00 |
|
Joe Runde
|
f2e263b801
|
[Bugfix] Offline mode fix (#8376)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-09-12 11:11:57 -07:00 |
|
Luis Vega
|
1f0c75afa9
|
[BugFix] Fix Duplicate Assignment in Hermes2ProToolParser (#8423)
|
2024-09-12 11:10:11 -07:00 |
|
WANGWEI
|
8a23e93302
|
[BugFix] lazy init _copy_stream to avoid torch init wrong gpu instance (#8403)
|
2024-09-12 10:47:42 -07:00 |
|
Alex Brooks
|
c6202daeed
|
[Model] Support multiple images for qwen-vl (#8247)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-12 10:10:54 -07:00 |
|
Isotr0py
|
e56bf27741
|
[Bugfix] Fix InternVL2 inference with various num_patches (#8375)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-12 10:10:35 -07:00 |
|
Roger Wang
|
520ca380ae
|
[Hotfix][VLM] Fixing max position embeddings for Pixtral (#8399)
|
2024-09-12 09:28:37 -07:00 |
|
youkaichao
|
7de49aa86c
|
[torch.compile] hide slicing under custom op for inductor (#8384)
|
2024-09-12 00:11:55 -07:00 |
|
Woosuk Kwon
|
42ffba11ad
|
[Misc] Use RoPE cache for MRoPE (#8396)
|
2024-09-11 23:13:14 -07:00 |
|
Kevin Lin
|
295c4730a8
|
[Misc] Raise error when using encoder/decoder model with cpu backend (#8355)
|
2024-09-12 05:45:24 +00:00 |
|
Blueyo0
|
1bf2dd9df0
|
[Gemma2] add bitsandbytes support for Gemma2 (#8338)
|
2024-09-11 21:53:12 -07:00 |
|
tomeras91
|
5a60699c45
|
[Bugfix]: Fix the logic for deciding if tool parsing is used (#8366)
|
2024-09-12 03:55:30 +00:00 |
|
Michael Goin
|
b6c75e1cf2
|
Fix the AMD weight loading tests (#8390)
|
2024-09-11 20:35:33 -07:00 |
|
Woosuk Kwon
|
b71c956deb
|
[TPU] Use Ray for default distributed backend (#8389)
|
2024-09-11 20:31:51 -07:00 |
|
youkaichao
|
f842a7aff1
|
[misc] remove engine_use_ray (#8126)
|
2024-09-11 18:23:36 -07:00 |
|
Cody Yu
|
a65cb16067
|
[MISC] Dump model runner inputs when crashing (#8305)
|
2024-09-12 01:12:25 +00:00 |
|
Simon Mo
|
3fd2b0d21c
|
Bump version to v0.6.1 (#8379)
|
2024-09-11 14:42:11 -07:00 |
|
Patrick von Platen
|
d394787e52
|
Pixtral (#8377)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-11 14:41:55 -07:00 |
|
Lily Liu
|
775f00f81e
|
[Speculative Decoding] Test refactor (#8317)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-11 14:07:34 -07:00 |
|
Aarni Koskela
|
8baa454937
|
[Misc] Move device options to a single place (#8322)
|
2024-09-11 13:25:58 -07:00 |
|
bnellnm
|
73202dbe77
|
[Kernel][Misc] register ops to prevent graph breaks (#6917)
Co-authored-by: Sage Moore <sage@neuralmagic.com>
|
2024-09-11 12:52:19 -07:00 |
|
Cyrus Leung
|
7015417fd4
|
[Bugfix] Add missing attributes in mistral tokenizer (#8364)
|
2024-09-11 11:36:54 -07:00 |
|
Alexey Kondratiev(AMD)
|
aea02f30de
|
[CI/Build] Excluding test_moe.py from AMD Kernels tests for investigation (#8373)
|
2024-09-11 18:31:41 +00:00 |
|
Li, Jiang
|
0b952af458
|
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257)
|
2024-09-11 09:46:46 -07:00 |
|