3217 Commits

Author SHA1 Message Date
Gene Der Su
27cd36e6e2
[Bugfix] PicklingError on RayTaskError (#9934)
Signed-off-by: Gene Su <e870252314@gmail.com>
2024-11-01 22:08:23 +00:00
youkaichao
18bd7587b7
[1/N] pass the complete config from engine to executor (#9933)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-01 13:51:57 -07:00
Pavani Majety
598b6d7b07
[Bugfix/Core] Flashinfer k_scale and v_scale (#9861)
2024-11-01 12:15:05 -07:00
youkaichao
aff1fd8188
[torch.compile] use interpreter with stable api from pytorch (#9889)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-01 11:50:37 -07:00
André Jonasson
4581d2cc02
[Core] Refactor: Clean up unused argument in Scheduler._preempt (#9696)
Signed-off-by: André Jonasson <andre.jonasson@gmail.com>
2024-11-01 11:41:38 -07:00
Travis Johnson
1dd4cb2935
[Bugfix] Fix edge cases for MistralTokenizer (#9625)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-11-01 10:33:15 -07:00
Cyrus Leung
ba0d892074
[Frontend] Use a proper chat template for VLM2Vec (#9912)
2024-11-01 14:09:07 +00:00
Michael Goin
30a2e80742
[CI/Build] Add Model Tests for PixtralHF (#9813)
2024-11-01 07:55:29 -06:00
Cyrus Leung
06386a64dd
[Frontend] Chat-based Embeddings API (#9759)
2024-11-01 08:13:35 +00:00
Cyrus Leung
d3aa2a8b2f
[Doc] Update multi-input support (#9906)
2024-11-01 07:34:49 +00:00
Yongzao
2b5bf20988
[torch.compile] Adding torch compile annotations to some models (#9876)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-01 00:25:47 -07:00
Michael Goin
93a76dd21d
[Model] Support bitsandbytes for MiniCPMV (#9891)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-01 13:31:56 +08:00
youkaichao
566cd27797
[torch.compile] rework test plans (#9866)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-31 22:20:17 -07:00
Michael Goin
37a4947dcd
[Bugfix] Fix layer skip logic with bitsandbytes (#9887)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-01 13:12:44 +08:00
youkaichao
96e0c9cbbd
[torch.compile] directly register custom op (#9896)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-31 21:56:09 -07:00
Joe Runde
031a7995f3
[Bugfix][Frontend] Reject guided decoding in multistep mode (#9892)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-11-01 01:09:46 +00:00
Kevin H. Luu
b63c64d95b
[ci/build] Configure dependabot to update pip dependencies (#9811)
Signed-off-by: kevin <kevin@anyscale.com>
2024-10-31 15:55:38 -07:00
Mor Zusman
9fb12f7848
[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100 (#9838)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
2024-10-31 20:06:25 +00:00
sasha0552
55650c83a0
[Bugfix] Fix illegal memory access error with chunked prefill, prefix caching, block manager v2 and xformers enabled together (#9532)
Signed-off-by: sasha0552 <admin@sasha0552.org>
2024-10-31 11:46:36 -07:00
Alexei-V-Ivanov-AMD
77f7ef2908
[CI/Build] Adding a forced docker system prune to clean up space (#9849)
2024-11-01 01:02:58 +08:00
Alex Brooks
16b8f7a86f
[CI/Build] Add Model Tests for Qwen2-VL (#9846)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-31 09:10:52 -07:00
Jee Jee Li
5608e611c2
[Doc] Update Qwen documentation (#9869)
2024-10-31 08:54:18 +00:00
Roger Wang
3ea2dc2ec4
[Misc] Remove deprecated arg for cuda graph capture (#9864)
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-10-31 07:22:07 +00:00
Michael Goin
d087bf863e
[Model] Support quantization of Qwen2VisionTransformer (#9817)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-10-30 22:41:20 -07:00
Kevin H. Luu
890ca36072
Revert "[Bugfix] Use host argument to bind to interface (#9798)" (#9852)
2024-10-31 01:44:51 +00:00
Guillaume Calmettes
abbfb6134d
[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837)
2024-10-30 18:15:56 -07:00
youkaichao
64384bbcdf
[torch.compile] upgrade tests (#9858)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-30 16:34:22 -07:00
Yongzao
00d91c8a2c
[CI/Build] Simplify exception trace in api server tests (#9787)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-30 14:52:05 -07:00
youkaichao
c2cd1a2142
[doc] update pp support (#9853)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-30 13:36:51 -07:00
Harsha vardhan manoj Bikki
c787f2d81d
[Neuron] Update Dockerfile.neuron to fix build failure (#9822)
2024-10-30 12:22:02 -07:00
Joe Runde
33d257735f
[Doc] link bug for multistep guided decoding (#9843)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-30 17:28:29 +00:00
Joe Runde
3b3f1e7436
[Bugfix][core] replace heartbeat with pid check (#9818)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-30 09:34:07 -07:00
Elfie Guo
9ff4511e43
[Misc] Add chunked-prefill support on FlashInfer. (#9781)
2024-10-30 09:33:53 -07:00
Went-Liang
81f09cfd80
[Model] Support math-shepherd-mistral-7b-prm model (#9697)
Signed-off-by: Went-Liang <wenteng_liang@163.com>
2024-10-30 09:33:42 -07:00
Alex Brooks
cc98f1e079
[CI/Build] VLM Test Consolidation (#9372)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-30 09:32:17 -07:00
Woosuk Kwon
211fe91aa8
[TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA (#9438)
2024-10-30 09:41:38 +00:00
Jee Jee Li
6aa6020f9b
[Misc] Specify minimum pynvml version (#9827)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-10-29 23:05:43 -07:00
youkaichao
ff5ed6e1bc
[torch.compile] rework compile control with piecewise cudagraph (#9715)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-29 23:03:49 -07:00
Russell Bryant
7b0365efef
[Doc] Add the DCO to CONTRIBUTING.md (#9803)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-30 05:22:23 +00:00
Yan Ma
04a3ae0aca
[Bugfix] Fix multi nodes TP+PP for XPU (#8884)
Signed-off-by: YiSheng5 <syhm@mail.ustc.edu.cn>
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: YiSheng5 <syhm@mail.ustc.edu.cn>
2024-10-29 21:34:45 -07:00
Kevin H. Luu
62fac4b9aa
[ci/build] Pin CI dependencies version with pip-compile (#9810)
Signed-off-by: kevin <kevin@anyscale.com>
2024-10-30 03:34:55 +00:00
Michael Goin
226688bd61
[Bugfix][VLM] Make apply_fp8_linear work with >2D input (#9812)
2024-10-29 19:49:44 -07:00
Lily Liu
64cb1cdc3f
Update README.md (#9819)
2024-10-29 17:28:43 -07:00
youkaichao
1ab6f6b4ad
[core][distributed] fix custom allreduce in pytorch 2.5 (#9815)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-29 17:06:24 -07:00
Michael Goin
bc73e9821c
[Bugfix] Fix prefix strings for quantized VLMs (#9772)
2024-10-29 16:02:59 -07:00
Simon Mo
8d7724104a
[Docs] Add notes about Snowflake Meetup (#9814)
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-10-29 15:19:02 -07:00
Will Eaton
882a1ad0de
[Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
2024-10-29 15:07:37 -07:00
Joe Runde
67bdf8e523
[Bugfix][Frontend] Guard against bad token ids (#9634)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-29 14:13:20 -07:00
Kunjan
0ad216f575
[MISC] Set label value to timestamp over 0, to keep track of recent history (#9777)
Signed-off-by: Kunjan Patel <kunjanp@google.com>
2024-10-29 19:52:19 +00:00
Russell Bryant
7585ec996f
[CI/Build] mergify: fix rules for ci/build label (#9804)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-29 19:24:42 +00:00