Commit Graph

3240 Commits

Author SHA1 Message Date
Chauncey
ac6b8f19b9
[Frontend] Multi-Modality Support for Loading Local Image Files (#9915)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2024-11-04 15:34:57 +00:00
Mengqing Cao
ccb5376a9a
[Bugfix][OpenVINO] Fix circular reference #9939 (#9974)
Signed-off-by: MengqingCao <cmq0113@163.com>
2024-11-04 18:14:13 +08:00
Tran Quang Dai
ea4adeddc1
[Bugfix] Fix E2EL mean and median stats (#9984)
Signed-off-by: daitran2k1 <tranquangdai7a@gmail.com>
2024-11-04 09:37:58 +00:00
Yang Zheng
4dbcbbeb09
[Misc] Compute query_start_loc/seq_start_loc on CPU (#9447)
Co-authored-by: Yang Zheng(SW)(Alex) <you@example.com>
2024-11-04 08:54:37 +00:00
Gregory Shtrasberg
b67feb1274
[Bugfix]Using the correct type hints (#9885)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2024-11-04 06:19:51 +00:00
Jee Jee Li
c49f0407ba
[Bugfix] Fix MiniCPMV and Mllama BNB bug (#9917)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-04 03:36:41 +00:00
Robert Shaw
91c9ebbb1b
[V1] Fix Configs (#9971) 2024-11-04 00:24:40 +00:00
shanshan wang
54597724f4
[Model] Add support for H2OVL-Mississippi models (#9747)
Signed-off-by: Shanshan Wang <shanshan.wang@h2o.ai>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-11-04 00:15:36 +00:00
Nick Hill
1f1b6d6eda
[V1] Support per-request seed (#9945)
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
2024-11-03 09:14:17 -08:00
youkaichao
3bb4befea7
[bugfix] fix tsts (#9959)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-02 15:54:05 -07:00
Yongzao
ae5279a163
[torch.compile] Adding torch compile to vision-language models (#9946) 2024-11-02 12:56:05 -07:00
Nikita Furin
1b73ab2a1f
[CI/Build] Quoting around > (#9956) 2024-11-02 12:50:28 -07:00
youkaichao
cea808f325
[3/N] model runner pass the whole config to model (#9958)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-02 12:08:49 -07:00
youkaichao
74b529ceee
[bugfix] fix chatglm dummy_data_for_glmv (#9955)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-02 08:03:33 -07:00
Robert Shaw
d6459b4516
[V1] Fix EngineArgs refactor on V1 (#9954) 2024-11-02 07:44:38 -07:00
youkaichao
e893795443
[2/N] executor pass the complete config to worker/modelrunner (#9938)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2024-11-02 07:35:05 -07:00
Michael Green
1d4cfe2be1
[Doc] Updated tpu-installation.rst with more details (#9926)
Signed-off-by: Michael Green <mikegre@google.com>
2024-11-02 10:06:45 -04:00
Nick Hill
eed92f12fc
[Docs] Update Granite 3.0 models in supported models table (#9930)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-11-02 09:02:18 +00:00
youkaichao
af7380d83b
[torch.compile] fix cpu broken code (#9947)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-01 23:35:47 -07:00
sroy745
a78dd3303e
[Encoder Decoder] Add flash_attn kernel support for encoder-decoder models (#9559) 2024-11-01 23:22:49 -07:00
Kevin H. Luu
d522034c85
[ci/build] Have dependabot ignore pinned dependencies (#9935)
Signed-off-by: kevin <kevin@anyscale.com>
2024-11-01 23:56:13 +00:00
Peter Salas
6c0b7f548d
[Core][VLM] Add precise multi-modal placeholder tracking (#8346)
Signed-off-by: Peter Salas <peter@fixie.ai>
2024-11-01 16:21:10 -07:00
dependabot[bot]
d151fde834
[ci/build] Bump the patch-update group with 10 updates (#9897)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
2024-11-01 23:04:42 +00:00
Gene Der Su
27cd36e6e2
[Bugfix] PicklingError on RayTaskError (#9934)
Signed-off-by: Gene Su <e870252314@gmail.com>
2024-11-01 22:08:23 +00:00
youkaichao
18bd7587b7
[1/N] pass the complete config from engine to executor (#9933)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-01 13:51:57 -07:00
Pavani Majety
598b6d7b07
[Bugfix/Core] Flashinfer k_scale and v_scale (#9861) 2024-11-01 12:15:05 -07:00
youkaichao
aff1fd8188
[torch.compile] use interpreter with stable api from pytorch (#9889)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-01 11:50:37 -07:00
André Jonasson
4581d2cc02
[Core] Refactor: Clean up unused argument in Scheduler._preempt (#9696)
Signed-off-by: André Jonasson <andre.jonasson@gmail.com>
2024-11-01 11:41:38 -07:00
Travis Johnson
1dd4cb2935
[Bugfix] Fix edge cases for MistralTokenizer (#9625)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2024-11-01 10:33:15 -07:00
Cyrus Leung
ba0d892074
[Frontend] Use a proper chat template for VLM2Vec (#9912) 2024-11-01 14:09:07 +00:00
Michael Goin
30a2e80742
[CI/Build] Add Model Tests for PixtralHF (#9813) 2024-11-01 07:55:29 -06:00
Cyrus Leung
06386a64dd
[Frontend] Chat-based Embeddings API (#9759) 2024-11-01 08:13:35 +00:00
Cyrus Leung
d3aa2a8b2f
[Doc] Update multi-input support (#9906) 2024-11-01 07:34:49 +00:00
Yongzao
2b5bf20988
[torch.compile] Adding torch compile annotations to some models (#9876)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-01 00:25:47 -07:00
Michael Goin
93a76dd21d
[Model] Support bitsandbytes for MiniCPMV (#9891)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-01 13:31:56 +08:00
youkaichao
566cd27797
[torch.compile] rework test plans (#9866)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-31 22:20:17 -07:00
Michael Goin
37a4947dcd
[Bugfix] Fix layer skip logic with bitsandbytes (#9887)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-01 13:12:44 +08:00
youkaichao
96e0c9cbbd
[torch.compile] directly register custom op (#9896)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-31 21:56:09 -07:00
Joe Runde
031a7995f3
[Bugfix][Frontend] Reject guided decoding in multistep mode (#9892)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-11-01 01:09:46 +00:00
Kevin H. Luu
b63c64d95b
[ci/build] Configure dependabot to update pip dependencies (#9811)
Signed-off-by: kevin <kevin@anyscale.com>
2024-10-31 15:55:38 -07:00
Mor Zusman
9fb12f7848
[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100 (#9838)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
2024-10-31 20:06:25 +00:00
sasha0552
55650c83a0
[Bugfix] Fix illegal memory access error with chunked prefill, prefix caching, block manager v2 and xformers enabled together (#9532)
Signed-off-by: sasha0552 <admin@sasha0552.org>
2024-10-31 11:46:36 -07:00
Alexei-V-Ivanov-AMD
77f7ef2908
[CI/Build] Adding a forced docker system prune to clean up space (#9849) 2024-11-01 01:02:58 +08:00
Alex Brooks
16b8f7a86f
[CI/Build] Add Model Tests for Qwen2-VL (#9846)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-31 09:10:52 -07:00
Jee Jee Li
5608e611c2
[Doc] Update Qwen documentation (#9869) 2024-10-31 08:54:18 +00:00
Roger Wang
3ea2dc2ec4
[Misc] Remove deprecated arg for cuda graph capture (#9864)
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-10-31 07:22:07 +00:00
Michael Goin
d087bf863e
[Model] Support quantization of Qwen2VisionTransformer (#9817)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-10-30 22:41:20 -07:00
Kevin H. Luu
890ca36072
Revert "[Bugfix] Use host argument to bind to interface (#9798)" (#9852) 2024-10-31 01:44:51 +00:00
Guillaume Calmettes
abbfb6134d
[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837) 2024-10-30 18:15:56 -07:00
youkaichao
64384bbcdf
[torch.compile] upgrade tests (#9858)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-30 16:34:22 -07:00