Commit Graph

3266 Commits

Author SHA1 Message Date
Cyrus Leung
e808156f30
[Misc] Collect model support info in a single process per model (#9233) 2024-10-11 11:08:11 +00:00
youkaichao
cbc2ef5529
[misc] hide best_of from engine (#9261)
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
2024-10-10 21:30:44 -07:00
Andy Dai
94bf9ae4e9
[Misc] Fix sampling from sonnet for long context case (#9235) 2024-10-11 00:33:16 +00:00
omrishiv
f990bab2a4
[Doc][Neuron] add note to neuron documentation about resolving triton issue (#9257)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-10-10 23:36:32 +00:00
youkaichao
e00c094f15
[torch.compile] generic decorators (#9258) 2024-10-10 15:54:23 -07:00
Kevin H. Luu
a78c6ba7c8
[ci/build] Add placeholder command for custom models test (#9262) 2024-10-10 15:45:09 -07:00
dependabot[bot]
fb870fd491
Bump actions/setup-python from 3 to 5 (#9195)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-10 13:30:46 -07:00
dependabot[bot]
270953bafb
Bump actions/checkout from 3 to 4 (#9196)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-10 13:30:35 -07:00
dependabot[bot]
9cc811c4ff
Bump actions/github-script from 6 to 7 (#9197)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-10-10 13:30:24 -07:00
youkaichao
e4d652ea3e
[torch.compile] integration with compilation control (#9058) 2024-10-10 12:39:36 -07:00
Simon Mo
78c0b4166c
Suggest codeowners for the core componenets (#9210) 2024-10-10 12:29:24 -07:00
jordanyono
21efb603f5
[CI/Build] Make the Dockerfile.cpu file's PIP_EXTRA_INDEX_URL Configurable as a Build Argument (#9252) 2024-10-10 18:18:18 +00:00
Rafael Vasquez
055f3270d4
[Doc] Improve debugging documentation (#9204)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-10 10:48:51 -07:00
Lucas Wilkinson
18511aeda6
[Bugfix] Fix Machete unittests failing with NotImplementedError (#9218) 2024-10-10 17:39:56 +00:00
Ilya Lavrenov
83ea5c72b9
[OpenVINO] Use torch 2.4.0 and newer optimim version (#9121)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-10 11:18:58 -06:00
whyiug
04de9057ab
[Model] support input image embedding for minicpmv (#9237) 2024-10-10 15:00:47 +00:00
Isotr0py
07c11cf4d4
[Bugfix] Fix lm_head weights tying with lora for llama (#9227) 2024-10-10 21:11:56 +08:00
sroy745
f3a507f1d3
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149) 2024-10-10 14:17:17 +08:00
Lucas Wilkinson
a64e7b9407
[Bugfix] Machete garbage results for some models (large K dim) (#9212) 2024-10-10 14:16:17 +08:00
Michael Goin
ce00231a8b
[Bugfix] Fix Weight Loading Multiple GPU Test - Large Models (#9213) 2024-10-10 14:15:40 +08:00
youkaichao
de895f1697
[misc] improve model support check in another process (#9208) 2024-10-09 21:58:27 -07:00
Russell Bryant
cf25b93bdd
[Core] Fix invalid args to _process_request (#9201)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-10 12:10:09 +08:00
Michael Goin
d5fbb8706d
[CI/Build] Update Dockerfile install+deploy image to ubuntu 22.04 (#9130)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-09 12:51:47 -06:00
Russell Bryant
cdca8994bd
[CI/Build] mypy: check vllm/entrypoints (#9194)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-09 17:15:28 +00:00
Li, Jiang
ca77dd7a44
[Hardware][CPU] Support AWQ for CPU backend (#7515) 2024-10-09 10:28:08 -06:00
Ewout ter Hoeven
7dea289066
Add Dependabot configuration for GitHub Actions updates (#1217)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-09 08:16:26 -07:00
Cyrus Leung
cfaa6008e6
[Bugfix] Access get_vocab instead of vocab in tool parsers (#9188) 2024-10-09 08:59:57 -06:00
Ahmad Fahadh Ilyas
21906a6f50
[Bugfix] Fix lora loading for Compressed Tensors in #9120 (#9179) 2024-10-09 12:10:44 +00:00
Jiangtao Hu
dc4aea677a
[Doc] Fix VLM prompt placeholder sample bug (#9170) 2024-10-09 08:59:42 +00:00
youkaichao
c8627cd41b
[ci][test] use load dummy for testing (#9165) 2024-10-09 00:38:40 -07:00
Cyrus Leung
8bfaa4e31e
[Bugfix] fix composite weight loading and EAGLE weight loading (#9160) 2024-10-09 00:36:55 -07:00
AlpinDale
0b5b5d767e
[Frontend] Log the maximum supported concurrency (#8831) 2024-10-09 00:03:14 -07:00
Hui Liu
cdc72e3c80
[Model] Remap FP8 kv_scale in CommandR and DBRX (#9174) 2024-10-09 06:43:06 +00:00
Joe Rowell
7627172bf4
[Bugfix][Doc] Report neuron error in output (#9159) 2024-10-08 22:43:34 -07:00
Travis Johnson
480b7f40cf
[Misc] Improve validation errors around best_of and n (#9167)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-10-09 04:54:48 +00:00
Yuan Tang
acce7630c1
Update link to KServe deployment guide (#9173) 2024-10-09 03:58:49 +00:00
Yuan Tang
ffc4b27ea8
Add classifiers in setup.py (#9171) 2024-10-08 19:30:48 -07:00
chenqianfzh
2f4117c38e
support bitsandbytes quantization with more models (#9148) 2024-10-08 19:52:19 -06:00
Michael Goin
9ba0bd6aa6
Add lm-eval directly to requirements-test.txt (#9161) 2024-10-08 18:22:31 -07:00
Russell Bryant
2a131965a8
mypy: check additional directories (#9162)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-08 22:08:22 +00:00
bnellnm
bd37b9fbe2
[Bugfix] Try to handle older versions of pytorch (#9086) 2024-10-08 14:28:12 -07:00
Rafael Vasquez
de24046fcd
[Doc] Improve contributing and installation documentation (#9132)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-08 20:22:08 +00:00
Sayak Paul
1874c6a1b0
[Doc] Update vlm.rst to include an example on videos (#9155)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-08 18:12:29 +00:00
Daniele
9a94ca4a5d
[Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537) 2024-10-08 09:38:40 -07:00
Peter Pan
cfba685bd4
[CI/Build] Add examples folder into Docker image so that we can leverage the templates*.jinja when serving models (#8758)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2024-10-08 09:37:34 -07:00
Alex Brooks
069d3bd8d0
[Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-08 14:31:26 +00:00
Alex Brooks
a3691b6b5e
[Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-08 14:12:56 +00:00
Brendan Wong
8c746226c9
[Frontend] API support for beam search for MQLLMEngine (#9117) 2024-10-08 05:51:43 +00:00
youkaichao
e1faa2a598
[misc] improve ux on readme (#9147) 2024-10-07 22:26:25 -07:00
Kunshang Ji
80b57f00d5
[Intel GPU] Fix xpu decode input (#9145) 2024-10-08 03:51:14 +00:00