Author | Commit | Message | Date
Kyle Sayers | c7cb5c3335 | [Misc] GPTQ Activation Ordering (#8135) | 2024-09-09 16:27:26 -04:00
Vladislav Kruglikov | f9b4a2d415 | [Bugfix] Correct adapter usage for cohere and jamba (#8292) | 2024-09-09 11:20:46 -07:00
Adam Lugowski | 58fcc8545a | [Frontend] Add progress reporting to run_batch.py (#8060) (Co-authored-by: Adam Lugowski <adam.lugowski@parasail.io>) | 2024-09-09 11:16:37 -07:00
Kyle Mistele | 08287ef675 | [Bugfix] Streamed tool calls now more strictly follow OpenAI's format; ensures Vercel AI SDK compatibility (#8272) | 2024-09-09 10:45:11 -04:00
Alexander Matveev | 4ef41b8476 | [Bugfix] Fix async postprocessor in case of preemption (#8267) | 2024-09-07 21:01:51 -07:00
Joe Runde | cfe712bf1a | [CI/Build] Use python 3.12 in cuda image (#8133) (Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>) | 2024-09-07 13:03:16 -07:00
sumitd2 | b962ee1470 | ppc64le: Dockerfile fixed, and a script for buildkite (#8026) | 2024-09-07 11:18:40 -07:00
Isotr0py | 36bf8150cc | [Model][VLM] Decouple weight loading logic for Paligemma (#8269) | 2024-09-07 17:45:44 +00:00
Isotr0py | e807125936 | [Model][VLM] Support multi-images inputs for InternVL2 models (#8201) | 2024-09-07 16:38:23 +08:00
Cyrus Leung | 9f68e00d27 | [Bugfix] Fix broken OpenAI tensorizer test (#8258) | 2024-09-07 08:02:39 +00:00
youkaichao | ce2702a923 | [tpu][misc] fix typo (#8260) | 2024-09-06 22:40:46 -07:00
Wei-Sheng Chin | 795b662cff | Enable Random Prefix Caching in Serving Profiling Tool (benchmark_serving.py) (#8241) | 2024-09-06 20:18:16 -07:00
Cyrus Leung | 2f707fcb35 | [Model] Multi-input support for LLaVA (#8238) | 2024-09-07 02:57:24 +00:00
Kyle Mistele | 41e95c5247 | [Bugfix] Fix Hermes tool call chat template bug (#8256) (Co-authored-by: Kyle Mistele <kyle@constellate.ai>) | 2024-09-07 10:49:01 +08:00
William Lin | 12dd715807 | [misc] [doc] [frontend] LLM torch profiler support (#7943) | 2024-09-06 17:48:48 -07:00
Patrick von Platen | 29f49cd6e3 | [Model] Allow loading from original Mistral format (#8168) (Co-authored-by: Michael Goin <michael@neuralmagic.com>) | 2024-09-06 17:02:05 -06:00
Dipika Sikka | 23f322297f | [Misc] Remove SqueezeLLM (#8220) | 2024-09-06 16:29:03 -06:00
rasmith | 9db52eab3d | [Kernel] [Triton] Memory optimization for awq_gemm and awq_dequantize, 2x throughput (#8248) | 2024-09-06 16:26:09 -06:00
Alexey Kondratiev(AMD) | 1447c97e75 | [CI/Build] Increasing timeout for multiproc worker tests (#8203) | 2024-09-06 11:51:03 -07:00
Rui Qiao | de80783b69 | [Misc] Use ray[adag] dependency instead of cuda (#7938) | 2024-09-06 09:18:35 -07:00
afeldman-nm | e5cab71531 | [Frontend] Add --logprobs argument to benchmark_serving.py (#8191) | 2024-09-06 09:01:14 -07:00
Nick Hill | baa5467547 | [BugFix] Fix Granite model configuration (#8216) | 2024-09-06 11:39:29 +08:00
Jiaxin Shan | db3bf7c991 | [Core] Support load and unload LoRA in api server (#6566) (Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>) | 2024-09-05 18:10:33 -07:00
sroy745 | 2febcf2777 | [Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM (#7962) | 2024-09-05 16:25:29 -04:00
Michael Goin | 2ee45281a5 | Move verify_marlin_supported to GPTQMarlinLinearMethod (#8165) | 2024-09-05 11:09:46 -04:00
Alex Brooks | 9da25a88aa | [MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029) (Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>; Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>) | 2024-09-05 12:48:10 +00:00
manikandan.tm@zucisystems.com | 8685ba1a1e | Inclusion of InternVLChatModel In PP_SUPPORTED_MODELS(Pipeline Parallelism) (#7860) | 2024-09-05 11:33:37 +00:00
Cyrus Leung | 288a938872 | [Doc] Indicate more information about supported modalities (#8181) | 2024-09-05 10:51:53 +00:00
Elfie Guo | e39ebf5cf5 | [Core/Bugfix] Add query dtype as per FlashInfer API requirements. (#8173) | 2024-09-05 05:12:26 +00:00
Kevin H. Luu | ba262c4e5a | [ci] Mark LoRA test as soft-fail (#8160) (Signed-off-by: kevin <kevin@anyscale.com>) | 2024-09-04 20:33:12 -07:00
Woosuk Kwon | 4624d98dbd | [Misc] Clean up RoPE forward_native (#8076) | 2024-09-04 20:31:48 -07:00
William Lin | 1afc931987 | [bugfix] >1.43 constraint for openai (#8169) (Co-authored-by: Michael Goin <michael@neuralmagic.com>) | 2024-09-04 17:35:36 -07:00
Maureen McElaney | e01c2beb7d | [Doc] [Misc] Create CODE_OF_CONDUCT.md (#8161) | 2024-09-04 16:50:13 -07:00
Simon Mo | 32e7db2536 | Bump version to v0.6.0 (#8166) | 2024-09-04 16:34:27 -07:00
Harsha vardhan manoj Bikki | 008cf886c9 | [Neuron] Adding support for adding/ overriding neuron configuration a… (#8062) (Co-authored-by: Harsha Bikki <harbikh@amazon.com>) | 2024-09-04 16:33:43 -07:00
Cody Yu | 77d9e514a2 | [MISC] Replace input token throughput with total token throughput (#8164) (Co-authored-by: Michael Goin <michael@neuralmagic.com>) | 2024-09-04 20:23:22 +00:00
Kyle Mistele | e02ce498be | [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649) (Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>; Co-authored-by: Kyle Mistele <kyle@constellate.ai>) | 2024-09-04 13:18:13 -07:00
Woosuk Kwon | 561d6f8077 | [CI] Change test input in Gemma LoRA test (#8163) | 2024-09-04 13:05:50 -07:00
alexeykondrat | d1dec64243 | [CI/Build][ROCm] Enabling LoRA tests on ROCm (#7369) (Co-authored-by: Simon Mo <simon.mo@hey.com>) | 2024-09-04 11:57:54 -07:00
Cody Yu | 2ad2e5608e | [MISC] Consolidate FP8 kv-cache tests (#8131) | 2024-09-04 18:53:25 +00:00
wnma | d3311562fb | [Bugfix] remove post_layernorm in siglip (#8106) | 2024-09-04 18:55:37 +08:00
TimWang | ccd7207191 | chore: Update check-wheel-size.py to read MAX_SIZE_MB from env (#8103) | 2024-09-03 23:17:05 -07:00
Cyrus Leung | 855c262a6b | [Frontend] Multimodal support in offline chat (#8098) | 2024-09-04 05:22:17 +00:00
Peter Salas | 2be8ec6e71 | [Model] Add Ultravox support for multiple audio chunks (#7963) | 2024-09-04 04:38:21 +00:00
Dipika Sikka | e16fa99a6a | [Misc] Update fbgemmfp8 to use vLLMParameters (#7972) (Co-authored-by: Michael Goin <michael@neuralmagic.com>) | 2024-09-03 20:12:41 -06:00
Woosuk Kwon | 61f4a93d14 | [TPU][Bugfix] Use XLA rank for persistent cache path (#8137) | 2024-09-03 18:35:33 -07:00
Nick Hill | d4db9f53c8 | [Benchmark] Add --async-engine option to benchmark_throughput.py (#7964) | 2024-09-03 20:57:41 -04:00
Dipika Sikka | 2188a60c7e | [Misc] Update GPTQ to use vLLMParameters (#7976) | 2024-09-03 17:21:44 -04:00
Simon Mo | dc0b6066ab | [CI] Change PR remainder to avoid at-mentions (#8134) | 2024-09-03 14:11:42 -07:00
Woosuk Kwon | 0af3abe3d3 | [TPU][Bugfix] Fix next_token_ids shape (#8128) | 2024-09-03 13:29:24 -07:00