Kevin H. Luu
|
aaccca2b4d
|
[CI/Build] Fix machete generated kernel files ordering (#8976)
Signed-off-by: kevin <kevin@anyscale.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-10-01 03:33:12 +00:00 |
|
Joe Runde
|
062c89e7c9
|
[Frontend][Core] Move guided decoding params into sampling params (#8252)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-01 09:34:25 +08:00 |
|
Lily Liu
|
bce324487a
|
[CI][SpecDecode] Fix spec decode tests, use flash attention backend for spec decode CI tests. (#8975)
|
2024-10-01 00:51:40 +00:00 |
|
Kevin H. Luu
|
1425a1bcf9
|
[ci] Add CODEOWNERS for test directories (#8795)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-10-01 00:47:08 +00:00 |
|
Jee Jee Li
|
1cabfcefb6
|
[Misc] Adjust max_position_embeddings for LoRA compatibility (#8957)
|
2024-09-30 12:57:39 +00:00 |
|
Sebastian Schoennenbeck
|
be76e5aabf
|
[Core] Make scheduling policy settable via EngineArgs (#8956)
|
2024-09-30 12:28:44 +00:00 |
|
Isotr0py
|
2ae25f79cf
|
[Model] Expose InternVL2 max_dynamic_patch as a mm_processor_kwarg (#8946)
|
2024-09-30 13:01:20 +08:00 |
|
Jee Jee Li
|
8e60afa15e
|
[Model][LoRA]LoRA support added for MiniCPMV2.6 (#8943)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-30 04:31:55 +00:00 |
|
Roger Wang
|
b6d7392579
|
[Misc][CI/Build] Include cv2 via mistral_common[opencv] (#8951)
|
2024-09-30 04:28:26 +00:00 |
|
whyiug
|
e01ab595d8
|
[Model] support input embeddings for qwen2vl (#8856)
|
2024-09-30 03:16:10 +00:00 |
|
Mor Zusman
|
f13a07b1f8
|
[Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533)
|
2024-09-29 17:35:58 -04:00 |
|
danieljannai21
|
6c9ba48fde
|
[Frontend] Added support for HF's new continue_final_message parameter (#8942)
|
2024-09-29 17:59:47 +00:00 |
|
juncheoll
|
1fb9c1b0bf
|
[Misc] Fix typo in BlockSpaceManagerV1 (#8944)
|
2024-09-29 15:05:54 +00:00 |
|
Nick Hill
|
31f46a0d35
|
[BugFix] Fix seeded random sampling with encoder-decoder models (#8870)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-29 09:43:14 +00:00 |
|
Jee Jee Li
|
3d49776bbb
|
[Model][LoRA]LoRA support added for MiniCPMV2.5 (#7199)
|
2024-09-29 06:59:45 +00:00 |
|
Zilin Zhu
|
bc2ef1f77c
|
[Model] Support Qwen2.5-Math-RM-72B (#8896)
|
2024-09-28 21:19:39 -07:00 |
|
Tyler Michael Smith
|
2e7fe7e79f
|
[Build/CI] Set FETCHCONTENT_BASE_DIR to one location for better caching (#8930)
|
2024-09-29 03:13:01 +00:00 |
|
Cyrus Leung
|
26a68d5d7e
|
[CI/Build] Add test decorator for minimum GPU memory (#8925)
|
2024-09-29 02:50:51 +00:00 |
|
ElizaWszola
|
d081da0064
|
[Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741)
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-09-28 18:19:40 -07:00 |
|
sroy745
|
5bf8789b2a
|
[Bugfix] Block manager v2 with preemption and lookahead slots (#8824)
|
2024-09-29 09:17:45 +08:00 |
|
Russell Bryant
|
d1537039ce
|
[Core] Improve choice of Python multiprocessing method (#8823)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-29 09:17:07 +08:00 |
|
youkaichao
|
cc276443b5
|
[doc] organize installation doc and expose per-commit docker (#8931)
|
2024-09-28 17:48:41 -07:00 |
|
Chen Zhang
|
e585b583a9
|
[Bugfix] Support testing prefill throughput with benchmark_serving.py --hf-output-len 1 (#8891)
|
2024-09-28 18:51:22 +00:00 |
|
Edouard B.
|
090e945e36
|
[Frontend] Make beam search emulator temperature modifiable (#8928)
Co-authored-by: Eduard Balzin <nfunctor@yahoo.fr>
|
2024-09-28 11:30:21 -07:00 |
|
Cyrus Leung
|
e1a3f5e831
|
[CI/Build] Update models tests & examples (#8874)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-28 09:54:35 -07:00 |
|
Varun Sundar Rabindranath
|
19d02ff938
|
[Bugfix] Fix PP for Multi-Step (#8887)
|
2024-09-28 08:52:46 -07:00 |
|
tastelikefeet
|
39d3f8d94f
|
[Bugfix] Fix code for downloading models from modelscope (#8443)
|
2024-09-28 08:24:12 -07:00 |
|
Cyrus Leung
|
b0298aa8cc
|
[Misc] Remove vLLM patch of BaichuanTokenizer (#8921)
|
2024-09-28 08:11:25 +00:00 |
|
Tyler Titsworth
|
260024a374
|
[Bugfix][Intel] Fix XPU Dockerfile Build (#7824)
Signed-off-by: tylertitsworth <tyler.titsworth@intel.com>
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-27 23:45:50 -07:00 |
|
youkaichao
|
d86f6b2afb
|
[misc] fix wheel name (#8919)
|
2024-09-27 22:10:44 -07:00 |
|
Sebastian Schoennenbeck
|
bd429f2b75
|
[Core] Priority-based scheduling in async engine (#8850)
|
2024-09-27 15:07:10 -07:00 |
|
youkaichao
|
18e60d7d13
|
[misc][distributed] add VLLM_SKIP_P2P_CHECK flag (#8911)
|
2024-09-27 14:27:56 -07:00 |
|
Varun Sundar Rabindranath
|
c2ec430ab5
|
[Core] Multi-Step + Single Step Prefills via Chunked Prefill code path (#8378)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-09-27 13:32:07 -07:00 |
|
Lucas Wilkinson
|
c5d55356f9
|
[Bugfix] fix for deepseek w4a16 (#8906)
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-09-27 13:12:34 -06:00 |
|
Luka Govedič
|
172d1cd276
|
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271)
|
2024-09-27 14:25:10 -04:00 |
|
youkaichao
|
a9b15c606f
|
[torch.compile] use empty tensor instead of None for profiling (#8875)
|
2024-09-27 08:11:32 -07:00 |
|
Brittany
|
8df2dc3c88
|
[TPU] Update pallas.py to support trillium (#8871)
|
2024-09-27 01:16:55 -07:00 |
|
Isotr0py
|
6d792d2f31
|
[Bugfix][VLM] Fix Fuyu batching inference with max_num_seqs>1 (#8892)
|
2024-09-27 01:15:58 -07:00 |
|
Peter Pan
|
0e088750af
|
[MISC] Fix invalid escape sequence '\' (#8830)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2024-09-27 01:13:25 -07:00 |
|
youkaichao
|
dc4e3df5c2
|
[misc] fix collect env (#8894)
|
2024-09-27 00:26:38 -07:00 |
|
Cyrus Leung
|
3b00b9c26c
|
[Core] rename PromptInputs and inputs (#8876)
|
2024-09-26 20:35:15 -07:00 |
|
Maximilien de Bayser
|
344cd2b6f4
|
[Feature] Add support for Llama 3.1 and 3.2 tool use (#8343)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2024-09-26 17:01:42 -07:00 |
|
Cyrus Leung
|
1b49148e47
|
[Installation] Allow lower versions of FastAPI to maintain Ray 2.9 compatibility (#8764)
|
2024-09-26 16:54:09 -07:00 |
|
Nick Hill
|
4b377d6feb
|
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829)
|
2024-09-26 16:46:43 -07:00 |
|
Tyler Michael Smith
|
71d21c73ab
|
[Bugfix] Fixup advance_step.cu warning (#8815)
|
2024-09-26 16:23:45 -07:00 |
|
Chirag Jain
|
ee2da3e9ef
|
fix validation: Only set tool_choice auto if at least one tool is provided (#8568)
|
2024-09-26 16:23:17 -07:00 |
|
Tyler Michael Smith
|
e2f6f26e86
|
[Bugfix] Fix print_warning_once's line info (#8867)
|
2024-09-26 16:18:26 -07:00 |
|
Michael Goin
|
b28d2104de
|
[Misc] Change dummy profiling and BOS fallback warns to log once (#8820)
|
2024-09-26 16:18:14 -07:00 |
|
Pernekhan Utemuratov
|
93d364da34
|
[Bugfix] Include encoder prompts len to non-stream api usage response (#8861)
|
2024-09-26 15:47:00 -07:00 |
|
Kevin H. Luu
|
d9cfbc891e
|
[ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM (#8872)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-09-26 15:02:16 -07:00 |
|