Peter Pan
0e088750af
[MISC] Fix invalid escape sequence '\' ( #8830 )
...
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2024-09-27 01:13:25 -07:00
youkaichao
dc4e3df5c2
[misc] fix collect env ( #8894 )
2024-09-27 00:26:38 -07:00
Cyrus Leung
3b00b9c26c
[Core] rename PromptInputs and inputs ( #8876 )
2024-09-26 20:35:15 -07:00
Maximilien de Bayser
344cd2b6f4
[Feature] Add support for Llama 3.1 and 3.2 tool use ( #8343 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-09-26 17:01:42 -07:00
Cyrus Leung
1b49148e47
[Installation] Allow lower versions of FastAPI to maintain Ray 2.9 compatibility ( #8764 )
2024-09-26 16:54:09 -07:00
Nick Hill
4b377d6feb
[BugFix] Fix test breakages from transformers 4.45 upgrade ( #8829 )
2024-09-26 16:46:43 -07:00
Tyler Michael Smith
71d21c73ab
[Bugfix] Fixup advance_step.cu warning ( #8815 )
2024-09-26 16:23:45 -07:00
Chirag Jain
ee2da3e9ef
fix validation: Only set tool_choice auto if at least one tool is provided ( #8568 )
2024-09-26 16:23:17 -07:00
Tyler Michael Smith
e2f6f26e86
[Bugfix] Fix print_warning_once's line info ( #8867 )
2024-09-26 16:18:26 -07:00
Michael Goin
b28d2104de
[Misc] Change dummy profiling and BOS fallback warns to log once ( #8820 )
2024-09-26 16:18:14 -07:00
Pernekhan Utemuratov
93d364da34
[Bugfix] Include encoder prompts len to non-stream api usage response ( #8861 )
2024-09-26 15:47:00 -07:00
Kevin H. Luu
d9cfbc891e
[ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM ( #8872 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-09-26 15:02:16 -07:00
youkaichao
70de39f6b4
[misc][installation] build from source without compilation ( #8818 )
2024-09-26 13:19:04 -07:00
fyuan1316
68988d4e0d
[CI/Build] Fix missing ci dependencies ( #8834 )
2024-09-26 11:04:39 -07:00
Michael Goin
520db4dbc1
[Docs] Add README to the build docker image ( #8825 )
2024-09-26 11:02:52 -07:00
Tyler Michael Smith
f70bccac75
[Build/CI] Upgrade to gcc 10 in the base build Docker image ( #8814 )
2024-09-26 10:07:18 -07:00
Roger Wang
4bb98f2190
[Misc] Update config loading for Qwen2-VL and remove Granite ( #8837 )
2024-09-26 07:45:30 -07:00
Michael Goin
7193774b1f
[Misc] Support quantization of MllamaForCausalLM ( #8822 )
2024-09-25 14:46:22 -07:00
Roger Wang
e2c6e0a829
[Doc] Update doc for Transformers 4.45 ( #8817 )
2024-09-25 13:29:48 -07:00
Chen Zhang
770ec6024f
[Model] Add support for the multi-modal Llama 3.2 model ( #8811 )
...
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-25 13:29:32 -07:00
Simon Mo
4f1ba0844b
Revert "rename PromptInputs and inputs with backward compatibility ( #8760 )" ( #8810 )
2024-09-25 10:36:26 -07:00
Michael Goin
873edda6cf
[Misc] Support FP8 MoE for compressed-tensors ( #8588 )
2024-09-25 09:43:36 -07:00
科英
64840dfae4
[Frontend] MQLLMEngine supports profiling. ( #8761 )
2024-09-25 09:37:41 -07:00
Cyrus Leung
28e1299e60
rename PromptInputs and inputs with backward compatibility ( #8760 )
2024-09-25 09:36:47 -07:00
DefTruth
0c4d2ad5e6
[VLM][Bugfix] internvl with num_scheduler_steps > 1 ( #8614 )
2024-09-25 09:35:53 -07:00
Jee Jee Li
c6f2485c82
[Misc] Add extra deps for openai server image ( #8792 )
2024-09-25 09:35:23 -07:00
bnellnm
300da09177
[Kernel] Fullgraph and opcheck tests ( #8479 )
2024-09-25 08:35:52 -06:00
Hongxia Yang
1c046447a6
[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade ( #8777 )
2024-09-25 22:26:37 +08:00
Woo-Yeon Lee
8fae5ed7f6
[Misc] Fix minor typo in scheduler ( #8765 )
2024-09-25 00:53:03 -07:00
David Newman
3368c3ab36
[Bugfix] Ray 2.9.x doesn't expose available_resources_per_node ( #8767 )
...
Signed-off-by: darthhexx <darthhexx@gmail.com>
2024-09-25 00:52:26 -07:00
Adam Tilghman
1ac3de09cd
[Frontend] OpenAI server: propagate usage accounting to FastAPI middleware layer ( #8672 )
2024-09-25 07:49:26 +00:00
sohamparikh
3e073e66f1
[Bugfix] load fc bias from config for eagle ( #8790 )
2024-09-24 23:16:30 -07:00
Isotr0py
c23953675f
[Hardware][CPU] Enable mrope and support Qwen2-VL on CPU backend ( #8770 )
2024-09-24 23:16:11 -07:00
zifeitong
e3dd0692fa
[BugFix] Propagate 'trust_remote_code' setting in internvl and minicpmv ( #8250 )
2024-09-25 05:53:43 +00:00
sroy745
fc3afc20df
Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 ( #8752 )
2024-09-24 21:26:36 -07:00
sasha0552
b4522474a3
[Bugfix][Kernel] Implement acquire/release polyfill for Pascal ( #8776 )
2024-09-24 21:26:33 -07:00
sroy745
ee777d9c30
Fix test_schedule_swapped_simple in test_scheduler.py ( #8780 )
2024-09-24 21:26:18 -07:00
Joe Runde
6e0c9d6bd0
[Bugfix] Use heartbeats instead of health checks ( #8583 )
2024-09-24 20:37:38 -07:00
Archit Patke
6da1ab6b41
[Core] Adding Priority Scheduling ( #5958 )
2024-09-24 19:50:50 -07:00
Travis Johnson
01b6f9e1f0
[Core][Bugfix] Support prompt_logprobs returned with speculative decoding ( #8047 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-09-24 17:29:56 -07:00
Jee Jee Li
13f9f7a3d0
[Misc] Upgrade bitsandbytes to the latest version 0.44.0 ( #8768 )
2024-09-24 17:08:55 -07:00
youkaichao
1e7d5c01f5
[misc] soft drop beam search ( #8763 )
2024-09-24 15:48:39 -07:00
Daniele
2467b642dd
[CI/Build] fix setuptools-scm usage ( #8771 )
2024-09-24 12:38:12 -07:00
Lucas Wilkinson
72fc97a0f1
[Bugfix] Fix torch dynamo fixes caused by replace_parameters ( #8748 )
2024-09-24 14:33:21 -04:00
Andy
2529d09b5a
[Frontend] Batch inference for llm.chat() API ( #8648 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-09-24 09:44:11 -07:00
ElizaWszola
a928ded995
[Kernel] Split Marlin MoE kernels into multiple files ( #8661 )
...
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-09-24 09:31:42 -07:00
Hanzhi Zhou
cc4325b66a
[Bugfix] Fix potentially unsafe custom allreduce synchronization ( #8558 )
2024-09-24 01:08:14 -07:00
Alex Brooks
8ff7ced996
[Model] Expose Phi3v num_crops as a mm_processor_kwarg ( #8658 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-24 07:36:46 +00:00
Peter Salas
3f06bae907
[Core][Model] Support loading weights by ID within models ( #7931 )
2024-09-24 07:14:15 +00:00
Cody Yu
b8747e8a7c
[MISC] Skip dumping inputs when unpicklable ( #8744 )
2024-09-24 06:10:03 +00:00