Commit Graph

1738 Commits

Author SHA1 Message Date
Ilya Lavrenov
57f09a419c
[Hardware][Intel] OpenVINO vLLM backend (#5379) 2024-06-28 13:50:16 +00:00
Tyler Michael Smith
5932634409
Unmark fused_moe config json file as executable (#5960) 2024-06-28 06:36:12 -07:00
Cyrus Leung
5cbe8d155c
[Core] Registry for processing model inputs (#5214)
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-28 12:09:56 +00:00
Isotr0py
0d0e3a42ac
[Bugfix][Hardware][Intel CPU] Fix unpassed multi_modal_kwargs for CPU runner (#5956) 2024-06-28 12:03:41 +00:00
xwjiang2010
74d55c065b
[VLM][BugFix] Make sure that multi_modal_kwargs can broadcast properly with ring buffer. (#5905)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-28 07:29:13 +00:00
Woosuk Kwon
f136da15e1
[Hardware][TPU] Optimize KV cache swapping (#5878) 2024-06-27 21:12:13 -07:00
Divakar Verma
c3dde367f1
[Kernel][ROCm][AMD] fused_moe Triton configs v2 for mi300X (#5932) 2024-06-27 13:41:08 -07:00
youkaichao
64e8d2a783
[core][misc] remove logical block (#5882) 2024-06-27 13:34:55 -07:00
Woosuk Kwon
79c92c7c8a
[Model] Add Gemma 2 (#5908) 2024-06-27 13:33:56 -07:00
Roger Wang
736ed38849
[CI/Build] Fix Args for _get_logits_warper in Sampler Test (#5922) 2024-06-27 11:43:04 -07:00
Nick Hill
365791ff81
[BugFix] Fix min_tokens behaviour for multiple eos tokens (#5849) 2024-06-27 11:31:11 -07:00
Nick Hill
691e29ecf3
[BugFix] Fix MLPSpeculator handling of num_speculative_tokens (#5876) 2024-06-27 10:59:33 -07:00
youkaichao
3fd02bda51
[doc][misc] add note for Kubernetes users (#5916) 2024-06-27 10:07:07 -07:00
Cyrus Leung
98cf2ed678
[Model][Bugfix] Implicit model flags and reenable Phi-3-Vision (#5896) 2024-06-27 09:08:10 -07:00
Cyrus Leung
e9d32d077d
[CI/Build] [1/3] Reorganize entrypoints tests (#5526) 2024-06-27 12:43:17 +00:00
Roger Wang
2061f0b8a7
[Bugfix] Fix img_sizes Parsing in Phi3-Vision (#5888) 2024-06-27 08:29:24 +00:00
Cyrus Leung
96354d6a29
[Model] Add base class for LoRA-supported models (#5018) 2024-06-27 16:03:04 +08:00
xwjiang2010
d12af207d2
[VLM][Bugfix] Make sure that multi_modal_kwargs is broadcasted properly (#5880)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
2024-06-27 15:15:24 +08:00
Cyrus Leung
6eabc6cb0e
[Doc] Add note about context length in Phi-3-Vision example (#5887) 2024-06-26 23:20:01 -07:00
Nick Hill
2110557dab
[BugFix] Fix cuda graph for MLPSpeculator (#5875)
Co-authored-by: Abhinav Goyal <abhinav.goyal@flipkart.com>
2024-06-27 04:12:10 +00:00
Roger Wang
b9e84259e9
[Misc] Add example for LLaVA-NeXT (#5879) 2024-06-26 17:57:16 -07:00
youkaichao
294104c3f9
[doc] update usage of env var to avoid conflict (#5873) 2024-06-26 17:57:12 -04:00
Chip Kerchner
38a1674abb
Support CPU inference with VSX PowerPC ISA (#5652) 2024-06-26 21:53:04 +00:00
Woosuk Kwon
f5c8628fdc
[Bugfix][TPU] Fix CPU cache allocation (#5869) 2024-06-26 13:42:40 -07:00
Woosuk Kwon
cbc53b6b8d
[Hardware][TPU] Support parallel sampling & Swapping (#5855) 2024-06-26 11:07:49 -07:00
sasha0552
c54269d967
[Frontend] Add tokenize/detokenize endpoints (#5054) 2024-06-26 16:54:22 +00:00
Luka Govedič
5bfd1bbc98
[Kernel] Adding bias epilogue support for cutlass_scaled_mm (#5560)
Co-authored-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2024-06-26 15:16:00 +00:00
Cyrus Leung
6984c02a27
[CI/Build] Refactor image test assets (#5821) 2024-06-26 01:02:34 -07:00
Woosuk Kwon
3439c5a8e3
[Bugfix][TPU] Fix KV cache size calculation (#5860) 2024-06-26 00:58:23 -07:00
Woosuk Kwon
6806998bf9
[Bugfix] Fix embedding to support 2D inputs (#5829) 2024-06-26 00:15:22 -07:00
youkaichao
515080ad2f
[bugfix][distributed] fix shm broadcast when the queue size is full (#5801) 2024-06-25 21:56:02 -07:00
Roger Wang
3aa7b6cf66
[Misc][Doc] Add Example of using OpenAI Server with VLM (#5832) 2024-06-25 20:34:25 -07:00
Stephanie Wang
dda4811591
[Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408)
Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>
Signed-off-by: Stephanie <swang@anyscale.com>
Co-authored-by: Stephanie <swang@anyscale.com>
2024-06-25 20:30:03 -07:00
aws-patlange
82079729cc
[Bugfix] Fix assertion in NeuronExecutor (#5841) 2024-06-25 19:52:10 -07:00
Thomas Parnell
c2a8ac75e0
[CI/Build] Add E2E tests for MLPSpeculator (#5791)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-06-26 00:04:08 +00:00
Woosuk Kwon
f178e56c68
[Hardware][TPU] Raise errors for unsupported sampling params (#5850) 2024-06-25 16:58:23 -07:00
Matt Wong
dd793d1de5
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422) 2024-06-25 15:56:15 -07:00
Woosuk Kwon
bc34937d68
[Hardware][TPU] Refactor TPU backend (#5831) 2024-06-25 15:25:52 -07:00
Dipika Sikka
dd248f7675
[Misc] Update w4a16 compressed-tensors support to include w8a16 (#5794) 2024-06-25 19:23:35 +00:00
Michael Goin
d9b34baedd
[CI/Build] Add unit testing for FlexibleArgumentParser (#5798) 2024-06-25 12:18:03 -07:00
youkaichao
c18ebfdd71
[doc][distributed] add both gloo and nccl tests (#5834) 2024-06-25 15:10:28 -04:00
Antoni Baum
67882dbb44
[Core] Add fault tolerance for RayTokenizerGroupPool (#5748) 2024-06-25 10:15:10 -07:00
Jie Fu (傅杰)
7b99314301
[Misc] Remove useless code in cpu_worker (#5824) 2024-06-25 09:41:36 -07:00
Woo-Yeon Lee
2ce5d6688b
[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414) 2024-06-25 09:56:06 +00:00
Cyrus Leung
f23871e9ee
[Doc] Add notice about breaking changes to VLMs (#5818) 2024-06-25 01:25:03 -07:00
Kevin H. Luu
e9de9dd551
[ci] Remove aws template (#5757)
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-24 21:09:02 -07:00
Chang Su
ba991d5c84
[Bugfix] Fix FlexibleArgumentParser replaces _ with - for actual args (#5795) 2024-06-24 17:01:19 -06:00
Michael Goin
1744cc99ba
[Doc] Add Phi-3-medium to list of supported models (#5788) 2024-06-24 10:48:55 -07:00
Michael Goin
e72dc6cb35
[Doc] Add "Suggest edit" button to doc pages (#5789) 2024-06-24 10:26:17 -07:00
youkaichao
c246212952
[doc][faq] add warning to download models for every nodes (#5783) 2024-06-24 15:37:42 +08:00