| Author | Commit | Message | Date |
|---|---|---|---|
| xwjiang2010 | 74d55c065b | [VLM][BugFix] Make sure that multi_modal_kwargs can broadcast properly with ring buffer. (#5905); Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>; Co-authored-by: Roger Wang <ywang@roblox.com> | 2024-06-28 07:29:13 +00:00 |
| Woosuk Kwon | f136da15e1 | [Hardware][TPU] Optimize KV cache swapping (#5878) | 2024-06-27 21:12:13 -07:00 |
| Divakar Verma | c3dde367f1 | [Kernel][ROCm][AMD] fused_moe Triton configs v2 for mi300X (#5932) | 2024-06-27 13:41:08 -07:00 |
| youkaichao | 64e8d2a783 | [core][misc] remove logical block (#5882) | 2024-06-27 13:34:55 -07:00 |
| Woosuk Kwon | 79c92c7c8a | [Model] Add Gemma 2 (#5908) | 2024-06-27 13:33:56 -07:00 |
| Roger Wang | 736ed38849 | [CI/Build] Fix Args for _get_logits_warper in Sampler Test (#5922) | 2024-06-27 11:43:04 -07:00 |
| Nick Hill | 365791ff81 | [BugFix] Fix min_tokens behaviour for multiple eos tokens (#5849) | 2024-06-27 11:31:11 -07:00 |
| Nick Hill | 691e29ecf3 | [BugFix] Fix MLPSpeculator handling of num_speculative_tokens (#5876) | 2024-06-27 10:59:33 -07:00 |
| youkaichao | 3fd02bda51 | [doc][misc] add note for Kubernetes users (#5916) | 2024-06-27 10:07:07 -07:00 |
| Cyrus Leung | 98cf2ed678 | [Model][Bugfix] Implicit model flags and reenable Phi-3-Vision (#5896) | 2024-06-27 09:08:10 -07:00 |
| Cyrus Leung | e9d32d077d | [CI/Build] [1/3] Reorganize entrypoints tests (#5526) | 2024-06-27 12:43:17 +00:00 |
| Roger Wang | 2061f0b8a7 | [Bugfix] Fix img_sizes Parsing in Phi3-Vision (#5888) | 2024-06-27 08:29:24 +00:00 |
| Cyrus Leung | 96354d6a29 | [Model] Add base class for LoRA-supported models (#5018) | 2024-06-27 16:03:04 +08:00 |
| xwjiang2010 | d12af207d2 | [VLM][Bugfix] Make sure that multi_modal_kwargs is broadcasted properly (#5880); Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> | 2024-06-27 15:15:24 +08:00 |
| Cyrus Leung | 6eabc6cb0e | [Doc] Add note about context length in Phi-3-Vision example (#5887) | 2024-06-26 23:20:01 -07:00 |
| Nick Hill | 2110557dab | [BugFix] Fix cuda graph for MLPSpeculator (#5875); Co-authored-by: Abhinav Goyal <abhinav.goyal@flipkart.com> | 2024-06-27 04:12:10 +00:00 |
| Roger Wang | b9e84259e9 | [Misc] Add example for LLaVA-NeXT (#5879) | 2024-06-26 17:57:16 -07:00 |
| youkaichao | 294104c3f9 | [doc] update usage of env var to avoid conflict (#5873) | 2024-06-26 17:57:12 -04:00 |
| Chip Kerchner | 38a1674abb | Support CPU inference with VSX PowerPC ISA (#5652) | 2024-06-26 21:53:04 +00:00 |
| Woosuk Kwon | f5c8628fdc | [Bugfix][TPU] Fix CPU cache allocation (#5869) | 2024-06-26 13:42:40 -07:00 |
| Woosuk Kwon | cbc53b6b8d | [Hardware][TPU] Support parallel sampling & Swapping (#5855) | 2024-06-26 11:07:49 -07:00 |
| sasha0552 | c54269d967 | [Frontend] Add tokenize/detokenize endpoints (#5054) | 2024-06-26 16:54:22 +00:00 |
| Luka Govedič | 5bfd1bbc98 | [Kernel] Adding bias epilogue support for cutlass_scaled_mm (#5560); Co-authored-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>; Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> | 2024-06-26 15:16:00 +00:00 |
| Cyrus Leung | 6984c02a27 | [CI/Build] Refactor image test assets (#5821) | 2024-06-26 01:02:34 -07:00 |
| Woosuk Kwon | 3439c5a8e3 | [Bugfix][TPU] Fix KV cache size calculation (#5860) | 2024-06-26 00:58:23 -07:00 |
| Woosuk Kwon | 6806998bf9 | [Bugfix] Fix embedding to support 2D inputs (#5829) | 2024-06-26 00:15:22 -07:00 |
| youkaichao | 515080ad2f | [bugfix][distributed] fix shm broadcast when the queue size is full (#5801) | 2024-06-25 21:56:02 -07:00 |
| Roger Wang | 3aa7b6cf66 | [Misc][Doc] Add Example of using OpenAI Server with VLM (#5832) | 2024-06-25 20:34:25 -07:00 |
| Stephanie Wang | dda4811591 | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408); Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>; Signed-off-by: Stephanie <swang@anyscale.com>; Co-authored-by: Stephanie <swang@anyscale.com> | 2024-06-25 20:30:03 -07:00 |
| aws-patlange | 82079729cc | [Bugfix] Fix assertion in NeuronExecutor (#5841) | 2024-06-25 19:52:10 -07:00 |
| Thomas Parnell | c2a8ac75e0 | [CI/Build] Add E2E tests for MLPSpeculator (#5791); Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> | 2024-06-26 00:04:08 +00:00 |
| Woosuk Kwon | f178e56c68 | [Hardware][TPU] Raise errors for unsupported sampling params (#5850) | 2024-06-25 16:58:23 -07:00 |
| Matt Wong | dd793d1de5 | [Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422) | 2024-06-25 15:56:15 -07:00 |
| Woosuk Kwon | bc34937d68 | [Hardware][TPU] Refactor TPU backend (#5831) | 2024-06-25 15:25:52 -07:00 |
| Dipika Sikka | dd248f7675 | [Misc] Update w4a16 compressed-tensors support to include w8a16 (#5794) | 2024-06-25 19:23:35 +00:00 |
| Michael Goin | d9b34baedd | [CI/Build] Add unit testing for FlexibleArgumentParser (#5798) | 2024-06-25 12:18:03 -07:00 |
| youkaichao | c18ebfdd71 | [doc][distributed] add both gloo and nccl tests (#5834) | 2024-06-25 15:10:28 -04:00 |
| Antoni Baum | 67882dbb44 | [Core] Add fault tolerance for RayTokenizerGroupPool (#5748) | 2024-06-25 10:15:10 -07:00 |
| Jie Fu (傅杰) | 7b99314301 | [Misc] Remove useless code in cpu_worker (#5824) | 2024-06-25 09:41:36 -07:00 |
| Woo-Yeon Lee | 2ce5d6688b | [Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414) | 2024-06-25 09:56:06 +00:00 |
| Cyrus Leung | f23871e9ee | [Doc] Add notice about breaking changes to VLMs (#5818) | 2024-06-25 01:25:03 -07:00 |
| Kevin H. Luu | e9de9dd551 | [ci] Remove aws template (#5757); Signed-off-by: kevin <kevin@anyscale.com> | 2024-06-24 21:09:02 -07:00 |
| Chang Su | ba991d5c84 | [Bugfix] Fix FlexibleArgumentParser replaces _ with - for actual args (#5795) | 2024-06-24 17:01:19 -06:00 |
| Michael Goin | 1744cc99ba | [Doc] Add Phi-3-medium to list of supported models (#5788) | 2024-06-24 10:48:55 -07:00 |
| Michael Goin | e72dc6cb35 | [Doc] Add "Suggest edit" button to doc pages (#5789) | 2024-06-24 10:26:17 -07:00 |
| youkaichao | c246212952 | [doc][faq] add warning to download models for every nodes (#5783) | 2024-06-24 15:37:42 +08:00 |
| Isotr0py | edd5fe5fa2 | [Bugfix] Add phi3v resize for dynamic shape and fix torchvision requirement (#5772) | 2024-06-24 12:11:53 +08:00 |
| Murali Andoorveedu | 5d4d90536f | [Distributed] Add send and recv helpers (#5719) | 2024-06-23 14:42:28 -07:00 |
| Varun Sundar Rabindranath | 6c916ac8a8 | [BugFix] [Kernel] Add Cutlass2x fallback kernels (#5744); Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> | 2024-06-23 21:07:11 +00:00 |
| youkaichao | 832ea88fcb | [core][distributed] improve shared memory broadcast (#5754) | 2024-06-22 10:00:43 -07:00 |