| Author | Commit | Message | Date |
|---|---|---|---|
| Cyrus Leung | 6eabc6cb0e | [Doc] Add note about context length in Phi-3-Vision example (#5887) | 2024-06-26 23:20:01 -07:00 |
| Nick Hill | 2110557dab | [BugFix] Fix cuda graph for MLPSpeculator (#5875) (Co-authored-by: Abhinav Goyal <abhinav.goyal@flipkart.com>) | 2024-06-27 04:12:10 +00:00 |
| Roger Wang | b9e84259e9 | [Misc] Add example for LLaVA-NeXT (#5879) | 2024-06-26 17:57:16 -07:00 |
| youkaichao | 294104c3f9 | [doc] update usage of env var to avoid conflict (#5873) | 2024-06-26 17:57:12 -04:00 |
| Chip Kerchner | 38a1674abb | Support CPU inference with VSX PowerPC ISA (#5652) | 2024-06-26 21:53:04 +00:00 |
| Woosuk Kwon | f5c8628fdc | [Bugfix][TPU] Fix CPU cache allocation (#5869) | 2024-06-26 13:42:40 -07:00 |
| Woosuk Kwon | cbc53b6b8d | [Hardware][TPU] Support parallel sampling & Swapping (#5855) | 2024-06-26 11:07:49 -07:00 |
| sasha0552 | c54269d967 | [Frontend] Add tokenize/detokenize endpoints (#5054) | 2024-06-26 16:54:22 +00:00 |
| Luka Govedič | 5bfd1bbc98 | [Kernel] Adding bias epilogue support for cutlass_scaled_mm (#5560) (Co-authored-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>; Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>) | 2024-06-26 15:16:00 +00:00 |
| Cyrus Leung | 6984c02a27 | [CI/Build] Refactor image test assets (#5821) | 2024-06-26 01:02:34 -07:00 |
| Woosuk Kwon | 3439c5a8e3 | [Bugfix][TPU] Fix KV cache size calculation (#5860) | 2024-06-26 00:58:23 -07:00 |
| Woosuk Kwon | 6806998bf9 | [Bugfix] Fix embedding to support 2D inputs (#5829) | 2024-06-26 00:15:22 -07:00 |
| youkaichao | 515080ad2f | [bugfix][distributed] fix shm broadcast when the queue size is full (#5801) | 2024-06-25 21:56:02 -07:00 |
| Roger Wang | 3aa7b6cf66 | [Misc][Doc] Add Example of using OpenAI Server with VLM (#5832) | 2024-06-25 20:34:25 -07:00 |
| Stephanie Wang | dda4811591 | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) (Signed-off-by: Stephanie Wang <swang@cs.berkeley.edu>; Signed-off-by: Stephanie <swang@anyscale.com>; Co-authored-by: Stephanie <swang@anyscale.com>) | 2024-06-25 20:30:03 -07:00 |
| aws-patlange | 82079729cc | [Bugfix] Fix assertion in NeuronExecutor (#5841) | 2024-06-25 19:52:10 -07:00 |
| Thomas Parnell | c2a8ac75e0 | [CI/Build] Add E2E tests for MLPSpeculator (#5791) (Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>) | 2024-06-26 00:04:08 +00:00 |
| Woosuk Kwon | f178e56c68 | [Hardware][TPU] Raise errors for unsupported sampling params (#5850) | 2024-06-25 16:58:23 -07:00 |
| Matt Wong | dd793d1de5 | [Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422) | 2024-06-25 15:56:15 -07:00 |
| Woosuk Kwon | bc34937d68 | [Hardware][TPU] Refactor TPU backend (#5831) | 2024-06-25 15:25:52 -07:00 |
| Dipika Sikka | dd248f7675 | [Misc] Update w4a16 compressed-tensors support to include w8a16 (#5794) | 2024-06-25 19:23:35 +00:00 |
| Michael Goin | d9b34baedd | [CI/Build] Add unit testing for FlexibleArgumentParser (#5798) | 2024-06-25 12:18:03 -07:00 |
| youkaichao | c18ebfdd71 | [doc][distributed] add both gloo and nccl tests (#5834) | 2024-06-25 15:10:28 -04:00 |
| Antoni Baum | 67882dbb44 | [Core] Add fault tolerance for RayTokenizerGroupPool (#5748) | 2024-06-25 10:15:10 -07:00 |
| Jie Fu (傅杰) | 7b99314301 | [Misc] Remove useless code in cpu_worker (#5824) | 2024-06-25 09:41:36 -07:00 |
| Woo-Yeon Lee | 2ce5d6688b | [Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414) | 2024-06-25 09:56:06 +00:00 |
| Cyrus Leung | f23871e9ee | [Doc] Add notice about breaking changes to VLMs (#5818) | 2024-06-25 01:25:03 -07:00 |
| Kevin H. Luu | e9de9dd551 | [ci] Remove aws template (#5757) (Signed-off-by: kevin <kevin@anyscale.com>) | 2024-06-24 21:09:02 -07:00 |
| Chang Su | ba991d5c84 | [Bugfix] Fix FlexibleArgumentParser replaces _ with - for actual args (#5795) | 2024-06-24 17:01:19 -06:00 |
| Michael Goin | 1744cc99ba | [Doc] Add Phi-3-medium to list of supported models (#5788) | 2024-06-24 10:48:55 -07:00 |
| Michael Goin | e72dc6cb35 | [Doc] Add "Suggest edit" button to doc pages (#5789) | 2024-06-24 10:26:17 -07:00 |
| youkaichao | c246212952 | [doc][faq] add warning to download models for every nodes (#5783) | 2024-06-24 15:37:42 +08:00 |
| Isotr0py | edd5fe5fa2 | [Bugfix] Add phi3v resize for dynamic shape and fix torchvision requirement (#5772) | 2024-06-24 12:11:53 +08:00 |
| Murali Andoorveedu | 5d4d90536f | [Distributed] Add send and recv helpers (#5719) | 2024-06-23 14:42:28 -07:00 |
| Varun Sundar Rabindranath | 6c916ac8a8 | [BugFix] [Kernel] Add Cutlass2x fallback kernels (#5744) (Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>) | 2024-06-23 21:07:11 +00:00 |
| youkaichao | 832ea88fcb | [core][distributed] improve shared memory broadcast (#5754) | 2024-06-22 10:00:43 -07:00 |
| Woosuk Kwon | 8c00f9c15d | [Docs][TPU] Add installation tip for TPU (#5761) | 2024-06-21 23:09:40 -07:00 |
| Woosuk Kwon | 0cbc1d2b4f | [Bugfix] Fix pin_lora error in TPU executor (#5760) | 2024-06-21 22:25:14 -07:00 |
| zifeitong | ff9ddbceee | [Misc] Remove #4789 workaround left in vllm/entrypoints/openai/run_batch.py (#5756) | 2024-06-22 03:33:12 +00:00 |
| Jie Fu (傅杰) | 9c62db07ed | [Model] Support Qwen-VL and Qwen-VL-Chat models with text-only inputs (#5710) (Co-authored-by: Roger Wang <ywang@roblox.com>) | 2024-06-22 02:07:08 +00:00 |
| Kunshang Ji | cf90ae0123 | [CI][Hardware][Intel GPU] add Intel GPU(XPU) ci pipeline (#5616) | 2024-06-21 17:09:34 -07:00 |
| rohithkrn | f5dda63eb5 | [LoRA] Add support for pinning lora adapters in the LRU cache (#5603) | 2024-06-21 15:42:46 -07:00 |
| youkaichao | 7187507301 | [ci][test] fix ca test in main (#5746) | 2024-06-21 14:04:26 -07:00 |
| zhyncs | f1e72cc19a | [BugFix] exclude version 1.15.0 for modelscope (#5668) | 2024-06-21 13:15:48 -06:00 |
| Michael Goin | 5b15bde539 | [Doc] Documentation on supported hardware for quantization methods (#5745) | 2024-06-21 12:44:29 -04:00 |
| Roger Wang | bd620b01fb | [Kernel][CPU] Add Quick gelu to CPU (#5717) | 2024-06-21 06:39:40 +00:00 |
| youkaichao | d9a252bc8e | [Core][Distributed] add shm broadcast (#5399) (Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>) | 2024-06-21 05:12:35 +00:00 |
| Jee Li | 67005a07bc | [Bugfix] Add fully sharded layer for QKVParallelLinearWithLora (#5665) (Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>) | 2024-06-21 04:46:28 +00:00 |
| Chang Su | c35e4a3dd7 | [BugFix] Fix test_phi3v.py (#5725) | 2024-06-21 04:45:34 +00:00 |
| Jinzhen Lin | 1f5674218f | [Kernel] Add punica dimension for Qwen2 LoRA (#5441) | 2024-06-20 17:55:41 -07:00 |