Murali Andoorveedu
5d4d90536f
[Distributed] Add send and recv helpers ( #5719 )
2024-06-23 14:42:28 -07:00
Varun Sundar Rabindranath
6c916ac8a8
[BugFix] [Kernel] Add Cutlass2x fallback kernels ( #5744 )
...
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-06-23 21:07:11 +00:00
youkaichao
832ea88fcb
[core][distributed] improve shared memory broadcast ( #5754 )
2024-06-22 10:00:43 -07:00
Woosuk Kwon
8c00f9c15d
[Docs][TPU] Add installation tip for TPU ( #5761 )
2024-06-21 23:09:40 -07:00
Woosuk Kwon
0cbc1d2b4f
[Bugfix] Fix pin_lora error in TPU executor ( #5760 )
2024-06-21 22:25:14 -07:00
zifeitong
ff9ddbceee
[Misc] Remove #4789 workaround left in vllm/entrypoints/openai/run_batch.py ( #5756 )
2024-06-22 03:33:12 +00:00
Jie Fu (傅杰)
9c62db07ed
[Model] Support Qwen-VL and Qwen-VL-Chat models with text-only inputs ( #5710 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-22 02:07:08 +00:00
Kunshang Ji
cf90ae0123
[CI][Hardware][Intel GPU] add Intel GPU(XPU) ci pipeline ( #5616 )
2024-06-21 17:09:34 -07:00
rohithkrn
f5dda63eb5
[LoRA] Add support for pinning lora adapters in the LRU cache ( #5603 )
2024-06-21 15:42:46 -07:00
youkaichao
7187507301
[ci][test] fix ca test in main ( #5746 )
2024-06-21 14:04:26 -07:00
zhyncs
f1e72cc19a
[BugFix] exclude version 1.15.0 for modelscope ( #5668 )
2024-06-21 13:15:48 -06:00
Michael Goin
5b15bde539
[Doc] Documentation on supported hardware for quantization methods ( #5745 )
2024-06-21 12:44:29 -04:00
Roger Wang
bd620b01fb
[Kernel][CPU] Add Quick gelu to CPU ( #5717 )
2024-06-21 06:39:40 +00:00
youkaichao
d9a252bc8e
[Core][Distributed] add shm broadcast ( #5399 )
...
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-06-21 05:12:35 +00:00
Jee Li
67005a07bc
[Bugfix] Add fully sharded layer for QKVParallelLinearWithLora ( #5665 )
...
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-06-21 04:46:28 +00:00
Chang Su
c35e4a3dd7
[BugFix] Fix test_phi3v.py ( #5725 )
2024-06-21 04:45:34 +00:00
Jinzhen Lin
1f5674218f
[Kernel] Add punica dimension for Qwen2 LoRA ( #5441 )
2024-06-20 17:55:41 -07:00
Joshua Rosenkranz
b12518d3cf
[Model] MLPSpeculator speculative decoding support ( #4947 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Davis Wertheimer <Davis.Wertheimer@ibm.com>
2024-06-20 20:23:12 -04:00
youkaichao
6c5b7af152
[distributed][misc] use fork by default for mp ( #5669 )
2024-06-20 17:06:34 -07:00
Michael Goin
8065a7e220
[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names ( #5718 )
2024-06-20 17:00:13 -06:00
Tyler Michael Smith
3f3b6b2150
[Bugfix] Fix the CUDA version check for FP8 support in the CUTLASS kernels ( #5715 )
2024-06-20 18:36:10 +00:00
Varun Sundar Rabindranath
a7dcc62086
[Kernel] Update Cutlass int8 kernel configs for SM80 ( #5275 )
...
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-06-20 13:33:21 +00:00
Roger Wang
ad137cd111
[Model] Port over CLIPVisionModel for VLMs ( #5591 )
2024-06-20 11:52:09 +00:00
Varun Sundar Rabindranath
111af1fa2c
[Kernel] Update Cutlass int8 kernel configs for SM90 ( #5514 )
...
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-06-20 06:37:08 +00:00
Roger Wang
1b2eaac316
[Bugfix][Doc] FIx Duplicate Explicit Target Name Errors ( #5703 )
2024-06-19 23:10:47 -07:00
Cyrus Leung
3730a1c832
[Misc] Improve conftest ( #5681 )
2024-06-19 19:09:21 -07:00
Kevin H. Luu
949e49a685
[ci] Limit num gpus if specified for A100 ( #5694 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-19 16:30:03 -07:00
Dipika Sikka
4a30d7e3cc
[Misc] Add per channel support for static activation quantization; update w8a8 schemes to share base classes ( #5650 )
2024-06-19 18:06:44 -04:00
Rafael Vasquez
e83db9e7e3
[Doc] Update docker references ( #5614 )
...
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-06-19 15:01:45 -07:00
zifeitong
78687504f7
[Bugfix] AsyncLLMEngine hangs with asyncio.run ( #5654 )
2024-06-19 13:57:12 -07:00
youkaichao
d571ca0108
[ci][distributed] add tests for custom allreduce ( #5689 )
2024-06-19 20:16:04 +00:00
Michael Goin
afed90a034
[Frontend][Bugfix] Fix preemption_mode -> preemption-mode for CLI arg in arg_utils.py ( #5688 )
2024-06-19 14:41:42 -04:00
Kevin H. Luu
3ee5c4bca5
[ci] Add A100 queue into AWS CI template ( #5648 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-19 08:42:13 -06:00
Cyrus Leung
e9c2732b97
[CI/Build] Add tqdm to dependencies ( #5680 )
2024-06-19 08:37:33 -06:00
DearPlanet
d8714530d1
[Misc]Add param max-model-len in benchmark_latency.py ( #5629 )
2024-06-19 18:19:08 +08:00
Isotr0py
7d46c8d378
[Bugfix] Fix sampling_params passed incorrectly in Phi3v example ( #5684 )
2024-06-19 17:58:32 +08:00
Michael Goin
da971ec7a5
[Model] Add FP8 kv cache for Qwen2 ( #5656 )
2024-06-19 09:38:26 +00:00
youkaichao
3eea74889f
[misc][distributed] use 127.0.0.1 for single-node ( #5619 )
2024-06-19 08:05:00 +00:00
Hongxia Yang
f758aed0e8
[Bugfix][CI/Build][AMD][ROCm]Fixed the cmake build bug which generate garbage on certain devices ( #5641 )
2024-06-18 23:21:29 -07:00
Thomas Parnell
e5150f2c28
[Bugfix] Added test for sampling repetition penalty bug. ( #5659 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-06-19 06:03:55 +00:00
Shukant Pal
59a1eb59c9
[Bugfix] Fix Phi-3 Long RoPE scaling implementation ( #5628 )
2024-06-19 01:46:38 +00:00
Tyler Michael Smith
6820724e51
[Bugfix] Fix w8a8 benchmarks for int8 case ( #5643 )
2024-06-19 00:33:25 +00:00
Tyler Michael Smith
b23ce92032
[Bugfix] Fix CUDA version check for mma warning suppression ( #5642 )
2024-06-18 23:48:49 +00:00
milo157
2bd231a7b7
[Doc] Added cerebrium as Integration option ( #5553 )
2024-06-18 15:56:59 -07:00
Thomas Parnell
8a173382c8
[Bugfix] Fix for inconsistent behaviour related to sampling and repetition penalties ( #5639 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-06-18 14:18:37 -07:00
sergey-tinkoff
07feecde1a
[Model] LoRA support added for command-r ( #5178 )
2024-06-18 11:01:21 -07:00
Kevin H. Luu
19091efc44
[ci] Setup Release pipeline and build release wheels with cache ( #5610 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-18 11:00:36 -07:00
Dipika Sikka
95db455e7f
[Misc] Add channel-wise quantization support for w8a8 dynamic per token activation quantization ( #5542 )
2024-06-18 12:45:05 -04:00
Ronen Schaffer
7879f24dcc
[Misc] Add OpenTelemetry support ( #4687 )
...
This PR adds basic support for OpenTelemetry distributed tracing.
It includes changes to enable tracing functionality and improve monitoring capabilities.
I've also added a markdown with print-screens to guide users how to use this feature. You can find it here
2024-06-19 01:17:03 +09:00
Kevin H. Luu
13db4369d9
[ci] Deprecate original CI template ( #5624 )
...
Signed-off-by: kevin <kevin@anyscale.com>
2024-06-18 14:26:20 +00:00