squall/vllm - vllm

Author	SHA1	Message	Date
Roger Wang	4506641212	[Doc] Section for Multimodal Language Models (#7719 )	2024-08-20 23:24:01 -07:00
Isotr0py	12e1c65bc9	[Model] Add AWQ quantization support for InternVL2 model (#7187 )	2024-08-20 23:18:57 -07:00
youkaichao	b74a125800	[ci] try to log process using the port to debug the port usage (#7711 )	2024-08-20 17:41:12 -07:00
Antoni Baum	66a9e713a7	[Core] Pipe `worker_class_fn` argument in Executor (#7707 )	2024-08-21 00:37:39 +00:00
youkaichao	9e51b6a626	[ci][test] adjust max wait time for cpu offloading test (#7709 )	2024-08-20 17:12:44 -07:00
Kunshang Ji	6e4658c7aa	[Intel GPU] fix xpu not support punica kernel (which use torch.library.custom_op) (#7685 )	2024-08-20 12:01:09 -07:00
Antoni Baum	3b682179dd	[Core] Add `AttentionState` abstraction (#7663 )	2024-08-20 18:50:45 +00:00
Lucas Wilkinson	c6af027a35	[Misc] Add jinja2 as an explicit build requirement (#7695 )	2024-08-20 17:17:47 +00:00
Ronen Schaffer	2aa00d59ad	[CI/Build] Pin OpenTelemetry versions and make errors clearer (#7266 )	2024-08-20 10:02:21 -07:00
Kunshang Ji	c42590f97a	[Hardware] [Intel GPU] refactor xpu worker/executor (#7686 )	2024-08-20 09:54:10 -07:00
Isotr0py	aae6927be0	[VLM][Model] Add test for InternViT vision encoder (#7409 )	2024-08-20 23:10:20 +08:00
Ilya Lavrenov	398521ad19	[OpenVINO] Updated documentation (#7687 )	2024-08-20 07:33:56 -06:00
Lucas Wilkinson	5288c06aa0	[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174 )	2024-08-20 07:09:33 -06:00
Kunshang Ji	b6f99a6ffe	[Core] Refactor executor classes for easier inheritance (#7673 )	2024-08-20 00:56:50 -07:00
youkaichao	ad28a74beb	[misc][cuda] add warning for pynvml user (#7675 )	2024-08-20 00:35:09 -07:00
jianyizh	e6d811dd13	[XPU] fallback to native implementation for xpu custom op (#7670 )	2024-08-20 00:26:09 -07:00
youkaichao	c4be16e1a7	[misc] add nvidia related library in collect env (#7674 )	2024-08-19 23:22:49 -07:00
Kuntai Du	3d8a5f063d	[CI] Organizing performance benchmark files (#7616 )	2024-08-19 22:43:54 -07:00
Zijian Hu	f4fc7337bf	[Bugfix] support `tie_word_embeddings` for all models (#5724 )	2024-08-19 20:00:04 -07:00
Kevin H. Luu	0df7ec0b2d	[ci] Install Buildkite test suite analysis (#7667 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-08-19 19:55:04 -07:00
Abhinav Goyal	312f761232	[Speculative Decoding] Fixing hidden states handling in batch expansion (#7508 )	2024-08-19 17:58:14 -07:00
youkaichao	e54ebc2f8f	[doc] fix doc build error caused by msgspec (#7659 )	2024-08-19 17:50:59 -07:00
Travis Johnson	67e02fa8a4	[Bugfix] use StoreBoolean instead of type=bool for --disable-logprobs-during-spec-decoding (#7665 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-08-20 00:43:09 +00:00
Woosuk Kwon	43735bf5e1	[TPU] Remove redundant input tensor cloning (#7660 )	2024-08-19 15:55:04 -07:00
Andrew Song	da115230fd	[Bugfix] Don't disable existing loggers (#7664 )	2024-08-19 15:11:58 -07:00
Isotr0py	7601cb044d	[Core] Support tensor parallelism for GGUF quantization (#7520 )	2024-08-19 17:30:14 -04:00
William Lin	47b65a5508	[core] Multi Step Scheduling (#7000 ) Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>	2024-08-19 13:52:13 -07:00
Ali Panahi	dad961ef5c	[Bugfix] fix lora_dtype value type in arg_utils.py - part 2 (#5428 )	2024-08-19 20:47:00 +00:00
Cody Yu	3ac50b47d0	[MISC] Add prefix cache hit rate to metrics (#7606 )	2024-08-19 11:52:07 -07:00
Woosuk Kwon	df845b2b46	[Misc] Remove Gemma RoPE (#7638 )	2024-08-19 09:29:31 -07:00
Kunshang Ji	1a36287b89	[Bugfix] Fix xpu build (#7644 )	2024-08-18 22:00:09 -07:00
Peng Guanwen	f710fb5265	[Core] Use flashinfer sampling kernel when available (#7137 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-19 03:24:03 +00:00
SangBin Cho	ff7ec82c4d	[Core] Optimize SPMD architecture with delta + serialization optimization (#7109 )	2024-08-18 17:57:20 -07:00
Woosuk Kwon	200a2ffa6b	[Misc] Refactor Llama3 RoPE initialization (#7637 )	2024-08-18 17:18:12 -07:00
Alex Brooks	40e1360bb6	[CI/Build] Add text-only test for Qwen models (#7475 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-08-19 07:43:46 +08:00
Robert Shaw	e3b318216d	[Bugfix] Fix Prometheus Metrics With `zeromq` Frontend (#7279 ) Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-18 20:19:48 +00:00
Woosuk Kwon	ab7165f2c7	[TPU] Optimize RoPE forward_native2 (#7636 )	2024-08-18 01:15:10 -07:00
Woosuk Kwon	0c2fa50b84	[TPU] Use mark_dynamic only for dummy run (#7634 )	2024-08-18 00:18:53 -07:00
Woosuk Kwon	ce143353c6	[TPU] Skip creating empty tensor (#7630 )	2024-08-17 14:22:46 -07:00
Roger Wang	bbf55c4805	[VLM] Refactor `MultiModalConfig` initialization and profiling (#7530 )	2024-08-17 13:30:55 -07:00
Jee Jee Li	1ef13cf92f	[Misc] Fix BitAndBytes exception messages (#7626 )	2024-08-17 12:02:14 -07:00
youkaichao	832163b875	[ci][test] allow longer wait time for api server (#7629 )	2024-08-17 11:26:38 -07:00
Besher Alkurdi	e73f76eec6	[Model] Pipeline parallel support for JAIS (#7603 )	2024-08-17 11:11:09 -07:00
youkaichao	d95cc0a55c	[core][misc] update libcudart finding (#7620 ) Co-authored-by: cjackal <44624812+cjackal@users.noreply.github.com>	2024-08-16 23:01:35 -07:00
youkaichao	5bf45db7df	[ci][test] fix engine/logger test (#7621 )	2024-08-16 23:00:59 -07:00
youkaichao	eed020f673	[misc] use nvml to get consistent device name (#7582 )	2024-08-16 21:15:13 -07:00
Xander Johnson	7c0b7ea214	[Bugfix] add >= 1.0 constraint for openai dependency (#7612 )	2024-08-16 20:56:01 -07:00
SangBin Cho	4706eb628e	[aDAG] Unflake aDAG + PP tests (#7600 )	2024-08-16 20:49:30 -07:00
Rui Qiao	bae888cb8e	[Bugfix] Clear engine reference in AsyncEngineRPCServer (#7618 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-16 20:44:05 -07:00
Alexei-V-Ivanov-AMD	6bd19551b0	[Build/CI] Enabling passing AMD tests. (#7610 )	2024-08-16 20:25:32 -07:00

2400 Commits