squall/vllm - vllm

Author	SHA1	Message	Date
Dipika Sikka	fc911880cc	[Kernel] Expand MoE weight loading + Add Fused Marlin MoE Kernel (#7766 ) Co-authored-by: ElizaWszola <eliza@neuralmagic.com>	2024-08-27 15:07:09 -07:00
youkaichao	ed6f002d33	[cuda][misc] error on empty CUDA_VISIBLE_DEVICES (#7924 )	2024-08-27 12:06:11 -07:00
Isotr0py	b09c755be8	[Bugfix] Fix phi3v incorrect image_idx when using async engine (#7916 )	2024-08-27 17:36:09 +00:00
alexeykondrat	42e932c7d4	[CI/Build][ROCm] Enabling tensorizer tests for ROCm (#7237 )	2024-08-27 10:09:13 -07:00
Kunshang Ji	076169f603	[Hardware][Intel GPU] Add intel GPU pipeline parallel support. (#7810 )	2024-08-27 10:07:02 -07:00
Isotr0py	9db642138b	[CI/Build][VLM] Cleanup multiple images inputs model test (#7897 )	2024-08-27 15:28:30 +00:00
Patrick von Platen	6fc4e6e07a	[Model] Add Mistral Tokenization to improve robustness and chat encoding (#7739 )	2024-08-27 12:40:02 +00:00
Cody Yu	9606c7197d	Revert #7509 (#7887 )	2024-08-27 00:16:31 -07:00
youkaichao	64cc644425	[core][torch.compile] discard the compile for profiling (#7796 )	2024-08-26 21:33:58 -07:00
Nick Hill	39178c7fbc	[Tests] Disable retries and use context manager for openai client (#7565 )	2024-08-26 21:33:17 -07:00
Megha Agarwal	2eedede875	[Core] Asynchronous Output Processor (#7049 ) Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>	2024-08-26 20:53:20 -07:00
Dipika Sikka	015e6cc252	[Misc] Update compressed tensors lifecycle to remove `prefix` from `create_weights` (#7825 )	2024-08-26 18:09:34 -06:00
omrishiv	760e9f71a8	[Bugfix] neuron: enable tensor parallelism (#7562 ) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-08-26 15:13:13 -07:00
youkaichao	05826c887b	[misc] fix custom allreduce p2p cache file generation (#7853 )	2024-08-26 15:02:25 -07:00
Dipika Sikka	dd9857f5fa	[Misc] Update `gptq_marlin_24` to use vLLMParameters (#7762 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-26 17:44:54 -04:00
Dipika Sikka	665304092d	[Misc] Update `qqq` to use vLLMParameters (#7805 )	2024-08-26 13:16:15 -06:00
Cody Yu	2deb029d11	[Performance][BlockManagerV2] Mark prefix cache block as computed after schedule (#7822 )	2024-08-26 11:24:53 -07:00
Cyrus Leung	029c71de11	[CI/Build] Avoid downloading all HF files in `RemoteOpenAIServer` (#7836 )	2024-08-26 05:31:10 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	0b769992ec	[Bugfix]: Use float32 for base64 embedding (#7855 ) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2024-08-26 03:16:38 +00:00
Nick Hill	1856aff4d6	[Spec Decoding] Streamline batch expansion tensor manipulation (#7851 )	2024-08-25 15:45:14 -07:00
youkaichao	70c094ade6	[misc][cuda] improve pynvml warning (#7852 )	2024-08-25 14:30:09 -07:00
Isotr0py	2059b8d9ca	[Misc] Remove snapshot_download usage in InternVL2 test (#7835 )	2024-08-25 15:53:09 +00:00
Isotr0py	8aaf3d5347	[Model][VLM] Support multi-images inputs for Phi-3-vision models (#7783 )	2024-08-25 11:51:20 +00:00
zifeitong	80162c44b1	[Bugfix] Fix Phi-3v crash when input images are of certain sizes (#7840 )	2024-08-24 18:16:24 -07:00
youkaichao	aab0fcdb63	[ci][test] fix RemoteOpenAIServer (#7838 )	2024-08-24 17:31:28 +00:00
youkaichao	ea9fa160e3	[ci][test] exclude model download time in server start time (#7834 )	2024-08-24 01:03:27 -07:00
youkaichao	7d9ffa2ae1	[misc][core] lazy import outlines (#7831 )	2024-08-24 00:51:38 -07:00
Tyler Rockwood	d81abefd2e	[Frontend] add json_schema support from OpenAI protocol (#7654 )	2024-08-23 23:07:24 -07:00
Pooya Davoodi	8da48e4d95	[Frontend] Publish Prometheus metrics in run_batch API (#7641 )	2024-08-23 23:04:22 -07:00
Pooya Davoodi	6885fde317	[Bugfix] Fix run_batch logger (#7640 )	2024-08-23 13:58:26 -07:00
Alexander Matveev	9db93de20c	[Core] Add multi-step support to LLMEngine (#7789 )	2024-08-23 12:45:53 -07:00
Simon Mo	09c7792610	Bump version to v0.5.5 (#7823 )	2024-08-23 11:35:33 -07:00
Dipika Sikka	f1df5dbfd6	[Misc] Update `marlin` to use vLLMParameters (#7803 )	2024-08-23 14:30:52 -04:00
youkaichao	35ee2ad6b9	[github][misc] promote asking llm first (#7809 )	2024-08-23 09:38:50 -07:00
Maximilien de Bayser	e25fee57c2	[BugFix] Fix server crash on empty prompt (#7746 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2024-08-23 13:12:44 +00:00
Jie Fu (傅杰)	faeddb565d	[misc] Add Torch profiler support for CPU-only devices (#7806 )	2024-08-23 05:46:25 +00:00
Kunshang Ji	fc5ebbd1d3	[Hardware][Intel GPU] refactor xpu_model_runner for tp (#7712 )	2024-08-22 20:06:54 -07:00
SangBin Cho	c01a6cb231	[Ray backend] Better error when pg topology is bad. (#7584 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-08-22 17:44:25 -07:00
Joe Runde	b903e1ba7f	[Frontend] error suppression cleanup (#7786 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-08-22 21:50:21 +00:00
Siyuan Liu	a152246428	[Misc] fix typo in triton import warning (#7794 )	2024-08-22 13:51:23 -07:00
Kevin H. Luu	666ad0aa16	[ci] Cleanup & refactor Dockerfile to pass different Python versions and sccache bucket via build args (#7705 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-08-22 20:10:55 +00:00
Michael Goin	15310b5101	[Bugfix] Use LoadFormat values for `vllm serve --load-format` (#7784 )	2024-08-22 11:37:08 -07:00
Peter Salas	57792ed469	[Doc] Fix incorrect docs from #7615 (#7788 )	2024-08-22 10:02:06 -07:00
Jiaxin Shan	d3b5b98021	[Misc] Enhance prefix-caching benchmark tool (#6568 )	2024-08-22 09:32:02 -07:00
Travis Johnson	cc0eaf12b1	[Bugfix] spec decode handle None entries in topk args in create_sequence_group_output (#7232 ) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-08-22 09:33:48 -04:00
Dipika Sikka	955b5191c9	[Misc] update fp8 to use `vLLMParameter` (#7437 )	2024-08-22 08:36:18 -04:00
Lucas Wilkinson	55d63b1211	[Bugfix] Don't build machete on cuda <12.0 (#7757 )	2024-08-22 08:28:52 -04:00
Flex Wang	4f419c00a6	Fix ShardedStateLoader for vllm fp8 quantization (#7708 )	2024-08-22 08:25:04 -04:00
Abhinav Goyal	a3fce56b88	[Speculative Decoding] EAGLE Implementation with Top-1 proposer (#6830 )	2024-08-22 02:42:24 -07:00
Woosuk Kwon	b3856bef7d	[Misc] Use torch.compile for GemmaRMSNorm (#7642 )	2024-08-22 01:14:13 -07:00

2473 Commits