Commit Graph

  • c055747867
    [model][utils] add extract_layer_index utility function (#10599) main youkaichao 2024-11-23 22:22:54 -0800
  • eda2b3589c
    Revert "Print running script to enhance CI log readability" (#10601) youkaichao 2024-11-23 21:31:47 -0800
  • 1c445dca51
    [CI/Build] Print running script to enhance CI log readability (#10594) Jee Jee Li 2024-11-24 11:57:13 +0800
  • 1700c543a5
    [Bugfix] Fix LoRA weight sharding (#10450) Jee Jee Li 2024-11-24 09:23:17 +0800
  • 17d8fc1806
    [bugfix] Fix example/tensorize_vllm_model tests (#10595) Jee Jee Li 2024-11-24 09:22:33 +0800
  • 04668ebe7a
    [Bugfix] Avoid import AttentionMetadata explicitly in Mllama (#10593) Isotr0py 2024-11-24 02:12:20 +0800
  • 651f6c31ac
    For ppc64le, disabled tests for now and addressed space issues (#10538) Nishidha 2024-11-23 15:03:53 +0530
  • 86a44fb896
    [Platforms] Refactor openvino code (#10573) JiHuazhong 2024-11-23 14:23:12 +0800
  • 4cfe5d2bca
    [Bugfix] multi_modal_kwargs broadcast for CPU tensor parallel (#10541) Isotr0py 2024-11-23 13:25:46 +0800
  • c8acd80548
    [2/N] handling placeholders in merged multi-modal processor (#10485) Cyrus Leung 2024-11-23 13:25:09 +0800
  • 4634a89d18
    Prefix Cache Aware Scheduling [1/n] (#10128) Ricky Xu 2024-11-22 21:15:55 -0800
  • 7c25fe45a6
    [AMD] Add support for GGUF quantization on ROCm (#10254) kliuae 2024-11-23 13:14:49 +0800
  • 02a43f82a9
    Update default max_num_batch_tokens for chunked prefill to 2048 (#10544) Michael Goin 2024-11-23 00:14:19 -0500
  • cfea9c04ef
    [Model] Fix Baichuan BNB online quantization (#10572) Chen Wu 2024-11-23 13:13:59 +0800
  • 7d8ffb344f
    [Bugfix] Internal Server Error when tool_choice is incorrect. (#10567) Varun Vinayak Shenoy 2024-11-22 21:13:29 -0800
  • 4aba6e3d1a
    [core] gemma2 full context length support (#10584) youkaichao 2024-11-22 20:13:54 -0800
  • 978b39744b
    [Misc] Add pynccl wrappers for all_gather and reduce_scatter (#9432) Tyler Michael Smith 2024-11-22 22:14:03 -0500
  • ebda51968b
    [Core] Fix broken log configuration (#10458) Russell Bryant 2024-11-22 21:23:51 -0500
  • 9195dbdbca
    [Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use (#10164) Travis Johnson 2024-11-22 19:17:38 -0700
  • d559979c54
    [bugfix] fix cpu tests (#10585) youkaichao 2024-11-22 17:34:03 -0800
  • d345f409b7
    [V1] EngineCore supports profiling (#10564) Zhonghua Deng 2024-11-23 09:16:15 +0800
  • 28598f3939
    [Core] remove temporary local variables in LLMEngine.__init__ (#10577) Russell Bryant 2024-11-22 19:22:53 -0500
  • 948c859571
    support bitsandbytes quantization with qwen model (#10549) zixuanzhang226 2024-11-22 16:16:14 -0800
  • 97814fbf0f
    [v1] Refactor KVCacheManager for more hash input than token ids (#10507) Ricky Xu 2024-11-22 15:27:25 -0800
  • eebad39f26
    [torch.compile] support all attention backends (#10558) youkaichao 2024-11-22 14:04:42 -0800
  • db100c5cde
    [bugfix] fix full graph tests (#10581) youkaichao 2024-11-22 10:02:14 -0800
  • 11fcf0e066
    Remove token-adding chat embedding params (#10551) Noam Gat 2024-11-22 09:59:47 +0200
  • b6374e09b0
    [Bugfix] Fix Phi-3 BNB quantization with tensor parallel (#9948) Isotr0py 2024-11-22 15:01:56 +0800
  • a111d0151f
    [platforms] absorb worker cls difference into platforms folder (#10555) youkaichao 2024-11-21 21:00:32 -0800
  • 446c7806b2
    [Minor] Fix line-too-long (#10563) Woosuk Kwon 2024-11-21 19:40:40 -0800
  • 33e0a2540a
    [9/N] torch.compile LLM usage (#10552) youkaichao 2024-11-21 19:13:31 -0800
  • aed074860a
    [Benchmark] Add new H100 machine (#10547) Simon Mo 2024-11-21 18:27:20 -0800
  • 9afa014552
    Add small example to metrics.rst (#10550) Michael Goin 2024-11-21 18:43:43 -0500
  • 46fe9b46d8
    [Minor] Revert change in offline inference example (#10545) Woosuk Kwon 2024-11-21 13:28:16 -0800
  • cf656f5a02
    [misc] improve error message (#10553) youkaichao 2024-11-21 13:13:17 -0800
  • edec3385b6
    [CI][Installation] Avoid uploading CUDA 11.8 wheel (#10535) Yunmeng 2024-11-22 05:03:58 +0800
  • f9310cbd0c
    [V1] Fix Compilation config & Enable CUDA graph by default (#10528) Woosuk Kwon 2024-11-21 12:53:39 -0800
  • 7560ae5caf
    [8/N] enable cli flag without a space (#10529) youkaichao 2024-11-21 12:30:42 -0800
  • e7a8341c7c
    [Bugfix] Allow token ID-only inputs in Qwen2-Audio (#10536) Cyrus Leung 2024-11-22 02:09:43 +0800
  • c51e397fe8
    [Misc] Suppress duplicated logging regarding multimodal input pipeline (#10530) Roger Wang 2024-11-21 09:21:31 -0800
  • 2385b60d83
    [Kernel] Register punica ops directly (#10522) Jee Jee Li 2024-11-22 01:18:11 +0800
  • da7e702c6f
    [Bug]: When apply continue_final_message for OpenAI server, the "echo":false is ignored (#10180) Chauncey 2024-11-22 00:24:32 +0800
  • 4d676f0852
    [Bugfix] Embedding model pooling_type equals ALL and multi input's bug (#10494) Xiaoyu Zhang 2024-11-21 22:40:02 +0800
  • d5ec121f95
    [Model] Expose dynamic_image_size as mm_processor_kwargs for InternVL2 models (#10518) Isotr0py 2024-11-21 22:20:08 +0800
  • 8a93a598d9
    fix the issue that len(tokenizer(prompt)["input_ids"]) > prompt_len (#10524) Wang, Yi 2024-11-21 19:15:36 +0800
  • 1cfde82ffd
    [Model] Add Support for Multimodal Granite Models (#10291) Alex Brooks 2024-11-21 03:46:20 -0700
  • f0e0238016
    [Doc] fix a small typo in docstring of llama_tool_parser (#10513) Zhong Qishuai 2024-11-21 17:05:23 +0800
  • aaddce5d26
    [platforms] improve error message for unspecified platforms (#10520) youkaichao 2024-11-20 23:07:56 -0800
  • 3430857b64
    [Misc] Increase default video fetch timeout (#10495) Cyrus Leung 2024-11-21 15:06:42 +0800
  • 8b0fe06c89
    [torch.compile] Inductor code caching fix (#10273) Luka Govedič 2024-11-21 00:44:57 -0500
  • 9d827170a3
    [Platforms] Add device_type in Platform (#10508) Mengqing Cao 2024-11-21 12:44:20 +0800
  • 6c1208d083
    [Core] Add Sliding Window Support with Flashinfer (#10462) Pavani Majety 2024-11-20 19:56:47 -0800
  • 388ee3de66
    [torch.compile] limit inductor threads and lazy import quant (#10482) youkaichao 2024-11-20 18:36:33 -0800
  • 2f77b6cfec
    [TPU] Implement prefix caching for TPUs (#10307) Woosuk Kwon 2024-11-20 13:54:15 -0800
  • c68f7ede6a
    [Bugfix]: allow extra fields in requests to openai compatible server (#10463) Guillaume Calmettes 2024-11-20 22:42:21 +0100
  • 0cd3d9717e
    [7/N] torch.compile, reduce compilation time (#10460) youkaichao 2024-11-20 11:20:38 -0800
  • 5f1d6af2b6
    [perf bench] H200 development (#9768) Simon Mo 2024-11-20 11:06:56 -0800
  • 772a66732d
    [platforms] restore xpu check for parallel config (#10479) youkaichao 2024-11-20 09:13:28 -0800
  • 63f1fde277
    [Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355) Li, Jiang 2024-11-20 18:57:39 +0800
  • d5b28447e0
    [Platforms] Refactor xpu code (#10468) Mengqing Cao 2024-11-20 14:52:13 +0800
  • 09dbf9ff16
    [Bugfix] Handle conflicts between modern and legacy fields (#10471) Cyrus Leung 2024-11-20 14:45:08 +0800
  • 343041c4c4
    [model] Reduce medusa weight (#10454) Sky Lee 2024-11-20 14:05:55 +0800
  • ed701ca963
    [ci/build] Combine nightly and optional (#10465) Kevin H. Luu 2024-11-19 19:36:03 -1000
  • 7629a9c6e5
    [CI/Build] Support compilation with local cutlass path (#10423) (#10424) wchen61 2024-11-20 13:35:50 +0800
  • 709c9f1f25
    [CI/Build] Add sphinx/rst linter for docs (#10366) Rafael Vasquez 2024-11-20 00:35:31 -0500
  • b4be5a8adb
    [Bugfix] Enforce no chunked prefill for embedding models (#10470) Cyrus Leung 2024-11-20 13:12:51 +0800
  • ad44437ba3
    [Bugfix] Fix Mamba model initialization and MLP Speculator weights loading (#10456) Isotr0py 2024-11-20 13:04:05 +0800
  • 9e05252b46
    [Misc] Add __setitem__ for LazyDict (#10469) Yanyi Liu 2024-11-20 12:44:57 +0800
  • d200972e7f
    [Bugfix] Marlin 2:4 temp fix for large M dim (>256) (#10464) Lucas Wilkinson 2024-11-19 22:40:33 -0500
  • d5b68aba2f
    [CI/Build] Update Dockerfile.rocm (#10434) Alexei-V-Ivanov-AMD 2024-11-19 19:19:59 -0600
  • a324d3a1a7
    Change granite chat template to keep json list formatting for tool calls (#10452) Maximilien de Bayser 2024-11-19 22:16:54 -0300
  • b00b33d77e
    [Model][Quantization] HQQ support through Marlin kernel expansion (#9766) ElizaWszola 2024-11-19 22:31:12 +0100
  • efa9084628
    [Core] Avoid metrics log noise when idle (#8868) Russell Bryant 2024-11-19 16:05:25 -0500
  • 803f37eaaa
    [6/N] torch.compile rollout to users (#10437) youkaichao 2024-11-19 10:09:03 -0800
  • fd9f124971
    [Doc] fix link for page that was renamed (#10455) Russell Bryant 2024-11-19 12:48:30 -0500
  • 1ea291a417
    Fix: Build error seen on Power Architecture (#10421) Manjul Mohan 2024-11-19 23:04:57 +0530
  • 11fd7ea639
    [Pixtral-Large] Pixtral actually has no bias in vision-lang adapter (#10449) Patrick von Platen 2024-11-19 18:33:06 +0100
  • f028dff33d
    [BugFix] Fix hermes tool parser output error stream arguments in some cases (#10395) (#10398) COSMOPlat 2024-11-19 21:42:50 +0800
  • b4614656b8
    [CI][CPU] adding numa node number as container name suffix (#10441) Yuan 2024-11-19 21:16:43 +0800
  • 25f9c78961
    [misc][plugin] improve plugin loading (#10443) youkaichao 2024-11-19 02:43:21 -0800
  • 5390d6664f
    [Doc] Add the start of an arch overview page (#10368) Russell Bryant 2024-11-19 04:52:11 -0500
  • 382b6a4852
    [Misc] Avoid misleading warning messages (#10438) Jee Jee Li 2024-11-19 16:54:58 +0800
  • 272e31c0bd
    [Bugfix] Guard for negative counter metrics to prevent crash (#10430) Travis Johnson 2024-11-18 21:57:10 -0700
  • 74f8c2cf5f
    Add openai.beta.chat.completions.parse example to structured_outputs.rst (#10433) Michael Goin 2024-11-18 23:37:46 -0500
  • 8c1fb50705
    [Platform][Refactor] Extract func get_default_attn_backend to Platform (#10358) Mengqing Cao 2024-11-19 11:22:26 +0800
  • 7eb719df13
[Bugfix] Fix Phi-3 BNB online quantization (#10417) Jee Jee Li 2024-11-19 11:21:42 +0800
  • 284203f171
    [ci/build] Have dependabot ignore all patch update (#10436) Kevin H. Luu 2024-11-18 15:04:25 -1000
  • 90a6c759ca
    [misc] partial prefix & random input generation benchmark (#9929) Ricky Xu 2024-11-18 15:39:14 -0800
  • 2298e69b5f
    [ci][bugfix] fix kernel tests (#10431) youkaichao 2024-11-18 15:29:37 -0800
  • a03ea40792
    [3/N][torch.compile] consolidate custom op logging (#10399) youkaichao 2024-11-18 15:14:59 -0800
  • 96d999fbe8
    [Kernel] Initial Machete W4A8 support + Refactors (#9855) Lucas Wilkinson 2024-11-18 14:59:29 -0500
  • c2170a5b39
    [Kernel] Explicitly specify other value in tl.load calls (#9014) Angus Wang 2024-11-18 11:39:40 -0800
  • 6b2d25efc7
    [Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107) Yan Ma 2024-11-19 02:18:05 +0800
  • 281cc4b3cd
    [Model][Bugfix] Support TP for PixtralHF ViT (#10405) Michael Goin 2024-11-18 13:04:14 -0500
  • 4f686d139f
    Fix open_collective value in FUNDING.yml (#10426) Andrew Nesbitt 2024-11-18 17:52:42 +0000
  • 31894a2155
    [Doc] Add documentation for Structured Outputs (#9943) ismael-dm 2024-11-18 18:52:12 +0100
  • 7851b45196
    [5/N][torch.compile] torch.jit.script --> torch.compile (#10406) youkaichao 2024-11-18 07:20:06 -0800
  • 4186be8111
    [Doc] Update doc for LoRA support in GLM-4V (#10425) B-201 2024-11-18 23:08:30 +0800
  • e7ebb662d7
    [Model] Remove transformers attention porting in VITs (#10414) Isotr0py 2024-11-18 21:45:21 +0800
  • 5be4e52b65
[Model][LoRA] LoRA support added for glm-4v (#10418) B-201 2024-11-18 20:57:10 +0800
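The top commit in this graph adds an `extract_layer_index` utility (#10599). As an illustration only — this is a hypothetical sketch, not the code merged in that PR — such a helper typically parses the single integer layer index out of a dotted module prefix like `model.layers.3.self_attn`:

```python
def extract_layer_index(prefix: str) -> int:
    """Return the single integer layer index embedded in a dotted
    module prefix such as "model.layers.3.self_attn".

    Hypothetical sketch: the real utility added in #10599 may differ.
    """
    # Collect every dot-separated token that is a plain integer.
    indices = [int(tok) for tok in prefix.split(".") if tok.isdigit()]
    if len(indices) != 1:
        raise ValueError(
            f"expected exactly one layer index in {prefix!r}, "
            f"found {len(indices)}")
    return indices[0]
```

Raising on zero or multiple numeric tokens keeps the helper safe to call on arbitrary prefixes: an ambiguous name fails loudly instead of silently picking the wrong layer.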