vllm/vllm
2024-08-20 00:26:09 -07:00
adapter_commons [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
assets [Core][VLM] Support image embeddings as input (#6613) 2024-08-12 16:16:06 +08:00
attention [spec decode] [4/N] Move update_flash_attn_metadata to attn backend (#7571) 2024-08-16 11:41:56 -07:00
core [MISC] Add prefix cache hit rate to metrics (#7606) 2024-08-19 11:52:07 -07:00
distributed [core][misc] update libcudart finding (#7620) 2024-08-16 23:01:35 -07:00
engine [Bugfix] use StoreBoolean instead of type=bool for --disable-logprobs-during-spec-decoding (#7665) 2024-08-20 00:43:09 +00:00
entrypoints [ Bugfix ] Fix Prometheus Metrics With zeromq Frontend (#7279) 2024-08-18 20:19:48 +00:00
executor [core] Multi Step Scheduling (#7000) 2024-08-19 13:52:13 -07:00
inputs [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
logging [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) 2024-05-01 17:34:40 -07:00
lora [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
model_executor [XPU] fallback to native implementation for xpu custom op (#7670) 2024-08-20 00:26:09 -07:00
multimodal [VLM] Refactor MultiModalConfig initialization and profiling (#7530) 2024-08-17 13:30:55 -07:00
platforms [doc] fix doc build error caused by msgspec (#7659) 2024-08-19 17:50:59 -07:00
plugins [misc][plugin] add plugin system implementation (#7426) 2024-08-13 16:24:17 -07:00
prompt_adapter [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
spec_decode [Speculative Decoding] Fixing hidden states handling in batch expansion (#7508) 2024-08-19 17:58:14 -07:00
transformers_utils [Model] Align nemotron config with final HF state and fix lm-eval-small (#7611) 2024-08-16 15:56:34 -07:00
triton_utils [Kernel][RFC] Refactor the punica kernel based on Triton (#5036) 2024-07-31 17:12:24 -07:00
usage [Misc] Manage HTTP connections in one place (#6600) 2024-07-22 21:32:02 -07:00
worker [TPU] Remove redundant input tensor cloning (#7660) 2024-08-19 15:55:04 -07:00
__init__.py [Frontend] Refactor prompt processing (#4028) 2024-07-22 10:13:53 -07:00
_core_ext.py [Kernel][Misc] dynamo support for ScalarType (#7594) 2024-08-16 13:59:49 -07:00
_custom_ops.py [Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596) 2024-08-16 14:00:11 -07:00
_ipex_ops.py [mypy] Enable following imports for some directories (#6681) 2024-07-31 10:38:03 +08:00
block.py [Performance] Optimize e2e overheads: Reduce python allocations (#7162) 2024-08-08 21:34:28 -07:00
config.py [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
connections.py [core][distributed] fix zmq hang (#6759) 2024-07-24 17:37:12 -07:00
envs.py [Core] Use flashinfer sampling kernel when available (#7137) 2024-08-19 03:24:03 +00:00
logger.py [Bugfix] Don't disable existing loggers (#7664) 2024-08-19 15:11:58 -07:00
outputs.py [Bugfix] Fix weight loading for Chameleon when TP>1 (#7410) 2024-08-13 05:33:41 +00:00
pooling_params.py [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
py.typed Add py.typed so consumers of vLLM can get type checking (#1509) 2023-10-30 14:50:47 -07:00
sampling_params.py [Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
scalar_type.py [Misc] Disambiguate quantized types via a new ScalarType (#6396) 2024-08-02 13:51:58 -07:00
scripts.py [Frontend] Disallow passing model as both argument and option (#7347) 2024-08-12 12:58:34 +00:00
sequence.py [core] Multi Step Scheduling (#7000) 2024-08-19 13:52:13 -07:00
tracing.py [Core] Add span metrics for model_forward, scheduler and sampler time (#7089) 2024-08-09 13:55:13 -07:00
utils.py [VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126) 2024-08-14 17:55:42 +00:00
version.py bump version to v0.5.4 (#7139) 2024-08-05 14:39:48 -07:00