vllm/vllm
2024-06-29 21:04:20 +08:00
attention [Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628) 2024-06-28 15:28:49 -07:00
core [core][misc] remove logical block (#5882) 2024-06-27 13:34:55 -07:00
distributed [Distributed] Make it clear that % should not be in tensor dict keys. (#5927) 2024-06-28 15:20:22 +00:00
engine [Bugfix] Support eos_token_id from config.json (#5954) 2024-06-29 11:19:02 +00:00
entrypoints [Bugfix] Fix Engine Failing After Invalid Request - AsyncEngineDeadError (#5963) 2024-06-28 17:46:30 -04:00
executor [Hardware][Intel] OpenVINO vLLM backend (#5379) 2024-06-28 13:50:16 +00:00
inputs [Core] Registry for processing model inputs (#5214) 2024-06-28 12:09:56 +00:00
logging [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) 2024-05-01 17:34:40 -07:00
lora [Model] Add Gemma 2 (#5908) 2024-06-27 13:33:56 -07:00
model_executor [Kernel] Raise an exception in MoE kernel if the batch size is larger then 65k (#5939) 2024-06-29 21:04:20 +08:00
multimodal [Core] Registry for processing model inputs (#5214) 2024-06-28 12:09:56 +00:00
spec_decode [Spec Decode] Introduce DraftModelRunner (#5799) 2024-06-28 09:17:51 -07:00
transformers_utils [Bugfix] Support eos_token_id from config.json (#5954) 2024-06-29 11:19:02 +00:00
usage [Misc] Add vLLM version getter to utils (#5098) 2024-06-13 11:21:39 -07:00
worker [Bugfix][TPU] Fix pad slot id (#5977) 2024-06-28 18:55:17 -07:00
__init__.py [Misc] Add vLLM version getter to utils (#5098) 2024-06-13 11:21:39 -07:00
_custom_ops.py [Kernel] Adding bias epilogue support for cutlass_scaled_mm (#5560) 2024-06-26 15:16:00 +00:00
_ipex_ops.py [Kernel][CPU] Add Quick gelu to CPU (#5717) 2024-06-21 06:39:40 +00:00
block.py [core][misc] remove logical block (#5882) 2024-06-27 13:34:55 -07:00
config.py Support Deepseek-V2 (#4650) 2024-06-28 13:24:57 -07:00
envs.py [Hardware][Intel] OpenVINO vLLM backend (#5379) 2024-06-28 13:50:16 +00:00
logger.py [Misc] add logging level env var (#5045) 2024-05-24 23:49:49 -07:00
outputs.py [Core] Consolidate prompt arguments to LLM engines (#4328) 2024-05-28 13:29:31 -07:00
pooling_params.py [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) 2024-05-11 11:30:37 -07:00
py.typed Add py.typed so consumers of vLLM can get type checking (#1509) 2023-10-30 14:50:47 -07:00
sampling_params.py [BugFix] Fix min_tokens behaviour for multiple eos tokens (#5849) 2024-06-27 11:31:11 -07:00
sequence.py [Core] Optimize SequenceStatus.is_finished by switching to IntEnum (#5974) 2024-06-29 12:47:53 +00:00
tracing.py [Misc] Add OpenTelemetry support (#4687) 2024-06-19 01:17:03 +09:00
utils.py [Hardware][Intel] OpenVINO vLLM backend (#5379) 2024-06-28 13:50:16 +00:00
version.py bump version to v0.5.0.post1 (#5522) 2024-06-13 19:42:06 -07:00