vllm/vllm at f7dac83d95ae38973b425a8bb2d3a3df9fe9a9c2 - vllm


Cody Yu f7dac83d95 [Kernel] Raise an exception in MoE kernel if the batch size is larger than 65k (#5939)	2024-06-29 21:04:20 +08:00
attention	[Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628)	2024-06-28 15:28:49 -07:00
core	[core][misc] remove logical block (#5882)	2024-06-27 13:34:55 -07:00
distributed	[Distributed] Make it clear that % should not be in tensor dict keys. (#5927)	2024-06-28 15:20:22 +00:00
engine	[Bugfix] Support `eos_token_id` from `config.json` (#5954)	2024-06-29 11:19:02 +00:00
entrypoints	[Bugfix] Fix Engine Failing After Invalid Request - AsyncEngineDeadError (#5963)	2024-06-28 17:46:30 -04:00
executor	[Hardware][Intel] OpenVINO vLLM backend (#5379)	2024-06-28 13:50:16 +00:00
inputs	[Core] Registry for processing model inputs (#5214)	2024-06-28 12:09:56 +00:00
logging	[MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273)	2024-05-01 17:34:40 -07:00
lora	[Model] Add Gemma 2 (#5908)	2024-06-27 13:33:56 -07:00
model_executor	[Kernel] Raise an exception in MoE kernel if the batch size is larger than 65k (#5939)	2024-06-29 21:04:20 +08:00
multimodal	[Core] Registry for processing model inputs (#5214)	2024-06-28 12:09:56 +00:00
spec_decode	[Spec Decode] Introduce DraftModelRunner (#5799)	2024-06-28 09:17:51 -07:00
transformers_utils	[Bugfix] Support `eos_token_id` from `config.json` (#5954)	2024-06-29 11:19:02 +00:00
usage	[Misc] Add vLLM version getter to utils (#5098)	2024-06-13 11:21:39 -07:00
worker	[Bugfix][TPU] Fix pad slot id (#5977)	2024-06-28 18:55:17 -07:00
__init__.py	[Misc] Add vLLM version getter to utils (#5098)	2024-06-13 11:21:39 -07:00
_custom_ops.py	[Kernel] Adding bias epilogue support for `cutlass_scaled_mm` (#5560)	2024-06-26 15:16:00 +00:00
_ipex_ops.py	[Kernel][CPU] Add Quick `gelu` to CPU (#5717)	2024-06-21 06:39:40 +00:00
block.py	[core][misc] remove logical block (#5882)	2024-06-27 13:34:55 -07:00
config.py	Support Deepseek-V2 (#4650)	2024-06-28 13:24:57 -07:00
envs.py	[Hardware][Intel] OpenVINO vLLM backend (#5379)	2024-06-28 13:50:16 +00:00
logger.py	[Misc] add logging level env var (#5045)	2024-05-24 23:49:49 -07:00
outputs.py	[Core] Consolidate prompt arguments to LLM engines (#4328)	2024-05-28 13:29:31 -07:00
pooling_params.py	[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)	2024-05-11 11:30:37 -07:00
py.typed	Add py.typed so consumers of vLLM can get type checking (#1509)	2023-10-30 14:50:47 -07:00
sampling_params.py	[BugFix] Fix `min_tokens` behaviour for multiple eos tokens (#5849)	2024-06-27 11:31:11 -07:00
sequence.py	[Core] Optimize `SequenceStatus.is_finished` by switching to IntEnum (#5974)	2024-06-29 12:47:53 +00:00
tracing.py	[Misc] Add OpenTelemetry support (#4687)	2024-06-19 01:17:03 +09:00
utils.py	[Hardware][Intel] OpenVINO vLLM backend (#5379)	2024-06-28 13:50:16 +00:00
version.py	bump version to v0.5.0.post1 (#5522)	2024-06-13 19:42:06 -07:00