vllm/vllm at dfea17314827845d55dabb03ebe905f58e6682e4 - vllm

Ruoyu Qin dfea173148 [Bugfix] Abort requests when the connection to /v1/completions is interrupted (#4363)	2024-04-27 09:48:37 -07:00
attention	[ROCm][Hardware][AMD] Enable group query attention for triton FA (#4406)	2024-04-26 23:37:40 -07:00
core	[Model] Phi-3 4k sliding window temp. fix (#4380)	2024-04-27 18:08:15 +08:00
distributed	[CI] Disable non-lazy string operation on logging (#4326)	2024-04-26 00:16:58 -07:00
engine	[Bugfix][Core] Fix get decoding config from ray (#4335)	2024-04-27 11:30:08 +00:00
entrypoints	[Bugfix][Core] Fix get decoding config from ray (#4335)	2024-04-27 11:30:08 +00:00
executor	[Core] Introduce `DistributedGPUExecutor` abstract class (#4348)	2024-04-27 04:14:26 +00:00
lora	[Kernel] Full Tensor Parallelism for LoRA Layers (#3524)	2024-04-27 00:03:48 -07:00
model_executor	[Kernel] Optimize FP8 support for MoE kernel / Mixtral via static scales (#4343)	2024-04-27 04:49:59 +00:00
spec_decode	[CI] Disable non-lazy string operation on logging (#4326)	2024-04-26 00:16:58 -07:00
transformers_utils	[CI] Disable non-lazy string operation on logging (#4326)	2024-04-26 00:16:58 -07:00
usage	[mypy] Add mypy type annotation part 1 (#4006)	2024-04-12 14:35:50 -07:00
worker	[Core] Refactoring sampler and support prompt logprob for chunked prefill (#4309)	2024-04-26 13:02:02 +00:00
__init__.py	[Core] Move ray_utils.py from `engine` to `executor` package (#4347)	2024-04-25 06:52:22 +00:00
_custom_ops.py	[Kernel] Optimize FP8 support for MoE kernel / Mixtral via static scales (#4343)	2024-04-27 04:49:59 +00:00
block.py	Add Automatic Prefix Caching (#2762)	2024-03-02 00:50:01 -08:00
config.py	[Kernel] Full Tensor Parallelism for LoRA Layers (#3524)	2024-04-27 00:03:48 -07:00
logger.py	[CI] Disable non-lazy string operation on logging (#4326)	2024-04-26 00:16:58 -07:00
outputs.py	[BugFix] Fix handling of stop strings and stop token ids (#3672)	2024-04-11 15:34:12 -07:00
py.typed	Add py.typed so consumers of vLLM can get type checking (#1509)	2023-10-30 14:50:47 -07:00
sampling_params.py	Support eos_token_id from generation_config.json (#4182)	2024-04-19 04:13:36 +00:00
sequence.py	[Core] Refactoring sampler and support prompt logprob for chunked prefill (#4309)	2024-04-26 13:02:02 +00:00
test_utils.py	[Core][Refactor] move parallel_utils into vllm/distributed (#3950)	2024-04-10 15:33:30 -07:00
utils.py	[Bugfix] Abort requests when the connection to /v1/completions is interrupted (#4363)	2024-04-27 09:48:37 -07:00