vllm/vllm
| Name | Last commit | Date |
|------|-------------|------|
| `attention/` | [Core] Refactor Attention Take 2 (#3462) | 2024-03-25 04:39:33 +00:00 |
| `core/` | Dynamic scheduler delay to improve ITL performance (#3279) | 2024-03-22 12:28:14 -07:00 |
| `engine/` | [Core] Improve detokenization performance for prefill (#3469) | 2024-03-22 13:44:12 -07:00 |
| `entrypoints/` | [BugFix] Some fixes for custom allreduce kernels (#2760) | 2024-03-21 23:02:58 -07:00 |
| `executor/` | [Hardware][Neuron] Refactor neuron support (#3471) | 2024-03-22 01:22:17 +00:00 |
| `lora/` | [Hardware][Neuron] Refactor neuron support (#3471) | 2024-03-22 01:22:17 +00:00 |
| `model_executor/` | [Core] Refactor Attention Take 2 (#3462) | 2024-03-25 04:39:33 +00:00 |
| `spec_decode/` | [Hardware][Neuron] Refactor neuron support (#3471) | 2024-03-22 01:22:17 +00:00 |
| `transformers_utils/` | [Core] Improve detokenization performance for prefill (#3469) | 2024-03-22 13:44:12 -07:00 |
| `worker/` | [Core] Refactor Attention Take 2 (#3462) | 2024-03-25 04:39:33 +00:00 |
| `__init__.py` | Add distributed model executor abstraction (#3191) | 2024-03-11 11:03:45 -07:00 |
| `block.py` | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| `config.py` | Dynamic scheduler delay to improve ITL performance (#3279) | 2024-03-22 12:28:14 -07:00 |
| `logger.py` | Make vLLM logging formatting optional (#2877) | 2024-02-20 14:38:55 -08:00 |
| `outputs.py` | [Fix] Fix best_of behavior when n=1 (#3298) | 2024-03-10 19:17:46 -07:00 |
| `py.typed` | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| `sampling_params.py` | Re-enable the 80 char line width limit (#3305) | 2024-03-10 19:49:14 -07:00 |
| `sequence.py` | [Core] Refactor Attention Take 2 (#3462) | 2024-03-25 04:39:33 +00:00 |
| `test_utils.py` | Use CuPy for CUDA graphs (#2811) | 2024-02-13 11:32:06 -08:00 |
| `utils.py` | [Hardware][Neuron] Refactor neuron support (#3471) | 2024-03-22 01:22:17 +00:00 |
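For orientation, here is a minimal sketch of how the user-facing pieces of this package fit together, assuming the vLLM offline-inference API as of these commits (March 2024): `LLM` is implemented under `entrypoints/` and re-exported from `__init__.py`, `SamplingParams` comes from `sampling_params.py`, and `generate` returns the `RequestOutput` objects defined in `outputs.py`. The model id below is an illustrative placeholder, not something taken from this listing.

```python
# Minimal offline-inference sketch against the package laid out above.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]

# SamplingParams (sampling_params.py); best_of defaults to n,
# whose n=1 behavior was fixed in #3298 above.
params = SamplingParams(n=1, temperature=0.8, top_p=0.95, max_tokens=32)

llm = LLM(model="facebook/opt-125m")  # placeholder model id

# generate() returns a list of RequestOutput (outputs.py),
# each holding one CompletionOutput per sampled sequence.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```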