vllm/vllm/attention
Name          Last commit message                                                                Last commit date
backends/     [Bugfix] Fix decode tokens w. CUDA graph (#6757)                                   2024-07-24 22:33:56 -07:00
ops/          [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081)  2024-07-16 15:31:32 -07:00
__init__.py   [Core] Refactor _prepare_model_input_tensors - take 2 (#6164)                     2024-07-17 09:37:16 -07:00
layer.py      [Misc] Support FP8 kv cache scales from compressed-tensors (#6528)                2024-07-23 04:11:50 +00:00
selector.py   [Core] Refactor _prepare_model_input_tensors - take 2 (#6164)                     2024-07-17 09:37:16 -07:00
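
The listing suggests a split of responsibilities: layer.py holds the attention layer that model code calls, selector.py chooses a concrete backend, backends/ contains the backend implementations, and ops/ holds the low-level kernels. The sketch below illustrates that selector-plus-backend pattern in plain Python; every name in it (AttentionBackend, select_backend, SimpleAttention) is hypothetical and is not vLLM's actual API or class layout.

```python
# Hypothetical sketch of the pattern implied by this directory layout:
# a selector maps a device/config to an attention backend, and a thin
# layer delegates to whichever backend was chosen. Not vLLM's real API.
import math
from abc import ABC, abstractmethod


class AttentionBackend(ABC):
    """Interface a concrete backend (cf. the modules under backends/) would implement."""

    @abstractmethod
    def forward(self, query, key, value):
        ...


class NaiveBackend(AttentionBackend):
    """Fallback backend: toy single-query softmax attention over scalar lists."""

    def forward(self, query, key, value):
        scores = [q * k for q, k in zip(query, key)]
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        return sum(w * v for w, v in zip(weights, value))


def select_backend(device: str) -> AttentionBackend:
    """Pick a backend by device, mirroring the role selector.py plays."""
    # A real selector would also consider dtype, head size, and available kernels.
    registry = {"cpu": NaiveBackend}
    return registry.get(device, NaiveBackend)()


class SimpleAttention:
    """Thin layer (cf. layer.py) that owns no math itself and defers to the backend."""

    def __init__(self, device: str = "cpu"):
        self.backend = select_backend(device)

    def __call__(self, query, key, value):
        return self.backend.forward(query, key, value)


if __name__ == "__main__":
    attn = SimpleAttention()
    print(attn([1.0, 2.0], [0.5, 0.25], [10.0, 20.0]))  # -> 15.0
```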