vllm/vllm/attention
Name          Last commit message                                                                Last commit date
backends/     [Bugfix] Fix decode tokens w. CUDA graph (#6757)                                   2024-07-24 22:33:56 -07:00
ops/          [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081)  2024-07-16 15:31:32 -07:00
__init__.py   [Core] Refactor _prepare_model_input_tensors - take 2 (#6164)                     2024-07-17 09:37:16 -07:00
layer.py      [Misc] Support FP8 kv cache scales from compressed-tensors (#6528)                2024-07-23 04:11:50 +00:00
selector.py   [Core] Refactor _prepare_model_input_tensors - take 2 (#6164)                     2024-07-17 09:37:16 -07:00
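
The listing suggests a split of responsibilities: layer.py holds the attention layer that model code calls, selector.py chooses a concrete backend, backends/ contains the backend implementations, and ops/ holds the low-level kernels. The sketch below illustrates that selector-plus-backend pattern in plain Python; every name in it (AttentionBackend, select_backend, SimpleAttention) is hypothetical and is not vLLM's actual API or class layout.

```python
# Hypothetical sketch of the pattern implied by this directory layout:
# a selector maps a device/config to an attention backend, and a thin
# layer delegates to whichever backend was chosen. Not vLLM's real API.
import math
from abc import ABC, abstractmethod


class AttentionBackend(ABC):
    """Interface a concrete backend (cf. the modules under backends/) would implement."""

    @abstractmethod
    def forward(self, query, key, value):
        ...


class NaiveBackend(AttentionBackend):
    """Fallback backend: toy single-query softmax attention over scalar lists."""

    def forward(self, query, key, value):
        scores = [q * k for q, k in zip(query, key)]
        exps = [math.exp(s) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        return sum(w * v for w, v in zip(weights, value))


def select_backend(device: str) -> AttentionBackend:
    """Pick a backend by device, mirroring the role selector.py plays."""
    # A real selector would also consider dtype, head size, and available kernels.
    registry = {"cpu": NaiveBackend}
    return registry.get(device, NaiveBackend)()


class SimpleAttention:
    """Thin layer (cf. layer.py) that owns no math itself and defers to the backend."""

    def __init__(self, device: str = "cpu"):
        self.backend = select_backend(device)

    def __call__(self, query, key, value):
        return self.backend.forward(query, key, value)


if __name__ == "__main__":
    attn = SimpleAttention()
    print(attn([1.0, 2.0], [0.5, 0.25], [10.0, 20.0]))  # -> 15.0
```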