vllm/vllm/attention/ops
Latest commit: c2170a5b39 by Angus Wang (wangjadehao@gmail.com) — [Kernel] Explicitly specify other value in tl.load calls (#9014), 2024-11-18 11:39:40 -08:00
blocksparse_attention/     [Kernel] Explicitly specify other value in tl.load calls (#9014)         2024-11-18 11:39:40 -08:00
__init__.py                [Core] Refactor Attention Take 2 (#3462)                                 2024-03-25 04:39:33 +00:00
hpu_paged_attn.py          [Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143)  2024-11-06 01:09:10 -08:00
ipex_attn.py               Support Roberta embedding models (#9387)                                 2024-11-14 21:23:29 +00:00
paged_attn.py              Support Roberta embedding models (#9387)                                 2024-11-14 21:23:29 +00:00
prefix_prefill.py          [CI/Build] Avoid CUDA initialization (#8534)                             2024-09-18 10:38:11 +00:00
triton_flash_attention.py  [ROCm][AMD][Bugfix] adding a missing triton autotune config (#4845)      2024-05-16 10:46:52 -07:00