vllm/vllm/attention/backends
Latest commit: 2024-06-03 09:39:31 -07:00
File                 Last commit                                                                                 Date
__init__.py          [Core] Refactor Attention Take 2 (#3462)                                                    2024-03-25 04:39:33 +00:00
abstract.py          [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)   2024-05-24 22:00:52 -07:00
blocksparse_attn.py  [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)   2024-05-24 22:00:52 -07:00
flash_attn.py        [Core] Remove unnecessary copies in flash attn backend (#5138)                              2024-06-03 09:39:31 -07:00
flashinfer.py        [Misc] Take user preference in attention selector (#4960)                                   2024-05-23 07:55:56 +09:00
rocm_flash_attn.py   [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)   2024-05-24 22:00:52 -07:00
torch_sdpa.py        [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)   2024-05-24 22:00:52 -07:00
xformers.py          [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)   2024-05-24 22:00:52 -07:00
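
The listing shows one module per attention backend (FlashAttention, FlashInfer, ROCm FlashAttention, torch SDPA, xFormers, plus the blocksparse kernel from #4799), each implementing the interface declared in abstract.py; PR #4960 lets a user preference steer which one is chosen. Below is a minimal sketch of that registry-plus-selector pattern; the ATTENTION_BACKEND environment variable, get_backend helper, and backend classes here are hypothetical illustrations, not vLLM's actual API.

import os
from abc import ABC, abstractmethod

class AttentionBackend(ABC):
    """Interface each backend module would implement (cf. abstract.py)."""

    @abstractmethod
    def name(self) -> str:
        ...

class FlashAttnBackend(AttentionBackend):
    def name(self) -> str:
        return "FLASH_ATTN"

class TorchSDPABackend(AttentionBackend):
    def name(self) -> str:
        return "TORCH_SDPA"

# Hypothetical registry mapping backend names to implementations.
_REGISTRY = {
    "FLASH_ATTN": FlashAttnBackend,
    "TORCH_SDPA": TorchSDPABackend,
}

def get_backend(default: str = "FLASH_ATTN") -> AttentionBackend:
    # Honor an explicit user preference first (cf. #4960), then fall back
    # to the default. The variable name is illustrative only.
    choice = os.environ.get("ATTENTION_BACKEND", default)
    try:
        return _REGISTRY[choice]()
    except KeyError:
        raise ValueError(f"unknown attention backend: {choice!r}")

if __name__ == "__main__":
    # With no environment override this prints "FLASH_ATTN".
    print(get_backend().name())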