vllm/vllm/attention/backends
Latest commit: 2024-06-03 09:39:31 -07:00
File                 Last commit                                                                                 Date
__init__.py          [Core] Refactor Attention Take 2 (#3462)                                                    2024-03-25 04:39:33 +00:00
abstract.py          [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)   2024-05-24 22:00:52 -07:00
blocksparse_attn.py  [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)   2024-05-24 22:00:52 -07:00
flash_attn.py        [Core] Remove unnecessary copies in flash attn backend (#5138)                              2024-06-03 09:39:31 -07:00
flashinfer.py        [Misc] Take user preference in attention selector (#4960)                                   2024-05-23 07:55:56 +09:00
rocm_flash_attn.py   [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)   2024-05-24 22:00:52 -07:00
torch_sdpa.py        [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)   2024-05-24 22:00:52 -07:00
xformers.py          [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)   2024-05-24 22:00:52 -07:00
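
The listing shows one module per attention backend (FlashAttention, FlashInfer, ROCm FlashAttention, torch SDPA, xFormers, plus the blocksparse kernel from #4799), each implementing the interface declared in abstract.py; PR #4960 lets a user preference steer which one is chosen. Below is a minimal sketch of that registry-plus-selector pattern; the ATTENTION_BACKEND environment variable, get_backend helper, and backend classes here are hypothetical illustrations, not vLLM's actual API.

import os
from abc import ABC, abstractmethod

class AttentionBackend(ABC):
    """Interface each backend module would implement (cf. abstract.py)."""

    @abstractmethod
    def name(self) -> str:
        ...

class FlashAttnBackend(AttentionBackend):
    def name(self) -> str:
        return "FLASH_ATTN"

class TorchSDPABackend(AttentionBackend):
    def name(self) -> str:
        return "TORCH_SDPA"

# Hypothetical registry mapping backend names to implementations.
_REGISTRY = {
    "FLASH_ATTN": FlashAttnBackend,
    "TORCH_SDPA": TorchSDPABackend,
}

def get_backend(default: str = "FLASH_ATTN") -> AttentionBackend:
    # Honor an explicit user preference first (cf. #4960), then fall back
    # to the default. The variable name is illustrative only.
    choice = os.environ.get("ATTENTION_BACKEND", default)
    try:
        return _REGISTRY[choice]()
    except KeyError:
        raise ValueError(f"unknown attention backend: {choice!r}")

if __name__ == "__main__":
    # With no environment override this prints "FLASH_ATTN".
    print(get_backend().name())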