vllm/vllm/attention/ops
Latest commit: c2170a5b39 by Angus Wang (wangjadehao@gmail.com) — [Kernel] Explicitly specify other value in tl.load calls (#9014), 2024-11-18 11:39:40 -08:00
blocksparse_attention/     [Kernel] Explicitly specify other value in tl.load calls (#9014)         2024-11-18 11:39:40 -08:00
__init__.py                [Core] Refactor Attention Take 2 (#3462)                                 2024-03-25 04:39:33 +00:00
hpu_paged_attn.py          [Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143)  2024-11-06 01:09:10 -08:00
ipex_attn.py               Support Roberta embedding models (#9387)                                 2024-11-14 21:23:29 +00:00
paged_attn.py              Support Roberta embedding models (#9387)                                 2024-11-14 21:23:29 +00:00
prefix_prefill.py          [CI/Build] Avoid CUDA initialization (#8534)                             2024-09-18 10:38:11 +00:00
triton_flash_attention.py  [ROCm][AMD][Bugfix] adding a missing triton autotune config (#4845)      2024-05-16 10:46:52 -07:00