vllm/vllm/attention/backends
Latest commit: 69ec3ca14c, "[Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051)" by Lily Liu, co-authored-by: Simon Mo <simon.mo@hey.com>, 2024-07-04 16:35:51 -07:00
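For context on that headline commit: Gemma 2 applies tanh soft-capping to its attention logits, and FlashInfer exposes this through a logits_soft_cap option on its attention kernels. Below is a minimal reference sketch of the capping transform itself, not the fused kernel code; the function name and the torch-based framing are illustrative assumptions.

```python
import torch

def soft_cap_logits(logits: torch.Tensor, soft_cap: float) -> torch.Tensor:
    # Gemma 2-style soft-capping: squash raw attention logits into the open
    # interval (-soft_cap, soft_cap) with a tanh instead of leaving them
    # unbounded. FlashInfer performs the equivalent transform inside its
    # fused kernel when a logits_soft_cap value is supplied; this is only a
    # reference version for illustration.
    return soft_cap * torch.tanh(logits / soft_cap)
```

For example, with soft_cap=50.0, a raw logit of 200.0 is squashed to just under 50.0, which keeps the softmax numerically tame for models trained with this capping.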
| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | [Core] Refactor Attention Take 2 (#3462) | 2024-03-25 04:39:33 +00:00 |
| abstract.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| blocksparse_attn.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| flash_attn.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| flashinfer.py | [Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051) | 2024-07-04 16:35:51 -07:00 |
| ipex_attn.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| openvino.py | [Hardware][Intel] OpenVINO vLLM backend (#5379) | 2024-06-28 13:50:16 +00:00 |
| pallas.py | [Hardware][TPU] Optimize KV cache swapping (#5878) | 2024-06-27 21:12:13 -07:00 |
| rocm_flash_attn.py | [ROCm][AMD][Model]Adding alibi slopes support in ROCm triton flash attention and naive flash attention (#6043) | 2024-07-03 22:19:38 -07:00 |
| torch_sdpa.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| xformers.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
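Each of the files above implements one hardware- or kernel-specific attention backend against the common interface declared in abstract.py. The following is only a rough sketch of the shape of that interface; the method names and signatures are paraphrased assumptions, not a copy of the actual file.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

import torch


class AttentionBackend(ABC):
    """Skeleton of an attention-backend interface; each listed file
    (flash_attn.py, flashinfer.py, xformers.py, ...) would provide one
    concrete implementation."""

    @staticmethod
    @abstractmethod
    def get_name() -> str:
        """Short identifier for backend selection, e.g. 'flash-attn'."""
        raise NotImplementedError

    @staticmethod
    @abstractmethod
    def get_kv_cache_shape(
        num_blocks: int,
        block_size: int,
        num_kv_heads: int,
        head_size: int,
    ) -> Tuple[int, ...]:
        """Layout of the paged KV cache this backend expects."""
        raise NotImplementedError

    @staticmethod
    @abstractmethod
    def swap_blocks(
        src_kv_cache: torch.Tensor,
        dst_kv_cache: torch.Tensor,
        src_to_dst: torch.Tensor,
    ) -> None:
        """Move KV-cache blocks between devices (e.g. CPU <-> GPU)."""
        raise NotImplementedError

    @staticmethod
    @abstractmethod
    def copy_blocks(
        kv_caches: List[torch.Tensor],
        src_to_dists: torch.Tensor,
    ) -> None:
        """Copy KV-cache blocks within a device (e.g. for copy-on-write)."""
        raise NotImplementedError
```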