vllm/vllm/attention/backends
Latest commit: 69ec3ca14c, "[Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051)" by Lily Liu, co-authored-by: Simon Mo <simon.mo@hey.com>, 2024-07-04 16:35:51 -07:00
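For context on that headline commit: Gemma 2 applies tanh soft-capping to its attention logits, and FlashInfer exposes this through a logits_soft_cap option on its attention kernels. Below is a minimal reference sketch of the capping transform itself, not the fused kernel code; the function name and the torch-based framing are illustrative assumptions.

```python
import torch

def soft_cap_logits(logits: torch.Tensor, soft_cap: float) -> torch.Tensor:
    # Gemma 2-style soft-capping: squash raw attention logits into the open
    # interval (-soft_cap, soft_cap) with a tanh instead of leaving them
    # unbounded. FlashInfer performs the equivalent transform inside its
    # fused kernel when a logits_soft_cap value is supplied; this is only a
    # reference version for illustration.
    return soft_cap * torch.tanh(logits / soft_cap)
```

For example, with soft_cap=50.0, a raw logit of 200.0 is squashed to just under 50.0, which keeps the softmax numerically tame for models trained with this capping.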
| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | [Core] Refactor Attention Take 2 (#3462) | 2024-03-25 04:39:33 +00:00 |
| abstract.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| blocksparse_attn.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| flash_attn.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| flashinfer.py | [Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051) | 2024-07-04 16:35:51 -07:00 |
| ipex_attn.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| openvino.py | [Hardware][Intel] OpenVINO vLLM backend (#5379) | 2024-06-28 13:50:16 +00:00 |
| pallas.py | [Hardware][TPU] Optimize KV cache swapping (#5878) | 2024-06-27 21:12:13 -07:00 |
| rocm_flash_attn.py | [ROCm][AMD][Model]Adding alibi slopes support in ROCm triton flash attention and naive flash attention (#6043) | 2024-07-03 22:19:38 -07:00 |
| torch_sdpa.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
| xformers.py | [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) | 2024-06-25 20:30:03 -07:00 |
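Each of the files above implements one hardware- or kernel-specific attention backend against the common interface declared in abstract.py. The following is only a rough sketch of the shape of that interface; the method names and signatures are paraphrased assumptions, not a copy of the actual file.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple

import torch


class AttentionBackend(ABC):
    """Skeleton of an attention-backend interface; each listed file
    (flash_attn.py, flashinfer.py, xformers.py, ...) would provide one
    concrete implementation."""

    @staticmethod
    @abstractmethod
    def get_name() -> str:
        """Short identifier for backend selection, e.g. 'flash-attn'."""
        raise NotImplementedError

    @staticmethod
    @abstractmethod
    def get_kv_cache_shape(
        num_blocks: int,
        block_size: int,
        num_kv_heads: int,
        head_size: int,
    ) -> Tuple[int, ...]:
        """Layout of the paged KV cache this backend expects."""
        raise NotImplementedError

    @staticmethod
    @abstractmethod
    def swap_blocks(
        src_kv_cache: torch.Tensor,
        dst_kv_cache: torch.Tensor,
        src_to_dst: torch.Tensor,
    ) -> None:
        """Move KV-cache blocks between devices (e.g. CPU <-> GPU)."""
        raise NotImplementedError

    @staticmethod
    @abstractmethod
    def copy_blocks(
        kv_caches: List[torch.Tensor],
        src_to_dists: torch.Tensor,
    ) -> None:
        """Copy KV-cache blocks within a device (e.g. for copy-on-write)."""
        raise NotImplementedError
```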