vllm/vllm/attention

Latest commit 4aba6e3d1a by youkaichao (2024-11-22 20:13:54 -08:00):
[core] gemma2 full context length support (#10584)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Name          Last commit message                                                                Last commit date
backends/     [torch.compile] support all attention backends (#10558)                           2024-11-22 14:04:42 -08:00
ops/          [Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355)        2024-11-20 10:57:39 +00:00
__init__.py   [Core] Add AttentionState abstraction (#7663)                                     2024-08-20 18:50:45 +00:00
layer.py      [core] gemma2 full context length support (#10584)                                2024-11-22 20:13:54 -08:00
selector.py   [Platform][Refactor] Extract func get_default_attn_backend to Platform (#10358)   2024-11-19 11:22:26 +08:00