vllm/vllm/attention
Michał Moskal d4f3985907
[Core] Sliding window for block manager v2 (#4545)
Co-authored-by: Ruth Evans <ruthevans@Ruths-MacBook-Pro.local>
2024-05-28 11:07:07 +09:00
backends [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799) 2024-05-24 22:00:52 -07:00
ops [Core] Sliding window for block manager v2 (#4545) 2024-05-28 11:07:07 +09:00
__init__.py [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681) 2024-05-15 14:00:10 +09:00
layer.py [Bugfix / Core] Prefix Caching Guards (merged with main) (#4846) 2024-05-27 15:18:17 -07:00
selector.py [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799) 2024-05-24 22:00:52 -07:00