vllm/vllm/attention
2024-06-05 09:18:59 -07:00
backends [Core] Remove unnecessary copies in flash attn backend (#5138) 2024-06-03 09:39:31 -07:00
ops [Model] Support MAP-NEO model (#5081) 2024-05-30 19:24:41 -07:00
__init__.py [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681) 2024-05-15 14:00:10 +09:00
layer.py [Bugfix / Core] Prefix Caching Guards (merged with main) (#4846) 2024-05-27 15:18:17 -07:00
selector.py [Misc] Fix docstring of get_attn_backend (#5271) 2024-06-05 09:18:59 -07:00