vllm/vllm/attention (last commit: 2024-08-28 21:27:06 -07:00)

| Name | Last commit message | Last commit date |
| --- | --- | --- |
| `backends/` | Revert "[Core][Kernels] Use FlashInfer backend for FP8 KV Cache when available." (#7982) | 2024-08-28 21:27:06 -07:00 |
| `ops/` | [Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208) | 2024-08-12 22:47:41 +00:00 |
| `__init__.py` | [Core] Add AttentionState abstraction (#7663) | 2024-08-20 18:50:45 +00:00 |
| `layer.py` | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00 |
| `selector.py` | Revert "[Core][Kernels] Use FlashInfer backend for FP8 KV Cache when available." (#7982) | 2024-08-28 21:27:06 -07:00 |