vllm/vllm/attention (last commit: 2024-08-28 21:27:06 -07:00)

| Name | Last commit message | Last commit date |
| --- | --- | --- |
| `backends/` | Revert "[Core][Kernels] Use FlashInfer backend for FP8 KV Cache when available." (#7982) | 2024-08-28 21:27:06 -07:00 |
| `ops/` | [Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208) | 2024-08-12 22:47:41 +00:00 |
| `__init__.py` | [Core] Add AttentionState abstraction (#7663) | 2024-08-20 18:50:45 +00:00 |
| `layer.py` | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00 |
| `selector.py` | Revert "[Core][Kernels] Use FlashInfer backend for FP8 KV Cache when available." (#7982) | 2024-08-28 21:27:06 -07:00 |