vllm/vllm/attention
Name          | Latest commit                                                                                                                        | Last updated
backends/     | [Core/Bugfix] Add query dtype as per FlashInfer API requirements. (#8173)                                                           | 2024-09-05 05:12:26 +00:00
ops/          | [Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208)                                       | 2024-08-12 22:47:41 +00:00
__init__.py   | [Core] Add AttentionState abstraction (#7663)                                                                                       | 2024-08-20 18:50:45 +00:00
layer.py      | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00
selector.py   | [Core][Kernels] Enable FP8 KV Cache with Flashinfer backend. + BugFix for kv_cache_dtype=auto (#7985)                               | 2024-08-29 14:53:11 -04:00
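For orientation, the listing reflects a common split: concrete attention implementations live under backends/, low-level kernels under ops/, backend selection in selector.py, and the attention layer that models call in layer.py. The sketch below is a minimal, hypothetical illustration of that selector-plus-layer pattern, not vLLM's actual API; every name in it (AttentionBackend, NaiveBackend, select_backend, AttentionLayer) is an assumption made for illustration.

```python
# Hypothetical sketch of a backend-selector pattern; names are illustrative
# assumptions, not vLLM's real classes or functions.
from typing import Protocol


class AttentionBackend(Protocol):
    """Minimal interface a concrete backend would expose."""

    def forward(self, query: list[float], key: list[float], value: list[float]) -> list[float]:
        ...


class NaiveBackend:
    """Stand-in for one concrete implementation (the role of a module under backends/)."""

    def forward(self, query, key, value):
        # Placeholder arithmetic; a real backend would dispatch to an attention kernel.
        return [q + k + v for q, k, v in zip(query, key, value)]


# Registry mapping backend names to implementations (the role selector.py plays).
_BACKEND_REGISTRY: dict[str, type] = {"naive": NaiveBackend}


def select_backend(name: str) -> AttentionBackend:
    """Resolve a backend implementation by name."""
    try:
        return _BACKEND_REGISTRY[name]()
    except KeyError as exc:
        raise ValueError(f"unknown attention backend: {name!r}") from exc


class AttentionLayer:
    """Thin wrapper that delegates to the selected backend (the role layer.py plays)."""

    def __init__(self, backend_name: str = "naive") -> None:
        self.backend = select_backend(backend_name)

    def __call__(self, query, key, value):
        return self.backend.forward(query, key, value)


if __name__ == "__main__":
    layer = AttentionLayer("naive")
    print(layer([1.0], [2.0], [3.0]))  # [6.0]
```

Keeping backend choice behind a registry and factory is what lets commits like the FlashInfer FP8 KV-cache change above land in selector.py and a single backend module without touching the layer code that models use.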