vllm/vllm/attention
Name          | Latest commit                                                                                                                        | Last updated
backends/     | [Core/Bugfix] Add query dtype as per FlashInfer API requirements. (#8173)                                                           | 2024-09-05 05:12:26 +00:00
ops/          | [Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208)                                       | 2024-08-12 22:47:41 +00:00
__init__.py   | [Core] Add AttentionState abstraction (#7663)                                                                                       | 2024-08-20 18:50:45 +00:00
layer.py      | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00
selector.py   | [Core][Kernels] Enable FP8 KV Cache with Flashinfer backend. + BugFix for kv_cache_dtype=auto (#7985)                               | 2024-08-29 14:53:11 -04:00
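For orientation, the listing reflects a common split: concrete attention implementations live under backends/, low-level kernels under ops/, backend selection in selector.py, and the attention layer that models call in layer.py. The sketch below is a minimal, hypothetical illustration of that selector-plus-layer pattern, not vLLM's actual API; every name in it (AttentionBackend, NaiveBackend, select_backend, AttentionLayer) is an assumption made for illustration.

```python
# Hypothetical sketch of a backend-selector pattern; names are illustrative
# assumptions, not vLLM's real classes or functions.
from typing import Protocol


class AttentionBackend(Protocol):
    """Minimal interface a concrete backend would expose."""

    def forward(self, query: list[float], key: list[float], value: list[float]) -> list[float]:
        ...


class NaiveBackend:
    """Stand-in for one concrete implementation (the role of a module under backends/)."""

    def forward(self, query, key, value):
        # Placeholder arithmetic; a real backend would dispatch to an attention kernel.
        return [q + k + v for q, k, v in zip(query, key, value)]


# Registry mapping backend names to implementations (the role selector.py plays).
_BACKEND_REGISTRY: dict[str, type] = {"naive": NaiveBackend}


def select_backend(name: str) -> AttentionBackend:
    """Resolve a backend implementation by name."""
    try:
        return _BACKEND_REGISTRY[name]()
    except KeyError as exc:
        raise ValueError(f"unknown attention backend: {name!r}") from exc


class AttentionLayer:
    """Thin wrapper that delegates to the selected backend (the role layer.py plays)."""

    def __init__(self, backend_name: str = "naive") -> None:
        self.backend = select_backend(backend_name)

    def __call__(self, query, key, value):
        return self.backend.forward(query, key, value)


if __name__ == "__main__":
    layer = AttentionLayer("naive")
    print(layer([1.0], [2.0], [3.0]))  # [6.0]
```

Keeping backend choice behind a registry and factory is what lets commits like the FlashInfer FP8 KV-cache change above land in selector.py and a single backend module without touching the layer code that models use.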