vllm/attention at 87d41c849d2cde9279fb08a3a0d97123e3d8fe2f - vllm


Eric Xihui Lin 8e192ff967 [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799) Co-authored-by: beagleski <yunanzhang@microsoft.com> Co-authored-by: bapatra <bapatra@microsoft.com> Co-authored-by: Barun Patra <codedecde@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-05-24 22:00:52 -07:00
attention_dtypes.h	Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)	2024-04-03 14:15:55 -07:00
attention_generic.cuh	[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722)	2024-05-22 07:18:41 +00:00
attention_kernels.cu	[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)	2024-05-24 22:00:52 -07:00
attention_utils.cuh	[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722)	2024-05-22 07:18:41 +00:00
dtype_bfloat16.cuh	[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722)	2024-05-22 07:18:41 +00:00
dtype_float16.cuh	[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722)	2024-05-22 07:18:41 +00:00
dtype_float32.cuh	[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722)	2024-05-22 07:18:41 +00:00
dtype_fp8.cuh	[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722)	2024-05-22 07:18:41 +00:00