vllm/csrc/cpu
Eric Xihui Lin 8e192ff967
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-05-24 22:00:52 -07:00
..
activation.cpp [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00
attention.cpp [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799) 2024-05-24 22:00:52 -07:00
cache.cpp [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00
cpu_types.hpp [Hardware][Intel] Add CPU inference backend (#3634) 2024-04-01 22:07:30 -07:00
layernorm.cpp [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00
pos_encoding.cpp [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00
pybind.cpp [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00