vllm/csrc/cpu
bnellnm 73202dbe77
[Kernel][Misc] register ops to prevent graph breaks (#6917)
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2024-09-11 12:52:19 -07:00
..
activation.cpp [Kernel][CPU] Add Quick gelu to CPU (#5717) 2024-06-21 06:39:40 +00:00
attention.cpp [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081) 2024-07-16 15:31:32 -07:00
cache.cpp [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081) 2024-07-16 15:31:32 -07:00
cpu_types_vsx.hpp Support CPU inference with VSX PowerPC ISA (#5652) 2024-06-26 21:53:04 +00:00
cpu_types_x86.hpp [Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257) 2024-09-11 09:46:46 -07:00
cpu_types.hpp Support CPU inference with VSX PowerPC ISA (#5652) 2024-06-26 21:53:04 +00:00
dnnl_helper.hpp [Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257) 2024-09-11 09:46:46 -07:00
layernorm.cpp [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) 2024-06-09 16:23:30 -04:00
pos_encoding.cpp [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) 2024-06-09 16:23:30 -04:00
quant.cpp [Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257) 2024-09-11 09:46:46 -07:00
torch_bindings.cpp [Kernel][Misc] register ops to prevent graph breaks (#6917) 2024-09-11 12:52:19 -07:00
utils.cpp [Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257) 2024-09-11 09:46:46 -07:00