vllm/csrc
bigPYJ1151 0e3f06fe9c
[Hardware][Intel] Add CPU inference backend (#3634)
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
2024-04-01 22:07:30 -07:00
attention [ROCM] Fix blockReduceSum to use correct warp counts for ROCm and CUDA (#3262) 2024-03-10 15:27:45 -07:00
cpu [Hardware][Intel] Add CPU inference backend (#3634) 2024-04-01 22:07:30 -07:00
moe Add fused top-K softmax kernel for MoE (#2769) 2024-02-05 17:38:02 -08:00
punica [Kernel] support non-zero cuda devices in punica kernels (#3636) 2024-03-27 00:37:42 +00:00
quantization Integrate Marlin Kernels for Int4 GPTQ inference (#2497) 2024-03-01 12:47:51 -08:00
activation_kernels.cu Add kernel for GeGLU with approximate GELU (#3337) 2024-03-12 22:06:17 -07:00
cache_kernels.cu [Minor] Remove gather_cached_kv kernel (#3043) 2024-02-26 15:00:54 -08:00
cache.h [Minor] Remove gather_cached_kv kernel (#3043) 2024-02-26 15:00:54 -08:00
cuda_compat.h [ROCM] Fix blockReduceSum to use correct warp counts for ROCm and CUDA (#3262) 2024-03-10 15:27:45 -07:00
cuda_utils_kernels.cu [ROCm] add support to ROCm 6.0 and MI300 (#2274) 2024-01-26 12:41:10 -08:00
cuda_utils.h [ROCm] add support to ROCm 6.0 and MI300 (#2274) 2024-01-26 12:41:10 -08:00
custom_all_reduce_test.cu [BugFix] Some fixes for custom allreduce kernels (#2760) 2024-03-21 23:02:58 -07:00
custom_all_reduce.cu [BugFix] Some fixes for custom allreduce kernels (#2760) 2024-03-21 23:02:58 -07:00
custom_all_reduce.cuh [BugFix] Some fixes for custom allreduce kernels (#2760) 2024-03-21 23:02:58 -07:00
dispatch_utils.h DeepseekMoE support with Fused MoE kernel (#2453) 2024-01-29 21:19:48 -08:00
layernorm_kernels.cu [Kernel] Layernorm performance optimization (#3662) 2024-03-30 14:26:38 -07:00
moe_align_block_size_kernels.cu [Bugfix] Make moe_align_block_size AMD-compatible (#3470) 2024-03-18 11:26:24 -07:00
ops.h Add batched RoPE kernel (#3095) 2024-03-13 13:45:26 -07:00
pos_encoding_kernels.cu Add batched RoPE kernel (#3095) 2024-03-13 13:45:26 -07:00
pybind.cpp Add batched RoPE kernel (#3095) 2024-03-13 13:45:26 -07:00
reduction_utils.cuh [Kernel] Layernorm performance optimization (#3662) 2024-03-30 14:26:38 -07:00