vllm/csrc
| Name | Last commit | Date |
| --- | --- | --- |
| attention/ | [ROCM] Fix blockReduceSum to use correct warp counts for ROCm and CUDA (#3262) | 2024-03-10 15:27:45 -07:00 |
| moe/ | Add fused top-K softmax kernel for MoE (#2769) | 2024-02-05 17:38:02 -08:00 |
| punica/ | Enhance lora tests with more layer and rank variations (#3243) | 2024-03-09 17:14:16 -08:00 |
| quantization/ | Integrate Marlin Kernels for Int4 GPTQ inference (#2497) | 2024-03-01 12:47:51 -08:00 |
| activation_kernels.cu | Add kernel for GeGLU with approximate GELU (#3337) | 2024-03-12 22:06:17 -07:00 |
| cache_kernels.cu | [Minor] Remove gather_cached_kv kernel (#3043) | 2024-02-26 15:00:54 -08:00 |
| cache.h | [Minor] Remove gather_cached_kv kernel (#3043) | 2024-02-26 15:00:54 -08:00 |
| cuda_compat.h | [ROCM] Fix blockReduceSum to use correct warp counts for ROCm and CUDA (#3262) | 2024-03-10 15:27:45 -07:00 |
| cuda_utils_kernels.cu | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| cuda_utils.h | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| custom_all_reduce_test.cu | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| custom_all_reduce.cu | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| custom_all_reduce.cuh | No repeated IPC open (#2642) | 2024-01-29 10:46:29 -08:00 |
| dispatch_utils.h | DeepseekMoE support with Fused MoE kernel (#2453) | 2024-01-29 21:19:48 -08:00 |
| layernorm_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| moe_align_block_size_kernels.cu | Fused MOE for Mixtral (#2542) | 2024-01-29 22:43:37 -08:00 |
| ops.h | Add kernel for GeGLU with approximate GELU (#3337) | 2024-03-12 22:06:17 -07:00 |
| pos_encoding_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| pybind.cpp | Add kernel for GeGLU with approximate GELU (#3337) | 2024-03-12 22:06:17 -07:00 |
| reduction_utils.cuh | [ROCm] Fix warp and lane calculation in blockReduceSum (#3321) | 2024-03-11 13:14:07 -07:00 |