attention
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
2024-04-03 14:15:55 -07:00
cpu
[Bugfix] Add kv_scale input parameter to CPU backend (#3840)
2024-04-04 04:33:08 +00:00
moe
Add fused top-K softmax kernel for MoE (#2769)
2024-02-05 17:38:02 -08:00
punica
[Misc] Reduce supported Punica dtypes (#4304)
2024-04-23 18:54:33 -07:00
quantization
[Bugfix] Fix marlin kernel crash on H100 (#4218)
2024-04-24 10:35:01 -07:00
activation_kernels.cu
Add kernel for GeGLU with approximate GELU (#3337)
2024-03-12 22:06:17 -07:00
cache_kernels.cu
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
2024-04-03 14:15:55 -07:00
cache.h
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
2024-04-03 14:15:55 -07:00
cuda_compat.h
[ROCM] Fix blockReduceSum to use correct warp counts for ROCm and CUDA (#3262)
2024-03-10 15:27:45 -07:00
cuda_utils_kernels.cu
[ROCm] add support to ROCm 6.0 and MI300 (#2274)
2024-01-26 12:41:10 -08:00
cuda_utils.h
[ROCm] add support to ROCm 6.0 and MI300 (#2274)
2024-01-26 12:41:10 -08:00
custom_all_reduce_test.cu
[BugFix] Some fixes for custom allreduce kernels (#2760)
2024-03-21 23:02:58 -07:00
custom_all_reduce.cu
[BugFix] Some fixes for custom allreduce kernels (#2760)
2024-03-21 23:02:58 -07:00
custom_all_reduce.cuh
[BugFix] Some fixes for custom allreduce kernels (#2760)
2024-03-21 23:02:58 -07:00
dispatch_utils.h
DeepseekMoE support with Fused MoE kernel (#2453)
2024-01-29 21:19:48 -08:00
layernorm_kernels.cu
[Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations (#3782)
2024-04-08 14:31:02 -07:00
moe_align_block_size_kernels.cu
[Bugfix] Make moe_align_block_size AMD-compatible (#3470)
2024-03-18 11:26:24 -07:00
ops.h
[Kernel] FP8 support for MoE kernel / Mixtral (#4244)
2024-04-24 01:18:23 +00:00
pos_encoding_kernels.cu
Add batched RoPE kernel (#3095)
2024-03-13 13:45:26 -07:00
pybind.cpp
[Kernel] FP8 support for MoE kernel / Mixtral (#4244)
2024-04-24 01:18:23 +00:00
reduction_utils.cuh
[Kernel] Layernorm performance optimization (#3662)
2024-03-30 14:26:38 -07:00