| Name | Last commit | Date |
|---|---|---|
| attention | [Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support (#4535) | 2024-05-09 18:04:17 -06:00 |
| cpu | [Misc] Apply a couple g++ cleanups (#4719) | 2024-05-10 13:37:05 +00:00 |
| moe | Add fused top-K softmax kernel for MoE (#2769) | 2024-02-05 17:38:02 -08:00 |
| punica | [ROCm] Add support for Punica kernels on AMD GPUs (#3140) | 2024-05-09 09:19:50 -07:00 |
| quantization | [Kernel] add bfloat16 support for gptq marlin kernel (#4788) | 2024-05-16 09:55:29 -04:00 |
| activation_kernels.cu | Add kernel for GeGLU with approximate GELU (#3337) | 2024-03-12 22:06:17 -07:00 |
| cache_kernels.cu | [Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support (#4535) | 2024-05-09 18:04:17 -06:00 |
| cache.h | [Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support (#4535) | 2024-05-09 18:04:17 -06:00 |
| cuda_compat.h | [ROCm] Add support for Punica kernels on AMD GPUs (#3140) | 2024-05-09 09:19:50 -07:00 |
| cuda_utils_kernels.cu | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| cuda_utils.h | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| custom_all_reduce_test.cu | [BugFix] Some fixes for custom allreduce kernels (#2760) | 2024-03-21 23:02:58 -07:00 |
| custom_all_reduce.cu | [BugFix] Some fixes for custom allreduce kernels (#2760) | 2024-03-21 23:02:58 -07:00 |
| custom_all_reduce.cuh | [BugFix] Some fixes for custom allreduce kernels (#2760) | 2024-03-21 23:02:58 -07:00 |
| dispatch_utils.h | DeepseekMoE support with Fused MoE kernel (#2453) | 2024-01-29 21:19:48 -08:00 |
| layernorm_kernels.cu | [Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations (#3782) | 2024-04-08 14:31:02 -07:00 |
| moe_align_block_size_kernels.cu | [Bugfix] Make moe_align_block_size AMD-compatible (#3470) | 2024-03-18 11:26:24 -07:00 |
| ops.h | [Core][Model runner refactoring 1/N] Refactor attn metadata term (#4518) | 2024-05-03 10:20:12 -07:00 |
| pos_encoding_kernels.cu | Add batched RoPE kernel (#3095) | 2024-03-13 13:45:26 -07:00 |
| pybind.cpp | [Kernel] Use flashinfer for decoding (#4353) | 2024-05-03 15:51:27 -07:00 |
| reduction_utils.cuh | [Kernel] Layernorm performance optimization (#3662) | 2024-03-30 14:26:38 -07:00 |