| Name | Last commit message | Last commit date |
| --- | --- | --- |
| attention | Support FP8-E5M2 KV Cache (#2279) | 2024-01-28 16:43:54 -08:00 |
| punica | Speed up Punica compilation (#2632) | 2024-01-27 17:46:56 -08:00 |
| quantization | Support FP8-E5M2 KV Cache (#2279) | 2024-01-28 16:43:54 -08:00 |
| activation_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| cache_kernels.cu | Support FP8-E5M2 KV Cache (#2279) | 2024-01-28 16:43:54 -08:00 |
| cache.h | Support FP8-E5M2 KV Cache (#2279) | 2024-01-28 16:43:54 -08:00 |
| cuda_compat.h | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |
| cuda_utils_kernels.cu | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| cuda_utils.h | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| custom_all_reduce_test.cu | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| custom_all_reduce.cu | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| custom_all_reduce.cuh | No repeated IPC open (#2642) | 2024-01-29 10:46:29 -08:00 |
| dispatch_utils.h | DeepseekMoE support with Fused MoE kernel (#2453) | 2024-01-29 21:19:48 -08:00 |
| layernorm_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| moe_align_block_size_kernels.cu | Fused MOE for Mixtral (#2542) | 2024-01-29 22:43:37 -08:00 |
| ops.h | Fused MOE for Mixtral (#2542) | 2024-01-29 22:43:37 -08:00 |
| pos_encoding_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| pybind.cpp | Fused MOE for Mixtral (#2542) | 2024-01-29 22:43:37 -08:00 |
| reduction_utils.cuh | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |