vllm/csrc
| Name | Last commit | Last commit date |
| --- | --- | --- |
| attention/ | Fix compile error when using rocm (#2648) | 2024-02-01 09:35:09 -08:00 |
| moe/ | Add fused top-K softmax kernel for MoE (#2769) | 2024-02-05 17:38:02 -08:00 |
| punica/ | Speed up Punica compilation (#2632) | 2024-01-27 17:46:56 -08:00 |
| quantization/ | Refactor 2 awq gemm kernels into m16nXk32 (#2723) | 2024-02-12 11:02:17 -08:00 |
| activation_kernels.cu | Optimize GeGLU layer in Gemma (#2975) | 2024-02-21 20:17:52 -08:00 |
| cache_kernels.cu | Fix compile error when using rocm (#2648) | 2024-02-01 09:35:09 -08:00 |
| cache.h | Support FP8-E5M2 KV Cache (#2279) | 2024-01-28 16:43:54 -08:00 |
| cuda_compat.h | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |
| cuda_utils_kernels.cu | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| cuda_utils.h | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| custom_all_reduce_test.cu | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| custom_all_reduce.cu | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| custom_all_reduce.cuh | No repeated IPC open (#2642) | 2024-01-29 10:46:29 -08:00 |
| dispatch_utils.h | DeepseekMoE support with Fused MoE kernel (#2453) | 2024-01-29 21:19:48 -08:00 |
| layernorm_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| moe_align_block_size_kernels.cu | Fused MOE for Mixtral (#2542) | 2024-01-29 22:43:37 -08:00 |
| ops.h | Optimize GeGLU layer in Gemma (#2975) | 2024-02-21 20:17:52 -08:00 |
| pos_encoding_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| pybind.cpp | Optimize GeGLU layer in Gemma (#2975) | 2024-02-21 20:17:52 -08:00 |
| reduction_utils.cuh | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |
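Several entries above follow the standard CUDA-extension layout. The GeGLU commit (#2975), for instance, touches activation_kernels.cu (the kernel), ops.h (the declaration), and pybind.cpp (the Python binding). As a rough sketch of what such an activation kernel looks like, here is a minimal GeGLU ("GELU-and-mul") kernel; the kernel name, the tanh-approximation GELU, and the `[num_tokens, 2 * d]` input layout are illustrative assumptions, not vLLM's exact code.

```cuda
// Sketch only: a minimal GeGLU kernel in the spirit of activation_kernels.cu.
#include <cuda_runtime.h>
#include <math.h>

// tanh-approximation GELU.
__device__ __forceinline__ float gelu_tanh(float x) {
  const float c = 0.7978845608028654f;  // sqrt(2/pi)
  return 0.5f * x * (1.0f + tanhf(c * (x + 0.044715f * x * x * x)));
}

// GeGLU: out[t][i] = gelu(input[t][i]) * input[t][d + i]
// input: [num_tokens, 2 * d], output: [num_tokens, d]; one block per token.
__global__ void gelu_and_mul_kernel(float* __restrict__ out,
                                    const float* __restrict__ input,
                                    int d) {
  const int token = blockIdx.x;
  for (int i = threadIdx.x; i < d; i += blockDim.x) {
    const float x = input[token * 2 * d + i];      // gated half
    const float g = input[token * 2 * d + d + i];  // linear half
    out[token * d + i] = gelu_tanh(x) * g;
  }
}
```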
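dispatch_utils.h holds the dtype-dispatch helpers shared by the kernel launchers. A minimal sketch of such a helper, assuming it wraps ATen's dispatch macros to cover float32, float16, and bfloat16; the macro body below is a guess at the idea, not the file's actual contents.

```cpp
// Sketch: a dtype-dispatch wrapper in the spirit of dispatch_utils.h.
#include <torch/extension.h>

// Dispatch over float, half, and bfloat16 with one macro, so each launcher
// is written once and instantiated per dtype as `scalar_t`.
#define VLLM_DISPATCH_FLOATING_TYPES(TYPE, NAME, ...)       \
  AT_DISPATCH_FLOATING_TYPES_AND2(at::ScalarType::Half,     \
                                  at::ScalarType::BFloat16, \
                                  TYPE, NAME, __VA_ARGS__)

// Hypothetical usage inside a launcher (assumes a templated kernel):
//   VLLM_DISPATCH_FLOATING_TYPES(input.scalar_type(), "gelu_and_mul", [&] {
//     gelu_and_mul_kernel<scalar_t><<<grid, block, 0, stream>>>(...);
//   });
```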
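pybind.cpp is where the launchers declared in ops.h are exposed to Python as a torch extension, which is why binding-visible changes like the GeGLU commit touch all three files. A minimal sketch of the binding pattern, with an assumed signature:

```cpp
// Sketch: binding a kernel launcher to Python, in the spirit of pybind.cpp.
#include <torch/extension.h>

// Declared in ops.h; signature assumed for illustration.
void gelu_and_mul(torch::Tensor& out, torch::Tensor& input);

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("gelu_and_mul", &gelu_and_mul,
        "GeGLU activation: GELU(x[..., :d]) * x[..., d:]");
}
```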
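reduction_utils.cuh provides the block-wide reduction primitive that kernels such as layernorm_kernels.cu rely on. The standard warp-shuffle formulation looks roughly like this; a sketch of the well-known pattern, not necessarily the file's exact code.

```cuda
// Sketch: warp-shuffle block reduction in the spirit of reduction_utils.cuh.
#include <cuda_runtime.h>

// Butterfly reduction across the 32 lanes of a warp.
__device__ __forceinline__ float warpReduceSum(float val) {
  for (int mask = 16; mask > 0; mask >>= 1)
    val += __shfl_xor_sync(0xffffffff, val, mask);
  return val;
}

// Block-wide sum: each warp reduces locally, warp leaders publish their
// partial sums to shared memory, and the first warp reduces the partials.
__device__ __forceinline__ float blockReduceSum(float val) {
  static __shared__ float shared[32];  // one slot per warp (<= 1024 threads)
  const int lane = threadIdx.x & 31;
  const int wid = threadIdx.x >> 5;
  val = warpReduceSum(val);
  if (lane == 0) shared[wid] = val;
  __syncthreads();
  val = (threadIdx.x < (blockDim.x + 31) / 32) ? shared[lane] : 0.0f;
  if (wid == 0) val = warpReduceSum(val);
  return val;
}
```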