| attention | Replace head_mapping params with num_kv_heads to attention kernel. (#1997) | 2023-12-10 10:12:53 -08:00 |
| quantization | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |
| activation_kernels.cu | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |
| cache_kernels.cu | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |
| cache.h | [Build] Avoid building too many extensions (#1624) | 2023-11-23 16:31:19 -08:00 |
| cuda_compat.h | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |
| cuda_utils_kernels.cu | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |
| cuda_utils.h | [Build] Avoid building too many extensions (#1624) | 2023-11-23 16:31:19 -08:00 |
| dispatch_utils.h | Avoid compiling kernels for double data type (#933) | 2023-09-02 14:59:47 +09:00 |
| layernorm_kernels.cu | [Optimization] Implement fused add rmsnorm (#1667) | 2023-11-18 18:18:02 -08:00 |
| ops.h | Replace head_mapping params with num_kv_heads to attention kernel. (#1997) | 2023-12-10 10:12:53 -08:00 |
| pos_encoding_kernels.cu | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |
| pybind.cpp | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |
| reduction_utils.cuh | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |