vllm/csrc (last commit: 2024-01-27 12:46:35 -08:00)
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| attention/ | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| punica/ | [Experimental] Add multi-LoRA support (#1804) | 2024-01-23 15:26:37 -08:00 |
| quantization/ | AWQ: Up to 2.66x higher throughput (#2566) | 2024-01-26 23:53:17 -08:00 |
| activation_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| cache_kernels.cu | use a correct device when creating OptionalCUDAGuard (#2583) | 2024-01-25 23:48:17 -08:00 |
| cache.h | Avoid multiple redefinition (#1817) | 2023-12-14 09:35:58 -08:00 |
| cuda_compat.h | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |
| cuda_utils_kernels.cu | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| cuda_utils.h | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| custom_all_reduce_test.cu | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| custom_all_reduce.cu | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| custom_all_reduce.cuh | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| dispatch_utils.h | Avoid multiple redefinition (#1817) | 2023-12-14 09:35:58 -08:00 |
| layernorm_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| ops.h | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| pos_encoding_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| pybind.cpp | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| reduction_utils.cuh | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |