vllm/csrc (last commit: 2024-01-27 12:46:35 -08:00)
| Name | Last commit message | Last commit date |
| --- | --- | --- |
| attention/ | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| punica/ | [Experimental] Add multi-LoRA support (#1804) | 2024-01-23 15:26:37 -08:00 |
| quantization/ | AWQ: Up to 2.66x higher throughput (#2566) | 2024-01-26 23:53:17 -08:00 |
| activation_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| cache_kernels.cu | use a correct device when creating OptionalCUDAGuard (#2583) | 2024-01-25 23:48:17 -08:00 |
| cache.h | Avoid multiple redefinition (#1817) | 2023-12-14 09:35:58 -08:00 |
| cuda_compat.h | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |
| cuda_utils_kernels.cu | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| cuda_utils.h | [ROCm] add support to ROCm 6.0 and MI300 (#2274) | 2024-01-26 12:41:10 -08:00 |
| custom_all_reduce_test.cu | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| custom_all_reduce.cu | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| custom_all_reduce.cuh | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| dispatch_utils.h | Avoid multiple redefinition (#1817) | 2023-12-14 09:35:58 -08:00 |
| layernorm_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| ops.h | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| pos_encoding_kernels.cu | [FIX] Support non-zero CUDA devices in custom kernels (#1959) | 2024-01-02 19:09:59 -08:00 |
| pybind.cpp | Implement custom all reduce kernels (#2192) | 2024-01-27 12:46:35 -08:00 |
| reduction_utils.cuh | Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836) | 2023-12-07 23:16:52 -08:00 |