vllm/csrc at e288df0632d5bdde76c20bed8310b46d35b8e5ac - vllm

alexm-nm e288df0632 [Bugfix] Fine-tune gptq_marlin configs to be more similar to marlin (#4626)	2024-05-08 17:14:31 -07:00
attention	[Core][Model runner refactoring 1/N] Refactor attn metadata term (#4518)	2024-05-03 10:20:12 -07:00
cpu	[Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659)	2024-05-08 12:07:05 -07:00
moe	Add fused top-K softmax kernel for MoE (#2769)	2024-02-05 17:38:02 -08:00
punica	[Kernel] Full Tensor Parallelism for LoRA Layers (#3524)	2024-04-27 00:03:48 -07:00
quantization	[Bugfix] Fine-tune gptq_marlin configs to be more similar to marlin (#4626)	2024-05-08 17:14:31 -07:00
activation_kernels.cu	Add kernel for GeGLU with approximate GELU (#3337)	2024-03-12 22:06:17 -07:00
cache_kernels.cu	[Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659)	2024-05-08 12:07:05 -07:00
cache.h	[Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659)	2024-05-08 12:07:05 -07:00
cuda_compat.h	[ROCM] Fix blockReduceSum to use correct warp counts for ROCm and CUDA (#3262)	2024-03-10 15:27:45 -07:00
cuda_utils_kernels.cu	[ROCm] add support to ROCm 6.0 and MI300 (#2274)	2024-01-26 12:41:10 -08:00
cuda_utils.h	[ROCm] add support to ROCm 6.0 and MI300 (#2274)	2024-01-26 12:41:10 -08:00
custom_all_reduce_test.cu	[BugFix] Some fixes for custom allreduce kernels (#2760)	2024-03-21 23:02:58 -07:00
custom_all_reduce.cu	[BugFix] Some fixes for custom allreduce kernels (#2760)	2024-03-21 23:02:58 -07:00
custom_all_reduce.cuh	[BugFix] Some fixes for custom allreduce kernels (#2760)	2024-03-21 23:02:58 -07:00
dispatch_utils.h	DeepseekMoE support with Fused MoE kernel (#2453)	2024-01-29 21:19:48 -08:00
layernorm_kernels.cu	[Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations (#3782)	2024-04-08 14:31:02 -07:00
moe_align_block_size_kernels.cu	[Bugfix] Make moe_align_block_size AMD-compatible (#3470)	2024-03-18 11:26:24 -07:00
ops.h	[Core][Model runner refactoring 1/N] Refactor attn metadata term (#4518)	2024-05-03 10:20:12 -07:00
pos_encoding_kernels.cu	Add batched RoPE kernel (#3095)	2024-03-13 13:45:26 -07:00
pybind.cpp	[Kernel] Use flashinfer for decoding (#4353)	2024-05-03 15:51:27 -07:00
reduction_utils.cuh	[Kernel] Layernorm performance optimization (#3662)	2024-03-30 14:26:38 -07:00