| Name | Last commit message | Last commit date |
| --- | --- | --- |
| attention | [AMD][CI/Build] Disambiguation of the function call for ROCm 6.2 headers compatibility (#7477) | 2024-08-21 16:47:36 -07:00 |
| core | [Bugfix] Allow ScalarType to be compiled with pytorch 2.3 and add checks for registering FakeScalarType and dynamo support. (#7886) | 2024-08-27 23:13:45 -04:00 |
| cpu | [Kernel][Misc] register ops to prevent graph breaks (#6917) | 2024-09-11 12:52:19 -07:00 |
| cutlass_extensions | [Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174) | 2024-08-20 07:09:33 -06:00 |
| mamba | [Kernel/Model] Migrate mamba_ssm and causal_conv1d kernels to vLLM (#7651) | 2024-08-28 15:06:52 -07:00 |
| moe | [Misc] Fused MoE Marlin support for GPTQ (#8217) | 2024-09-09 23:02:52 -04:00 |
| prepare_inputs | [multi-step] add flashinfer backend (#7928) | 2024-09-12 11:16:22 -07:00 |
| quantization | [Kernel][Misc] register ops to prevent graph breaks (#6917) | 2024-09-11 12:52:19 -07:00 |
| rocm | [Kernel][Hardware][Amd]Custom paged attention kernel for rocm (#8310) | 2024-09-13 17:01:11 -07:00 |
| activation_kernels.cu | [Model] Port over CLIPVisionModel for VLMs (#5591) | 2024-06-20 11:52:09 +00:00 |
| cache_kernels.cu | Add fp8 support to reshape_and_cache_flash (#6667) | 2024-07-24 18:36:52 +00:00 |
| cache.h | Add fp8 support to reshape_and_cache_flash (#6667) | 2024-07-24 18:36:52 +00:00 |
| cuda_compat.h | [Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927) | 2024-06-02 14:13:26 -07:00 |
| cuda_utils_kernels.cu | [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) | 2024-06-09 16:23:30 -04:00 |
| cuda_utils.h | [Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174) | 2024-08-20 07:09:33 -06:00 |
| custom_all_reduce_test.cu | [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) | 2024-05-22 07:18:41 +00:00 |
| custom_all_reduce.cu | [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) | 2024-06-09 16:23:30 -04:00 |
| custom_all_reduce.cuh | [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) | 2024-05-22 07:18:41 +00:00 |
| dispatch_utils.h | [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) | 2024-06-09 16:23:30 -04:00 |
| layernorm_kernels.cu | [Kernel] Replaced blockReduce[...] functions with cub::BlockReduce (#7233) | 2024-08-21 20:18:00 -04:00 |
| moe_align_block_size_kernels.cu | [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) | 2024-06-09 16:23:30 -04:00 |
| ops.h | [multi-step] add flashinfer backend (#7928) | 2024-09-12 11:16:22 -07:00 |
| pos_encoding_kernels.cu | [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) | 2024-06-09 16:23:30 -04:00 |
| torch_bindings.cpp | [multi-step] add flashinfer backend (#7928) | 2024-09-12 11:16:22 -07:00 |