vllm/kernels at 546034b466bf11f12936791312981b9982850eb0 - vllm

Simon Mo 546034b466 [refactor] remove triton based sampler (#8524)	2024-09-16 20:04:48 -07:00
__init__.py	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425)	2024-05-13 23:50:09 +09:00
allclose_default.py	[ROCm] Fix some kernels failed unit tests (#2498)	2024-02-05 14:25:36 -08:00
conftest.py	[Kernel] Use flashinfer for decoding (#4353)	2024-05-03 15:51:27 -07:00
quant_utils.py	[Feature][Hardware][Amd] Add fp8 Linear Layer for Rocm (#7210)	2024-08-16 10:06:30 -07:00
test_activation.py	[Kernel][Misc] register ops to prevent graph breaks (#6917)	2024-09-11 12:52:19 -07:00
test_attention_selector.py	[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)	2024-08-06 16:51:47 -04:00
test_attention.py	[Kernel][Hardware][Amd]Custom paged attention kernel for rocm (#8310)	2024-09-13 17:01:11 -07:00
test_awq_triton.py	[Kernel] [Triton] [AMD] Adding Triton implementations awq_dequantize and awq_gemm to support AWQ (#7386)	2024-08-28 15:37:47 -04:00
test_blocksparse_attention.py	[Misc/Testing] Use `torch.testing.assert_close` (#7324)	2024-08-16 04:24:04 +00:00
test_cache.py	[Kernel][Misc] register ops to prevent graph breaks (#6917)	2024-09-11 12:52:19 -07:00
test_causal_conv1d.py	[Kernel/Model] Migrate mamba_ssm and causal_conv1d kernels to vLLM (#7651)	2024-08-28 15:06:52 -07:00
test_cutlass.py	[Kernel][Misc] register ops to prevent graph breaks (#6917)	2024-09-11 12:52:19 -07:00
test_encoder_decoder_attn.py	[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)	2024-08-06 16:51:47 -04:00
test_flash_attn.py	register custom op for flash attn and use from torch.ops (#7536)	2024-08-15 22:38:56 -07:00
test_flashinfer.py	[Core/Bugfix] Add query dtype as per FlashInfer API requirements. (#8173)	2024-09-05 05:12:26 +00:00
test_fp8_quant.py	[Feature][Hardware][Amd] Add fp8 Linear Layer for Rocm (#7210)	2024-08-16 10:06:30 -07:00
test_gguf.py	[Bugfix][Kernel] Add `IQ1_M` quantization implementation to GGUF kernel (#8357)	2024-09-15 16:51:44 -06:00
test_int8_quant.py	[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (#7270)	2024-09-16 11:52:40 -07:00
test_layernorm.py	[Kernel][Misc] register ops to prevent graph breaks (#6917)	2024-09-11 12:52:19 -07:00
test_machete_gemm.py	[Kernel][Misc] register ops to prevent graph breaks (#6917)	2024-09-11 12:52:19 -07:00
test_mamba_ssm.py	[Kernel/Model] Migrate mamba_ssm and causal_conv1d kernels to vLLM (#7651)	2024-08-28 15:06:52 -07:00
test_marlin_gemm.py	[Kernel][Misc] register ops to prevent graph breaks (#6917)	2024-09-11 12:52:19 -07:00
test_moe.py	[Kernel] Enable 8-bit weights in Fused Marlin MoE (#8032)	2024-09-16 09:47:19 -06:00
test_pos_encoding.py	[Misc/Testing] Use `torch.testing.assert_close` (#7324)	2024-08-16 04:24:04 +00:00
test_prefix_prefill.py	[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208)	2024-08-12 22:47:41 +00:00
utils.py	[CI/Build] Reorganize models tests (#7820)	2024-09-13 10:20:06 -07:00