vllm/kernels at main - vllm

youkaichao eebad39f26 [torch.compile] support all attention backends (#10558) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-22 14:04:42 -08:00
__init__.py	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425)	2024-05-13 23:50:09 +09:00
allclose_default.py	[ROCm] Fix some kernels failed unit tests (#2498)	2024-02-05 14:25:36 -08:00
conftest.py	[Kernel] Use flashinfer for decoding (#4353)	2024-05-03 15:51:27 -07:00
quant_utils.py	[Hardware][ROCM] using current_platform.is_rocm (#9642)	2024-10-28 04:07:00 +00:00
test_activation.py	[CI] Prune back the number of tests in tests/kernels/* (#9932)	2024-11-05 16:02:32 -05:00
test_aqlm.py	[Kernel] Fullgraph and opcheck tests (#8479)	2024-09-25 08:35:52 -06:00
test_attention_selector.py	[Platform][Refactor] Extract func `get_default_attn_backend` to `Platform` (#10358)	2024-11-19 11:22:26 +08:00
test_attention.py	[CI] Prune back the number of tests in tests/kernels/* (#9932)	2024-11-05 16:02:32 -05:00
test_awq_marlin.py	[CI] Prune back the number of tests in tests/kernels/* (#9932)	2024-11-05 16:02:32 -05:00
test_awq_triton.py	[Hardware] using current_platform.seed_everything (#9785)	2024-10-29 14:47:44 +00:00
test_awq.py	[Bugfix] Try to handle older versions of pytorch (#9086)	2024-10-08 14:28:12 -07:00
test_blocksparse_attention.py	[CI] Prune back the number of tests in tests/kernels/* (#9932)	2024-11-05 16:02:32 -05:00
test_cache.py	[CI] Prune back the number of tests in tests/kernels/* (#9932)	2024-11-05 16:02:32 -05:00
test_causal_conv1d.py	[BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100 (#9838)	2024-10-31 20:06:25 +00:00
test_cutlass.py	[CI] Prune back the number of tests in tests/kernels/* (#9932)	2024-11-05 16:02:32 -05:00
test_encoder_decoder_attn.py	[torch.compile] support all attention backends (#10558)	2024-11-22 14:04:42 -08:00
test_flash_attn.py	[Hardware] using current_platform.seed_everything (#9785)	2024-10-29 14:47:44 +00:00
test_flashinfer.py	[Hardware] using current_platform.seed_everything (#9785)	2024-10-29 14:47:44 +00:00
test_fp8_quant.py	[Hardware] using current_platform.seed_everything (#9785)	2024-10-29 14:47:44 +00:00
test_ggml.py	[Kernel] Fullgraph and opcheck tests (#8479)	2024-09-25 08:35:52 -06:00
test_gguf.py	[Hardware] using current_platform.seed_everything (#9785)	2024-10-29 14:47:44 +00:00
test_gptq.py	[Kernel] Fullgraph and opcheck tests (#8479)	2024-09-25 08:35:52 -06:00
test_int8_quant.py	[bugfix] Fix static asymmetric quantization case (#10334)	2024-11-15 09:35:11 +08:00
test_layernorm.py	[torch.compile] Fuse RMSNorm with quant (#9138)	2024-11-08 21:20:08 +00:00
test_machete_mm.py	[Kernel] Initial Machete W4A8 support + Refactors (#9855)	2024-11-18 12:59:29 -07:00
test_mamba_ssm.py	[CI/Build] drop support for Python 3.8 EOL (#8464)	2024-11-06 07:11:55 +00:00
test_marlin_gemm.py	[Bugfix] Marlin 2:4 temp fix for large M dim (>256) (#10464)	2024-11-19 19:40:33 -08:00
test_moe.py	[Misc] Bump up test_fused_moe tolerance (#10364)	2024-11-15 16:31:18 +00:00
test_permute_cols.py	[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)	2024-09-23 13:46:26 -04:00
test_pos_encoding.py	[CI] Prune back the number of tests in tests/kernels/* (#9932)	2024-11-05 16:02:32 -05:00
test_prefix_prefill.py	[Hardware] using current_platform.seed_everything (#9785)	2024-10-29 14:47:44 +00:00
test_rotary_embedding.py	[Kernel] Fullgraph and opcheck tests (#8479)	2024-09-25 08:35:52 -06:00
test_triton_scaled_mm.py	[Kernel][Triton] Add Triton implementation for scaled_mm_triton to support fp8 and int8 SmoothQuant, symmetric case (#9857)	2024-11-08 19:59:22 -05:00
test_utils.py	[Kernel] Fullgraph and opcheck tests (#8479)	2024-09-25 08:35:52 -06:00
utils.py	[Encoder Decoder] Add flash_attn kernel support for encoder-decoder models (#9559)	2024-11-01 23:22:49 -07:00