vllm/tests/kernels
Varun Sundar Rabindranath b5241e41d9
[ Kernel ] FP8 Dynamic-Per-Token Quant Kernel (#6511)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-07-18 01:38:35 +00:00
__init__.py [CI/Build] Move test_utils.py to tests/utils.py (#4425) 2024-05-13 23:50:09 +09:00
allclose_default.py [ROCm] Fix some kernels failed unit tests (#2498) 2024-02-05 14:25:36 -08:00
conftest.py [Kernel] Use flashinfer for decoding (#4353) 2024-05-03 15:51:27 -07:00
quant_utils.py [ Kernel ] FP8 Dynamic-Per-Token Quant Kernel (#6511) 2024-07-18 01:38:35 +00:00
test_activation.py [Misc] Add CustomOp interface for device portability (#5255) 2024-06-05 09:18:19 -07:00
test_attention_selector.py [Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888) 2024-07-08 17:12:15 +00:00
test_attention.py [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081) 2024-07-16 15:31:32 -07:00
test_blocksparse_attention.py [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081) 2024-07-16 15:31:32 -07:00
test_cache.py [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081) 2024-07-16 15:31:32 -07:00
test_cutlass.py [hardware][misc] introduce platform abstraction (#6080) 2024-07-02 20:12:22 -07:00
test_encoder_decoder_attn.py [Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888) 2024-07-08 17:12:15 +00:00
test_flash_attn.py [mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00
test_flashinfer.py [Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051) 2024-07-04 16:35:51 -07:00
test_fp8_quant.py [ Kernel ] FP8 Dynamic-Per-Token Quant Kernel (#6511) 2024-07-18 01:38:35 +00:00
test_int8_quant.py [ Kernel ] FP8 Dynamic-Per-Token Quant Kernel (#6511) 2024-07-18 01:38:35 +00:00
test_layernorm.py [Misc] Add CustomOp interface for device portability (#5255) 2024-06-05 09:18:19 -07:00
test_marlin_gemm.py [ Misc ] Refactor Marlin Python Utilities (#6082) 2024-07-11 15:40:11 +00:00
test_moe.py [ Misc ] Refactor MoE to isolate Fp8 From Mixtral (#5970) 2024-07-02 21:54:35 +00:00
test_pos_encoding.py [mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00
test_prefix_prefill.py [Bugfix][Kernel] allow non-power-of-2 for prefix prefill with alibi (#4573) 2024-05-08 09:19:58 -07:00
test_rand.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
test_sampler.py [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
utils.py [Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888) 2024-07-08 17:12:15 +00:00