vllm/kernels at 260d119e864edbf023b1be7fa446a08bbea11f80 - vllm

Tyler Michael Smith 260d119e86 [Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU (#5137)	2024-06-01 06:45:32 +00:00
__init__.py	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425)	2024-05-13 23:50:09 +09:00
allclose_default.py	[ROCm] Fix some kernels failed unit tests (#2498)	2024-02-05 14:25:36 -08:00
conftest.py	[Kernel] Use flashinfer for decoding (#4353)	2024-05-03 15:51:27 -07:00
test_activation.py	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425)	2024-05-13 23:50:09 +09:00
test_attention_selector.py	[Misc] Take user preference in attention selector (#4960)	2024-05-23 07:55:56 +09:00
test_attention.py	[Model] Support MAP-NEO model (#5081)	2024-05-30 19:24:41 -07:00
test_blocksparse_attention.py	[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)	2024-05-24 22:00:52 -07:00
test_cache.py	[Model] Support MAP-NEO model (#5081)	2024-05-30 19:24:41 -07:00
test_cutlass.py	[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU (#5137)	2024-06-01 06:45:32 +00:00
test_flash_attn.py	[Kernel] Add flash-attn back (#4907)	2024-05-19 18:11:30 -07:00
test_int8_quant.py	[Kernel] Initial Activation Quantization Support (#4525)	2024-05-23 21:29:18 +00:00
test_layernorm.py	[Kernel] Layernorm performance optimization (#3662)	2024-03-30 14:26:38 -07:00
test_marlin_gemm.py	Marlin 24 prefill performance improvement (about 25% better on average) (#4983)	2024-05-23 02:39:27 -04:00
test_moe.py	[Kernel] Support MoE Fp8 Checkpoints for Mixtral (Static Weights with Dynamic/Static Activations) (#4527)	2024-05-04 11:45:16 -07:00
test_pos_encoding.py	[Model] Support MAP-NEO model (#5081)	2024-05-30 19:24:41 -07:00
test_prefix_prefill.py	[Bugfix][Kernel] allow non-power-of-2 for prefix prefill with alibi (#4573)	2024-05-08 09:19:58 -07:00
test_rand.py	[CI] Try introducing isort. (#3495)	2024-03-25 07:59:47 -07:00
test_sampler.py	[CI] Try introducing isort. (#3495)	2024-03-25 07:59:47 -07:00