vllm/kernels at d65049daabe9a80783b0547fd85dd39a18a905b3 - vllm

Cyrus Leung 7e7eae338d [Misc] Standardize RoPE handling for Qwen2-VL (#9250)		2024-10-16 13:56:17 +08:00
benchmark_aqlm.py	[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718)	2024-06-20 17:00:13 -06:00
benchmark_layernorm.py	[CI/Build] Avoid CUDA initialization (#8534)	2024-09-18 10:38:11 +00:00
benchmark_machete.py	[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)	2024-09-23 13:46:26 -04:00
benchmark_marlin.py	[Misc] Disambiguate quantized types via a new ScalarType (#6396)	2024-08-02 13:51:58 -07:00
benchmark_moe.py	[CI/Build] Avoid CUDA initialization (#8534)	2024-09-18 10:38:11 +00:00
benchmark_paged_attention.py	[CI/Build] Avoid CUDA initialization (#8534)	2024-09-18 10:38:11 +00:00
benchmark_quant.py	[CI/Build] Avoid CUDA initialization (#8534)	2024-09-18 10:38:11 +00:00
benchmark_rope.py	[Misc] Standardize RoPE handling for Qwen2-VL (#9250)	2024-10-16 13:56:17 +08:00
benchmark_shapes.py	Add marlin unit tests and marlin benchmark script (#4815)	2024-05-16 09:36:49 -04:00
graph_machete_bench.py	[CI/Build] Update Ruff version (#8469)	2024-09-18 11:00:56 +00:00
requirements.txt	[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)	2024-09-23 13:46:26 -04:00
weight_shapes.py	[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174)	2024-08-20 07:09:33 -06:00