| Name | Last commit message | Commit date |
| --- | --- | --- |
| attention | Implement PagedAttention V2 (#1348) | 2023-10-16 00:59:57 -07:00 |
| quantization/awq | Minor fix on AWQ kernel launch (#1356) | 2023-10-15 21:53:56 -07:00 |
| activation_kernels.cu | Avoid compiling kernels for double data type (#933) | 2023-09-02 14:59:47 +09:00 |
| activation.cpp | Implement approximate GELU kernels (#828) | 2023-08-23 07:43:21 +09:00 |
| attention.cpp | Implement PagedAttention V2 (#1348) | 2023-10-16 00:59:57 -07:00 |
| cache_kernels.cu | Avoid compiling kernels for double data type (#933) | 2023-09-02 14:59:47 +09:00 |
| cache.cpp | Memcpy kernel for flash attention (#29) | 2023-04-10 18:22:49 -07:00 |
| cuda_utils_kernels.cu | Allocate more shared memory to attention kernel (#1154) | 2023-09-26 22:27:13 -07:00 |
| cuda_utils.cpp | Allocate more shared memory to attention kernel (#1154) | 2023-09-26 22:27:13 -07:00 |
| dispatch_utils.h | Avoid compiling kernels for double data type (#933) | 2023-09-02 14:59:47 +09:00 |
| layernorm_kernels.cu | Avoid compiling kernels for double data type (#933) | 2023-09-02 14:59:47 +09:00 |
| layernorm.cpp | Add custom kernel for RMS normalization (#16) | 2023-04-01 00:51:22 +08:00 |
| pos_encoding_kernels.cu | [BugFix] Implement RoPE for GPT-J (#941) | 2023-09-06 11:54:33 +09:00 |
| pos_encoding.cpp | [BugFix] Implement RoPE for GPT-J (#941) | 2023-09-06 11:54:33 +09:00 |
| quantization.cpp | Implement AWQ quantization support for LLaMA (#1032) | 2023-09-16 00:03:37 -07:00 |
| reduction_utils.cuh | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |