| Name | Last commit message | Last commit date |
| --- | --- | --- |
| attention/ | Fix integer overflows in attention & cache ops (#1514) | 2023-10-31 15:19:30 -07:00 |
| quantization/ | Support SqueezeLLM (#1326) | 2023-10-21 23:14:59 -07:00 |
| activation_kernels.cu | Support YaRN models (#1264) | 2023-11-03 14:12:48 -07:00 |
| activation.cpp | Implement approximate GELU kernels (#828) | 2023-08-23 07:43:21 +09:00 |
| attention.cpp | Implement PagedAttention V2 (#1348) | 2023-10-16 00:59:57 -07:00 |
| cache_kernels.cu | Fix integer overflows in attention & cache ops (#1514) | 2023-10-31 15:19:30 -07:00 |
| cache.cpp | Memcpy kernel for flash attention (#29) | 2023-04-10 18:22:49 -07:00 |
| cuda_utils_kernels.cu | Allocate more shared memory to attention kernel (#1154) | 2023-09-26 22:27:13 -07:00 |
| cuda_utils.cpp | Allocate more shared memory to attention kernel (#1154) | 2023-09-26 22:27:13 -07:00 |
| dispatch_utils.h | Avoid compiling kernels for double data type (#933) | 2023-09-02 14:59:47 +09:00 |
| layernorm_kernels.cu | [Optimization] Implement fused add rmsnorm (#1667) | 2023-11-18 18:18:02 -08:00 |
| layernorm.cpp | [Optimization] Implement fused add rmsnorm (#1667) | 2023-11-18 18:18:02 -08:00 |
| pos_encoding_kernels.cu | Support YaRN models (#1264) | 2023-11-03 14:12:48 -07:00 |
| pos_encoding.cpp | [BugFix] Implement RoPE for GPT-J (#941) | 2023-09-06 11:54:33 +09:00 |
| quantization.cpp | Support SqueezeLLM (#1326) | 2023-10-21 23:14:59 -07:00 |
| reduction_utils.cuh | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |