| Name | Last commit | Date |
| --- | --- | --- |
| attention | Optimize MQA Kernel (#452) | 2023-07-14 20:06:40 -04:00 |
| activation_kernels.cu | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |
| activation.cpp | Optimize data movement (#20) | 2023-04-02 00:30:17 -07:00 |
| attention.cpp | Optimize MQA Kernel (#452) | 2023-07-14 20:06:40 -04:00 |
| cache_kernels.cu | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |
| cache.cpp | Memcpy kernel for flash attention (#29) | 2023-04-10 18:22:49 -07:00 |
| layernorm_kernels.cu | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |
| pos_encoding_kernels.cu | Add support for LLaMA-2 (#505) | 2023-07-20 11:38:27 -07:00 |
| pos_encoding.cpp | Add support for GPT-NeoX (Pythia) (#50) | 2023-04-28 00:32:10 -07:00 |
| reduction_utils.cuh | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |