Add Faster Neighborhood Attention to pubs (#1471)
This commit is contained in:
parent d6580c3dc0
commit c5239d8312
@@ -1,5 +1,9 @@
# Publications Using Cutlass

## 2024

- ["Faster Neighborhood Attention: Reducing the O(n^2) Cost of Self Attention at the Threadblock Level"](https://arxiv.org/abs/2403.04690). Ali Hassani, Wen-Mei Hwu, Humphrey Shi. _arXiv_, March 2024.

## 2023

- ["A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library"](https://arxiv.org/abs/2312.11918). Ganesh Bikshandi, Jay Shah. _arXiv_, December 2023.