Tri Dao | ed4959b2eb | Change inline to __forceinline__, use __grid_constant__ param | 2024-01-20 17:38:47 -08:00
Jeremy Reizenstein | ce3e7280f8 | Allow varlen_fwd to take optional seqused_k (#647). Co-authored-by: bottler <bottler@users.noreply.github.com> | 2023-11-27 00:41:23 -08:00
Tri Dao | ee77b931b9 | Swap seqlen_q and nheads for MQA to speed it up (h/t Daniel Haziza) | 2023-09-10 22:56:33 -07:00
Tri Dao | 37c6e05406 | Implement flash_attn_with_kvcache | 2023-09-04 00:11:44 -07:00
Tri Dao | 9e5e8bc91e | Change causal mask to be aligned to bottom-right instead of top-left | 2023-08-24 23:41:07 -07:00
Tri Dao | 4f285b3547 | FlashAttention-2 release | 2023-07-17 06:21:34 -07:00