Tri Dao | 5d07483bbc | Refactor Gmem code to store q, k, v pointers separately | 2022-06-12 16:37:32 -07:00 |
Tri Dao | d3e6440958 | Implement bwd for head dim 128 | 2022-06-11 17:52:36 -07:00 |
Tri Dao | 0d854692c6 | Implement fwd for head dim 128 | 2022-06-11 17:52:36 -07:00 |
Tri Dao | 321c57d07d | Set block size of SM75 fwd to 256 if there's no dropout. This speeds up the fwd by 1.5x. | 2022-06-04 16:51:28 -07:00 |
Tri Dao | 2712aa4c8d | Support Turing mma instructions | 2022-06-03 16:58:44 -07:00 |
Tri Dao | 9dbc491aa5 | Rename, add benchmarking script | 2022-05-26 13:57:38 -07:00 |