Commit Graph

124 Commits

Author SHA1 Message Date
Tri Dao
321c57d07d Set block size of SM75 fwd to 256 if there's no dropout
This speeds up the fwd by 1.5x.
2022-06-04 16:51:28 -07:00
Tri Dao
f2d8d4104e Edit README: support Turing (SM75) 2022-06-04 16:06:48 -07:00
Tri Dao
d380e87fb6 Don't use Smem_dp_sum in backward pass
To reduce smem usage for SM75
2022-06-04 16:01:36 -07:00
Tri Dao
b17c6fe235 Reduce smem usage for Q and dO in the backward pass
From 4KB per buffer to 2KB per buffer. This saves 8KB of smem in total (Q and dO each have 2 buffers).
2022-06-03 16:59:11 -07:00
Tri Dao
2712aa4c8d Support Turing mma instructions 2022-06-03 16:58:44 -07:00
Tri Dao
050873327e Remove softmax fp16 max 2022-06-02 14:09:46 -07:00
Tri Dao
14dc326e59 Use Cutlass gemm as WarpMma 2022-06-02 10:33:32 -07:00
Tri Dao
e78e7c9553 Remove old backward 2022-06-02 10:13:44 -07:00
Tri Dao
512c98ee05 Add Cutlass as submodule 2022-06-02 09:54:16 -07:00
Dan Fu
ad6c694bb3 3090 speedup 2022-06-01 20:07:00 -07:00
Tri Dao
5a61cb7729 Rename src -> flash_attn 2022-06-01 18:50:26 -07:00
Tri Dao
c41479d66d Support SM86 GPUs 2022-06-01 18:49:47 -07:00
Dan Fu
4b7cfb5f45 Citation 2022-05-30 13:29:04 -07:00
Dan Fu
963173fcb5 Jpg resolution 2022-05-30 11:47:42 -07:00
Dan Fu
cd04d29883 Fix jpg 2022-05-30 11:46:01 -07:00
Tri Dao
a78745189a Add paper arXiv link 2022-05-29 18:15:43 -07:00
Tri Dao
d9fff84bd0 Edit roadmap 2022-05-29 15:44:18 -07:00
Tri Dao
e4ffe5d50e Convert banner figure from pdf to jpg 2022-05-29 15:39:17 -07:00
Tri Dao
67c3779598 Reorganize directories, add banner figure 2022-05-29 15:34:22 -07:00
Dan Fu
7025a092d1 Make png images into jpg for dark mode 2022-05-28 22:46:49 +01:00
Dan Fu
4decc3c166 README typo 2022-05-27 22:38:20 +01:00
Dan Fu
dc6d130088 Add speedup to README
Update images
Update images
Update description
2022-05-27 22:36:56 +01:00
Tri Dao
9dbc491aa5 Rename, add benchmarking script 2022-05-26 13:57:38 -07:00
Tri Dao
1fcbe6f0d0 First release 2022-05-20 14:21:58 -07:00