Tri Dao | 2406f28805 | Enable headdim 256 backward on consumer GPUs (Ampere, Ada) | 2024-02-21 15:56:19 -08:00
Tao He | 204c3c6d1b | Fixes an error in comment (#785); Signed-off-by: Tao He <sighingnow@gmail.com> | 2024-01-23 12:38:29 -08:00
Tri Dao | 54e80a3829 | Implement page KV cache; Co-authored-by: ljss <450993438@qq.com> | 2024-01-22 22:47:30 -08:00
Erich Schubert | 99ea4baa1d | Typo in README (#760) | 2024-01-08 09:59:00 -08:00
Tri Dao | 732654583c | Implement deterministic backward (thanks to Meituan) | 2023-12-23 17:57:36 -08:00
Tri Dao | 50d144c906 | Mention Alibi in README | 2023-12-21 23:48:16 -08:00
Tri Dao | 7f31e7c16a | Bump to v2.3.2 | 2023-10-08 17:21:29 -07:00
Tri Dao | 5a83425442 | Change constexpr int to constexpr static int | 2023-10-08 16:26:33 -07:00
Tri Dao | 3a9fe7b0fa | Add change log | 2023-10-05 14:19:08 -07:00
Tri Dao | aa4fd2d166 | Clarify that Windows is not supported right now | 2023-10-05 14:00:45 -07:00
Tri Dao | 0c04943fa2 | Require CUDA 11.6+, clean up setup.py | 2023-09-03 21:24:56 -07:00
Jeffrey Quesnelle | 1d817a8ffc | fix citation in README (#501) | 2023-08-29 11:15:33 -07:00
Tri Dao | 45ba93cd96 | Add newlines to README | 2023-08-24 23:54:13 -07:00
Tri Dao | 9e5e8bc91e | Change causal mask to be aligned to bottom-right instead of top-left | 2023-08-24 23:41:07 -07:00
Tri Dao | d30f2e1cd5 | Bump to v2.0.4 | 2023-08-01 09:01:07 -07:00
Ian Timmis | cbf982afa5 | README syntax highlighting (#365): README syntax highlighting; Adds syntax highlighting to README; Update README.md | 2023-07-23 00:21:30 -07:00
Tri Dao | d1a3b52f17 | Add instruction about limiting number of ninja jobs | 2023-07-17 23:17:47 -07:00
Tri Dao | b4cc152e97 | Make sure dout is contiguous | 2023-07-17 21:54:44 -07:00
Tri Dao | 4f285b3547 | FlashAttention-2 release | 2023-07-17 06:21:34 -07:00
Tri Dao | ce68305c84 | Update installation instruction | 2023-05-25 16:52:52 -07:00
Tri Dao | f0c40b7ddb | Recommend Nvidia's Pytorch container | 2023-05-19 09:41:14 -07:00
Tri Dao | 40a25c8ee7 | Update roadmap | 2023-05-17 08:32:26 -07:00
Anthony Hu | d63cfc3551 | Use pyproject.toml to specify build dependencies | 2023-04-27 11:51:52 +01:00
Tri Dao | 74af023316 | Bump version to 1.0.0 | 2023-04-11 23:32:35 -07:00
Tri Dao | 1b18f1b7a1 | Support H100 | 2023-03-15 14:59:02 -07:00
Tri Dao | f28d61cb2a | Update README on requirements (nvcc and Pytorch) | 2023-03-13 12:48:07 -07:00
Tri Dao | 57ee618170 | Merge pull request #94 from calebthomas259/main; Add a simple tutorial to README.md | 2023-02-14 19:03:08 -08:00
Tri Dao | 2dc2a19589 | Update roadmap | 2023-02-09 12:21:30 -08:00
Caleb Thomas | c9a649805b | Add a simple tutorial to README.md | 2022-12-27 14:13:59 +08:00
Tri Dao | 4a6eaa9f27 | Update configs, add results | 2022-11-29 04:46:43 -08:00
Tri Dao | 45bcf37b97 | [Docs] Capitalize the bibtex citation | 2022-11-22 02:12:22 -08:00
Tri Dao | 4040256b5e | Update pip install instructions, bump to 0.2 | 2022-11-15 14:10:48 -08:00
Tri Dao | 2e33fc8e36 | Add GPT and ViT models | 2022-11-13 22:30:23 -08:00
Tri Dao | 3dda4f76de | Update README | 2022-11-13 16:52:40 -08:00
Tri Dao | 46fd2a20b2 | Support all head dims that are multiples of 8, up to 128 | 2022-10-24 16:04:21 -07:00
Tri Dao | 2ed471ecc4 | Add tests for numerical error | 2022-07-22 17:54:09 -04:00
Tri Dao | 42f54d8840 | Edit mention of Triton implementation; Phil Tillet suggests calling it "experimental". | 2022-07-11 17:02:29 -07:00
Tri Dao | 4577151ff8 | Link to Triton implementation | 2022-07-11 16:01:43 -07:00
Tri Dao | d1fc80a3bb | Link to IEEE Spectrum article on MLPerf | 2022-07-10 12:11:46 -07:00
Tri Dao | 1bbebccc0a | Edit README to mention bf16 support | 2022-07-09 23:34:29 -07:00
Tri Dao | de19de7ab1 | Implement for bf16 | 2022-07-09 23:31:56 -07:00
Tri Dao | 6c3a8c65af | Implement cross attention | 2022-07-03 17:48:12 -07:00
Tri Dao | 450b64fe44 | Add README section on issues | 2022-06-27 13:50:16 -07:00
Dan Fu | 765741c1ee | More explanation | 2022-06-14 11:55:14 -07:00
Dan Fu | 2d5b2483b8 | Speedup graph for A100, d128 | 2022-06-14 11:54:16 -07:00
Tri Dao | d3e6440958 | Implement bwd for head dim 128 | 2022-06-11 17:52:36 -07:00
Dan Fu | 0a398dfc37 | Broken link | 2022-06-04 17:28:45 -07:00
Dan Fu | bd60750e0b | T4 | 2022-06-04 17:27:51 -07:00
Tri Dao | f2d8d4104e | Edit README: support Turing (SM75) | 2022-06-04 16:06:48 -07:00
Dan Fu | ad6c694bb3 | 3090 speedup | 2022-06-01 20:07:00 -07:00