Tri Dao
|
1a2c3e8c25
|
Bump to v2.4.2
|
2023-12-25 16:28:57 -08:00 |
|
Tri Dao
|
73df3be7d5
|
Add test for BTLM init
|
2023-12-25 15:16:27 -08:00 |
|
Tri Dao
|
7ffba9a501
|
Implement BTLM model
|
2023-12-24 20:35:12 -08:00 |
|
Tri Dao
|
2e29dacf0c
|
Implement muParam
|
2023-12-24 20:34:48 -08:00 |
|
Tri Dao
|
3f7d5786ba
|
Pass alibi slopes to flash_attn_with_kvcache during generation
|
2023-12-24 20:31:59 -08:00 |
|
Tri Dao
|
f844852485
|
Bump to v2.4.1
|
2023-12-23 21:00:39 -08:00 |
|
Tri Dao
|
0842ec0da4
|
Don't dispatch to local if window size >= seqlen_k
|
2023-12-23 20:59:26 -08:00 |
|
Tri Dao
|
732654583c
|
Implement deterministic backward (thanks to Meituan)
|
2023-12-23 17:57:36 -08:00 |
|
Tri Dao
|
2c7d7b7396
|
Implement norm head for Baichuan2
|
2023-12-22 16:55:40 -08:00 |
|
Tri Dao
|
68f178aa4b
|
[CI] Don't compile for python 3.7 pytorch 2.2
|
2023-12-22 10:10:02 -08:00 |
|
Tri Dao
|
7316277303
|
Bump to v2.4.0
|
2023-12-22 00:09:53 -08:00 |
|
Tri Dao
|
50d144c906
|
Mention Alibi in README
|
2023-12-21 23:48:16 -08:00 |
|
Tri Dao
|
8448c02889
|
Update cutlass to v3.3.0
|
2023-12-21 23:25:50 -08:00 |
|
Tri Dao
|
c3b2196652
|
Add Alibi to MHA, test with Baichuan-13B
|
2023-12-21 22:49:55 -08:00 |
|
Tri Dao
|
701b51bfc3
|
[CI] Use torch-nightly 20231106 instead of 20231127
|
2023-12-21 22:28:09 -08:00 |
|
Tri Dao
|
5ab9b3667b
|
Clean up alibi, implement non-causal alibi
|
2023-12-21 22:27:40 -08:00 |
|
Tri Dao
|
bc28eacc60
|
Format flash_attn_interface.py
|
2023-12-19 23:13:53 -08:00 |
|
Tri Dao
|
0a146185d6
|
[Gen] Remove minor dead code
|
2023-12-19 22:57:39 -08:00 |
|
Sanghun Cho
|
e4f726fc44
|
Support alibi, by Sanghun Cho from Kakao Brain
* hard-code alibi in fwd
* use params.h as num_heads
* hard-code alibi in bwd
* add alibi on/off option
* compute alibi_start, ratio outside of kernels
* fix minor merge conflict
* add test_alibi.py
* change apply_alibi() location before masking
* add alibi in splitkv kernel
* fix backward func # of returns
* add out-of-bound check in apply_alibi()
* update test_alibi.py
* update test_alibi.py for kvcache
* simplify alibi parameter interface
* fix performance issue by computing alibi outside of branch
* update test_flash_attn_varlen_func() for left padding
* implement alibi_slopes (b, nh) loading
* optimize apply_alibi() a bit
* update test cases for alibi_slopes loading
* reflect stylistic comments
* disable "seqlenq_ngroups_swapped" when using alibi
---------
Co-authored-by: monk.detective <monk.detective@kakaobrain.com>
|
2023-12-19 22:56:06 -08:00 |
|
Tri Dao
|
cd089597fd
|
[LayerNorm] Implement dropout in fused residual + LN/RMSNorm
|
2023-12-19 16:26:07 -08:00 |
|
Tri Dao
|
713bd3aa9a
|
[CrossEntropy] Test longer sequences
|
2023-12-16 19:11:23 -08:00 |
|
Tri Dao
|
08124c8f9c
|
[CrossEntropy] Implement logit_scale option
|
2023-12-16 18:39:37 -08:00 |
|
Tri Dao
|
9356a1c038
|
[LayerNorm] Implement layer_norm_linear
|
2023-11-30 21:46:07 -08:00 |
|
Tri Dao
|
92dd5703ec
|
Bump to v2.3.6
|
2023-11-27 16:23:39 -08:00 |
|
Tri Dao
|
d4a7c8ffbb
|
[CI] Only compile for CUDA 11.8 & 12.2, MAX_JOBS=2, add torch-nightly
|
2023-11-27 16:21:28 -08:00 |
|
Jeremy Reizenstein
|
ce3e7280f8
|
Allow varlen_fwd to take optional seqused_k (#647)
Co-authored-by: bottler <bottler@users.noreply.github.com>
|
2023-11-27 00:41:23 -08:00 |
|
Tri Dao
|
23b77c8148
|
Bump to v2.3.5
|
2023-11-26 19:08:28 -08:00 |
|
Tri Dao
|
b4bf9cc1f3
|
Fix performance regression with causal
|
2023-11-26 19:07:25 -08:00 |
|
Tri Dao
|
2c3baba4a6
|
Bump to v2.3.4
|
2023-11-19 23:21:31 -08:00 |
|
Tri Dao
|
aaa1474129
|
[CrossEntropy] Simplify the case of large vocab with Tensor Parallel
|
2023-11-19 23:19:36 -08:00 |
|
Shijie
|
abf04a56e1
|
fix flash ce mp large vocab (#673)
|
2023-11-19 23:01:07 -08:00 |
|
Tri Dao
|
db2f80692c
|
Write zero to out / grad if seqlen_q or seqlen_k is zero
|
2023-11-19 22:20:01 -08:00 |
|
Tri Dao
|
43bb6d8aaa
|
Update cutlass to 3.2.2
|
2023-11-19 21:43:48 -08:00 |
|
Driss Guessous
|
dc4b9ad6c4
|
add checks (#640)
|
2023-11-19 20:43:27 -08:00 |
|
Tri Dao
|
017716451d
|
[LayerNorm] Add postnorm residual + LayerNorm/RMSNorm in Triton
|
2023-11-13 22:37:55 -08:00 |
|
Tri Dao
|
79bd1a2d5d
|
[LayerNorm] Implement residual + LayerNorm/RMSNorm in Triton
|
2023-11-13 02:04:49 -08:00 |
|
Antony Frolov
|
3566596ad8
|
Fix typo in RotaryEmbedding forward output type (#666)
|
2023-11-09 11:43:02 -08:00 |
|
Tri Dao
|
83aef842be
|
Bump to v2.3.3
|
2023-10-24 00:24:07 -07:00 |
|
Tri Dao
|
c79de85ffa
|
[CrossEntropy] Fix triton cross_entropy_loss IMA for >=2B elements
|
2023-10-24 00:17:34 -07:00 |
|
Tri Dao
|
02ac572f3f
|
Clarify inference README is a placeholder
|
2023-10-12 10:14:58 -07:00 |
|
Tri Dao
|
7f31e7c16a
|
Bump to v2.3.2
|
2023-10-08 17:21:29 -07:00 |
|
Tri Dao
|
5a83425442
|
Change constexpr int to constexpr static int
|
2023-10-08 16:26:33 -07:00 |
|
Tri Dao
|
3a9fe7b0fa
|
Add change log
|
2023-10-05 14:19:08 -07:00 |
|
Tri Dao
|
aa4fd2d166
|
Clarify that Windows is not supported right now
|
2023-10-05 14:00:45 -07:00 |
|
Tri Dao
|
5e525a8dc8
|
[CI] Use official Pytorch 2.1, add CUDA 11.8 for Pytorch 2.1
|
2023-10-03 22:20:30 -07:00 |
|
Tri Dao
|
21c3b0d8f6
|
Bump to v2.3.1
|
2023-10-03 19:56:45 -07:00 |
|
Tri Dao
|
e279bf8ed9
|
[Gen] Accept cache_batch_idx to index into the KV cache
|
2023-10-03 16:27:26 -07:00 |
|
Tri Dao
|
601b4dc48d
|
Bump to v2.3.0
|
2023-09-26 22:08:29 -07:00 |
|
Tri Dao
|
083e8f525f
|
Implement local attention
Co-authored-by: Timothee Lacroix <t@mistral.ai>
|
2023-09-26 16:31:08 -07:00 |
|
Katherine Crowson
|
4c8ff9154e
|
Fix NameError and typo in ApplyRotaryEmbQKV_ (#569)
|
2023-09-25 10:47:34 -07:00 |
|