Commit Graph

553 Commits

Author SHA1 Message Date
Tri Dao
395e5a0dba Move rotary device functions to a separate file 2024-01-20 18:01:18 -08:00
Tri Dao
3e2c827d9a Remove unused kernel_traits file 2024-01-20 17:41:44 -08:00
Tri Dao
66a127aef8 Refactor masking in fwd pass into 1 object 2024-01-20 17:39:53 -08:00
Tri Dao
ed4959b2eb Change inline to __forceinline__, use __grid_constant__ param 2024-01-20 17:38:47 -08:00
Tri Dao
6f706eff96 Make Softmax an object 2024-01-19 16:09:31 -08:00
Tri Dao
4ea866ca19 Make Alibi an object 2024-01-15 00:07:11 -08:00
Tri Dao
5aca153d6d Move bwd preprocess kernels to a separate file 2024-01-14 16:57:03 -08:00
Tri Dao
df1418f9db Move softmax_rescale_o to softmax.h 2024-01-14 15:06:06 -08:00
Tri Dao
6777336a1c Move masking to a separate file (mask.h) 2024-01-14 12:43:47 -08:00
Tri Dao
9448264ddd Remove seqq_parallel backward kernel that's not used 2024-01-14 12:25:49 -08:00
Tri Dao
1274ec3e7e Move dropout to a separate file (dropout.h) 2024-01-14 12:19:17 -08:00
Tri Dao
10dad61277 apply_dropout now takes tensor of rowcol layout 2024-01-14 01:03:23 -08:00
Tri Dao
d9cbcfb41c Remove dead code in philox.cuh 2024-01-13 02:02:03 -08:00
Tri Dao
a7b66ae25a Simplify writing softmax to gmem 2024-01-13 00:25:04 -08:00
Tri Dao
8d1b169ed1 Simplify SmemLayoutVtransposed in kernel_traits.h 2024-01-12 11:53:29 -08:00
Tri Dao
c9861a032d [LayerNorm] Initialize mean and rstd tensor using x.device 2024-01-09 16:30:31 -08:00
Erich Schubert
99ea4baa1d Typo in README (#760) 2024-01-08 09:59:00 -08:00
Tri Dao
abbc131173 [LayerNorm] Switch from CUDA to Triton implementation 2024-01-05 00:31:17 -08:00
Tri Dao
f5b308e258 [LayerNorm] Rename layernorm.py -> layer_norm.py 2024-01-05 00:21:03 -08:00
Tri Dao
665b55e2e2 [LayerNorm] Implement parallel layer norm in Triton 2024-01-04 23:15:35 -08:00
Tri Dao
aa5c6438c5 [LayerNorm] Implement rowscale in Triton layernorm 2024-01-04 01:07:03 -08:00
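For the rowscale entry above, a minimal unfused sketch of the assumed semantics: each row of the input is scaled by its own factor before normalization. Function name and shapes here are illustrative, not the library's API.

```python
import torch
import torch.nn.functional as F

def layer_norm_rowscale_ref(x, weight, bias, rowscale):
    # Illustrative unfused reference (not the Triton kernel itself):
    # x: (batch, hidden), rowscale: (batch,), one scale factor per row.
    z = x * rowscale.unsqueeze(-1)
    return F.layer_norm(z, (z.shape[-1],), weight, bias)
```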
jiaxingli
386e391117
Fix: implement deterministic backward in mha (#748)
* fix deterministic
2024-01-02 18:13:56 -08:00
Tri Dao
1a2c3e8c25 Bump to v2.4.2 2023-12-25 16:28:57 -08:00
Tri Dao
73df3be7d5 Add test for BTLM init 2023-12-25 15:16:27 -08:00
Tri Dao
7ffba9a501 Implement BTLM model 2023-12-24 20:35:12 -08:00
Tri Dao
2e29dacf0c Implement muParam 2023-12-24 20:34:48 -08:00
Tri Dao
3f7d5786ba Pass alibi slopes to flash_attn_with_kvcache during generation 2023-12-24 20:31:59 -08:00
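A sketch of what the entry above enables, assuming the v2.4 Python API in which flash_attn_with_kvcache takes an alibi_slopes keyword; shapes and values are illustrative.

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, nheads, headdim = 2, 16, 64
q = torch.randn(batch, 1, nheads, headdim, device="cuda", dtype=torch.float16)
k_cache = torch.randn(batch, 512, nheads, headdim, device="cuda", dtype=torch.float16)
v_cache = torch.randn_like(k_cache)
cache_seqlens = torch.full((batch,), 300, dtype=torch.int32, device="cuda")
alibi_slopes = torch.rand(nheads, device="cuda", dtype=torch.float32)  # one slope per head

out = flash_attn_with_kvcache(
    q, k_cache, v_cache,
    cache_seqlens=cache_seqlens,
    causal=True,
    alibi_slopes=alibi_slopes,  # assumed keyword, added around v2.4
)
```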
Tri Dao
f844852485 Bump to v2.4.1 2023-12-23 21:00:39 -08:00
Tri Dao
0842ec0da4 Don't dispatch to local if window size >= seqlen_k 2023-12-23 20:59:26 -08:00
Tri Dao
732654583c Implement deterministic backward (thanks to Meituan) 2023-12-23 17:57:36 -08:00
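The deterministic backward is opt-in; a short sketch, assuming it is exposed as a deterministic keyword on flash_attn_func as of v2.4.1. The trade-off is bitwise-reproducible gradients for a somewhat slower backward pass.

```python
import torch
from flash_attn import flash_attn_func

q = torch.randn(2, 1024, 16, 64, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q).requires_grad_()
v = torch.randn_like(q).requires_grad_()

# deterministic=True (assumed keyword) makes gradients reproducible across runs.
out = flash_attn_func(q, k, v, causal=True, deterministic=True)
out.sum().backward()
```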
Tri Dao
2c7d7b7396 Implement norm head for Baichuan2 2023-12-22 16:55:40 -08:00
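A minimal sketch of the norm-head idea, assuming it follows Baichuan2's published recipe of L2-normalizing the output-projection weight rows before computing logits; names are illustrative.

```python
import torch
import torch.nn.functional as F

def norm_head_ref(hidden, weight):
    # hidden: (batch, seqlen, d), weight: (vocab, d).
    # Each vocab embedding row is L2-normalized before the logit projection.
    return F.linear(hidden, F.normalize(weight, dim=-1))
```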
Tri Dao
68f178aa4b [CI] Don't compile for python 3.7 pytorch 2.2 2023-12-22 10:10:02 -08:00
Tri Dao
7316277303 Bump to v2.4.0 2023-12-22 00:09:53 -08:00
Tri Dao
50d144c906 Mention Alibi in README 2023-12-21 23:48:16 -08:00
Tri Dao
8448c02889 Update cutlass to v3.3.0 2023-12-21 23:25:50 -08:00
Tri Dao
c3b2196652 Add Alibi to MHA, test with Baichuan-13B 2023-12-21 22:49:55 -08:00
Tri Dao
701b51bfc3 [CI] Use torch-nightly 20231106 instead of 20231127 2023-12-21 22:28:09 -08:00
Tri Dao
5ab9b3667b Clean up alibi, implement non-causal alibi 2023-12-21 22:27:40 -08:00
Tri Dao
bc28eacc60 Format flash_attn_interface.py 2023-12-19 23:13:53 -08:00
Tri Dao
0a146185d6 [Gen] Remove minor dead code 2023-12-19 22:57:39 -08:00
Sanghun Cho
e4f726fc44
Support alibi, by Sanghun Cho from Kakao Brain
* hard-code alibi in fwd

* use params.h as num_heads

* hard-code alibi in bwd

* add alibi on/off option

* compute alibi_start, ratio outside of kernels

* fix minor merge conflict

* add test_alibi.py

* move apply_alibi() to before masking

* add alibi in splitkv kernel

* fix backward func # of returns

* add out-of-bound check in apply_alibi()

* update test_alibi.py

* update test_alibi.py for kvcache

* simplify alibi parameter interface

* fix performance issue by computing alibi outside of branch

* update test_flash_attn_varlen_func() for left padding

* implement alibi_slopes (b, nh) loading

* optimize apply_alibi() a bit

* update test cases for alibi_slopes loading

* reflect stylistic comments

* disable "seqlenq_ngroups_swapped" when using alibi

---------

Co-authored-by: monk.detective <monk.detective@kakaobrain.com>
2023-12-19 22:56:06 -08:00
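As a reference for what the kernel-side apply_alibi() computes, here is an unfused sketch of the ALiBi bias from the original paper: a per-head slope times the query/key distance, added to the attention scores before softmax. The absolute value covers the non-causal variant; slope initialization assumes the head count is a power of two.

```python
import torch

def get_alibi_slopes(nheads: int) -> torch.Tensor:
    # Geometric slopes 2^(-8/n), 2^(-16/n), ... from the ALiBi paper
    # (assumes nheads is a power of two).
    return torch.tensor([2.0 ** (-8.0 * (i + 1) / nheads) for i in range(nheads)])

def alibi_bias(seqlen_q: int, seqlen_k: int, slopes: torch.Tensor) -> torch.Tensor:
    # Queries are aligned to the bottom-right of the score matrix, matching
    # the causal-mask convention; returns (nheads, seqlen_q, seqlen_k).
    i = torch.arange(seqlen_q).view(-1, 1) + (seqlen_k - seqlen_q)
    j = torch.arange(seqlen_k).view(1, -1)
    return -slopes.view(-1, 1, 1) * (i - j).abs()

# scores: (batch, nheads, seqlen_q, seqlen_k), bias added before softmax:
# scores = scores + alibi_bias(seqlen_q, seqlen_k, get_alibi_slopes(nheads))
```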
Tri Dao
cd089597fd [LayerNorm] Implement dropout in fused residual + LN/RMSNorm 2023-12-19 16:26:07 -08:00
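An unfused reference for the semantics of the fused op above, assuming the usual dropout-add-layernorm ordering; names are illustrative.

```python
import torch.nn.functional as F

def dropout_add_layer_norm_ref(x, residual, weight, bias, p, training=True):
    # What the fused kernel computes in a single memory pass:
    # LayerNorm(dropout(x) + residual).
    z = F.dropout(x, p=p, training=training) + residual
    return F.layer_norm(z, (z.shape[-1],), weight, bias)
```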
Tri Dao
713bd3aa9a [CrossEntropy] Test longer sequences 2023-12-16 19:11:23 -08:00
Tri Dao
08124c8f9c [CrossEntropy] Implement logit_scale option 2023-12-16 18:39:37 -08:00
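A one-line unfused reference for the logit_scale option, assuming it scales the logits before the loss as in muP-style setups; keyword placement is illustrative.

```python
import torch.nn.functional as F

def cross_entropy_logit_scale_ref(logits, labels, logit_scale=1.0):
    # A fused kernel can fold this scaling in without materializing
    # logits * logit_scale as a separate tensor.
    return F.cross_entropy(logits * logit_scale, labels)
```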
Tri Dao
9356a1c038 [LayerNorm] Implement layer_norm_linear 2023-11-30 21:46:07 -08:00
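And a reference for layer_norm_linear, assuming it fuses a LayerNorm with the Linear that follows it, a very common transformer pattern; names are illustrative.

```python
import torch.nn.functional as F

def layer_norm_linear_ref(x, norm_weight, norm_bias, lin_weight, lin_bias):
    # Fusing avoids a round trip of the normalized activations to global memory.
    z = F.layer_norm(x, (x.shape[-1],), norm_weight, norm_bias)
    return F.linear(z, lin_weight, lin_bias)
```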
Tri Dao
92dd5703ec Bump to v2.3.6 2023-11-27 16:23:39 -08:00
Tri Dao
d4a7c8ffbb [CI] Only compile for CUDA 11.8 & 12.2, MAX_JOBS=2, add torch-nightly 2023-11-27 16:21:28 -08:00
Jeremy Reizenstein
ce3e7280f8
Allow varlen_fwd to take optional seqused_k (#647)
Co-authored-by: bottler <bottler@users.noreply.github.com>
2023-11-27 00:41:23 -08:00
Tri Dao
23b77c8148 Bump to v2.3.5 2023-11-26 19:08:28 -08:00
Tri Dao
b4bf9cc1f3 Fix performance regression with causal 2023-11-26 19:07:25 -08:00