Commit Graph

81 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Tri Dao | a4f148b6ab | Fix masking of bwd when seqlen is not divisible by 128 | 2023-07-31 17:46:34 -07:00 |
| Tri Dao | 184b992dcb | [GPT] Implement parallel LLaMa | 2023-07-28 15:52:48 -10:00 |
| Haodong Lyu | 8ee62efca3 | Implement ParallelGatedMlp (#251) | 2023-07-26 12:14:15 -07:00 |
| Tri Dao | 56ccaff126 | [GPT] Add LLaMa-13B to test | 2023-07-26 07:22:22 -10:00 |
| Tri Dao | 8e9820a55b | [Rotary] Fix tests when loading state dict with rotary inv_freqs | 2023-07-26 07:16:33 -10:00 |
| Tri Dao | 2a2a3c4bfd | [LayerNorm] Add test for randomness | 2023-07-23 12:31:55 -10:00 |
| Tri Dao | d38357dd2f | [GPT] Implement Falcon | 2023-07-23 10:32:29 -07:00 |
| Tri Dao | 425dbcb6c6 | [MHA] Implement MQA/GQA | 2023-07-23 00:06:58 -07:00 |
| Tri Dao | b3177dfaf6 | [GPT] Enable FlashAttention for GPT-J | 2023-07-21 17:29:10 -07:00 |
| Tri Dao | 4f285b3547 | FlashAttention-2 release | 2023-07-17 06:21:34 -07:00 |
| Tri Dao | d2f4324f4c | [LayerNorm] Make sure memory addresses are aligned to 16 bytes | 2023-07-04 14:53:12 -07:00 |
| Tri Dao | 62e9814466 | [Rotary] Make sure frequency calculation is in fp32 | 2023-07-02 16:39:39 -07:00 |
| Tri Dao | 48bc6eacd6 | [Gen] Add rotary base as an argument to FT attention kernel | 2023-05-30 13:38:34 -07:00 |
| Tri Dao | a9a4b4e4f2 | [LLaMa] Fix last norm layer to use RMSNorm instead of LayerNorm | 2023-05-04 23:39:43 -07:00 |
| Tri Dao | 311d6606bf | [Gen] Fix FT kernel smem size, CG when batch size changed | 2023-04-20 17:03:13 -07:00 |
| Tri Dao | 96d10f6545 | Implement LLaMa | 2023-04-18 21:51:35 -07:00 |
| Tri Dao | 605655bc66 | [Gen] Fix FT kernel when using CG | 2023-04-14 16:50:01 -07:00 |
| Tri Dao | 393882bc08 | [LayerNorm] Implement LN with parallel residual, support dim 8k | 2023-03-31 14:23:45 -07:00 |
| Tri Dao | 993d12448e | Implement GPT-NeoX | 2023-03-29 01:21:25 -07:00 |
| Tri Dao | f5d0fbd468 | [FT] Fix FT's single query attention for bf16 hdim128 rotary | 2023-03-28 21:27:00 -07:00 |
| Tri Dao | 4d87e4d875 | Implement GPT-J | 2023-03-22 16:16:58 -07:00 |
| Tri Dao | e45a46a5b7 | [Rotary] Implement GPT-J style (interleaved) rotary | 2023-03-14 14:35:53 -07:00 |
| Tri Dao | 78b7a1dc18 | [OPT] Load fp16 weights on CPU before moving to GPU | 2023-01-22 17:01:32 -08:00 |
| Tri Dao | f68d41ec77 | [Gen] Add OPT to generation test | 2023-01-17 19:59:06 -08:00 |
| Tri Dao | 88173a1aaf | [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP | 2023-01-17 18:12:27 -08:00 |
| Tri Dao | 780e8eeabb | [ViT] Support timm checkpoint, add tests | 2023-01-16 01:20:34 -08:00 |
| Tri Dao | ff34123bd4 | Reorder LN in Block, support OPT | 2023-01-15 22:14:31 -08:00 |
| Tri Dao | f1e01c27ba | [Gen] Pass qkv_stride to ft_attention kernel for batched generation | 2023-01-15 15:20:01 -08:00 |
| Tri Dao | 7c2191542a | [Gen] Make generation work with Tensor Parallel | 2023-01-15 11:34:27 -08:00 |
| Tri Dao | b48599002a | [Gen] Add timing option | 2023-01-07 19:05:09 -08:00 |
| Tri Dao | 0938298e4c | [Gen] Adjust shape of kv_cache when using FT | 2023-01-07 17:27:54 -08:00 |
| Tri Dao | e02fd588aa | [Gen] Implement top-k and top-p sampling | 2023-01-07 17:00:02 -08:00 |
| Tri Dao | 11be742aa3 | [Gen] Test generation with rotary embedding | 2023-01-07 14:37:54 -08:00 |
| Tri Dao | 93383bd55b | [TP] Implement TensorParallel without sequence parallel | 2023-01-07 13:45:22 -08:00 |
| Tri Dao | 6738d9477d | [LayerNorm] Implement RMS Norm | 2023-01-06 17:34:22 -08:00 |
| Tri Dao | a668890fcd | [Gen] Add option to run generation with FT attention kernel | 2023-01-03 22:10:31 -08:00 |
| Tri Dao | ef1ba918c6 | [GPT] Refactor function to shard state_dict for TensorParallel | 2023-01-01 00:09:33 -08:00 |
| Tri Dao | 63670fd84a | Implement generation for GPT | 2022-12-27 21:01:50 -08:00 |
| Tri Dao | 9d797d8848 | Support loading GPT2 weights from Huggingface | 2022-12-27 11:22:48 -08:00 |
| Tri Dao | c6ecd40a59 | Tweak CrossEntropyLoss to take process_group in init | 2022-12-27 10:47:43 -08:00 |
| Tri Dao | b4018a5028 | Implement Tensor Parallel for GPT model | 2022-12-26 16:22:43 -08:00 |
| Tri Dao | 78225c5366 | Implement Tensor Parallel for GPT2Embeddings | 2022-12-25 14:29:53 -08:00 |
| Tri Dao | a8cfe51551 | Implement Tensor Parallel for transformer Block | 2022-12-25 14:08:21 -08:00 |
| Tri Dao | 1e712ea8b0 | Implement TensorParallel for MHA | 2022-12-25 11:39:55 -08:00 |
| Tri Dao | 226a1b721d | Implement TensorParallel for FusedDense and FusedDenseGeluDense | 2022-12-24 11:48:56 -08:00 |
| Tri Dao | dff68c2b22 | Add smoothing for CrossEntropyParallel, rename to CrossEntropyLoss | 2022-12-23 14:51:08 -08:00 |
| Tri Dao | e68ebbe89a | Simplify FusedDense | 2022-12-22 21:25:31 -08:00 |
| Tri Dao | 13cdceb377 | Implement last_layer_subset optimization for BERT | 2022-12-19 22:18:46 -08:00 |
| Tri Dao | 5fb6df0e04 | Implement BERT | 2022-12-18 21:47:27 -08:00 |
| Tri Dao | 5db330519a | [LayerNorm] Support taking subset of input or subset of output | 2022-12-12 22:16:14 -08:00 |
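
A few of the techniques named in the log above are compact enough to sketch. For 6738d9477d ([LayerNorm] Implement RMS Norm), here is a minimal PyTorch sketch of the RMSNorm computation; the repository ships a fused CUDA kernel, so this function's name, signature, and `eps` default are purely illustrative:

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMSNorm rescales by the root mean square of the features; unlike
    # LayerNorm it subtracts no mean and (in the common variant) adds no bias.
    inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * inv_rms * weight
```

Skipping the mean subtraction is what makes RMSNorm cheaper than LayerNorm while behaving similarly in practice, which is why LLaMa-style models use it (see also a9a4b4e4f2 above).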
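
For e45a46a5b7 ([Rotary] Implement GPT-J style (interleaved) rotary), the distinguishing detail is which feature pairs get rotated. A sketch assuming `cos` and `sin` are precomputed per position and broadcastable to the pair dimension (the repo's actual rotary path is fused, so this is only the reference math):

```python
import torch

def apply_rotary_interleaved(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: (..., headdim). GPT-J style rotates adjacent pairs (x0, x1), (x2, x3), ...
    # GPT-NeoX style would instead pair x_i with x_{i + headdim // 2}.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return out.flatten(start_dim=-2)  # re-interleave pairs back to (..., headdim)
```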
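
And for e02fd588aa ([Gen] Implement top-k and top-p sampling), the two filters compose as below. This is a generic sketch of the technique, not the repo's decoding API; the function name and defaults are assumptions:

```python
import torch

def sample(logits: torch.Tensor, top_k: int = 0, top_p: float = 1.0,
           temperature: float = 1.0) -> torch.Tensor:
    """Sample one token id per row of logits (shape: batch x vocab)."""
    logits = logits / temperature
    if top_k > 0:
        # Top-k: mask out everything below the k-th largest logit.
        kth = torch.topk(logits, top_k, dim=-1).values[..., -1:]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    if top_p < 1.0:
        # Top-p (nucleus): keep the smallest prefix of the sorted
        # distribution whose cumulative probability covers top_p.
        sorted_logits, sorted_idx = torch.sort(logits, descending=True, dim=-1)
        probs = sorted_logits.softmax(dim=-1)
        # Drop a token if the mass *before* it already exceeds top_p,
        # so the highest-probability token is always kept.
        drop = probs.cumsum(dim=-1) - probs > top_p
        sorted_logits = sorted_logits.masked_fill(drop, float("-inf"))
        logits = torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)
    return torch.multinomial(logits.softmax(dim=-1), num_samples=1).squeeze(-1)
```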