Author | Commit | Message | Date
Tri Dao | 3557e0bb8f | [MLP] Implement SwiGLU with torch jiterator | 2023-09-04 15:43:53 -07:00
Tri Dao | f1a73d0740 | Run isort and black on python files | 2023-08-18 14:22:11 -07:00
Tri Dao | 364a5b4a71 | [MLP] Change the check for out_features being None | 2023-08-10 00:04:38 -07:00
Tri Dao | 4c98d0b41f | [MLP] Edit ParallelGatedMlp | 2023-07-26 09:39:37 -10:00
Haodong Lyu | 8ee62efca3 | Implement ParallelGatedMlp (#251) | 2023-07-26 12:14:15 -07:00
Tri Dao | 75e334d407 | [MLP] Add ParallelMLP | 2023-07-22 23:45:51 -07:00
Tri Dao | 96d10f6545 | Implement LLaMa | 2023-04-18 21:51:35 -07:00
Tri Dao | b630aef53f | Implement GatedMlp | 2023-04-18 03:37:14 -07:00
Zhiyuan Chen | 8c42415664 | make mlp hidden_features defaults to 4*in_features | 2023-04-13 11:08:21 +08:00
Tri Dao | 88173a1aaf | [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP | 2023-01-17 18:12:27 -08:00
Tri Dao | 226a1b721d | Implement TensorParallel for FusedDense and FusedDenseGeluDense | 2022-12-24 11:48:56 -08:00
Tri Dao | e68ebbe89a | Simplify FusedDense | 2022-12-22 21:25:31 -08:00
Tri Dao | 13cdceb377 | Implement last_layer_subset optimization for BERT | 2022-12-19 22:18:46 -08:00
Tri Dao | 1feb94265c | [ViT] Use dropout_add_ln for the 1st layer norm | 2022-11-23 12:48:56 -08:00
Tri Dao | d4b320b31f | Add MLP, MHA, Block, Embedding modules | 2022-11-13 22:06:44 -08:00