Tri Dao
|
a8cfe51551
|
Implement Tensor Parallel for transformer Block
|
2022-12-25 14:08:21 -08:00 |
|
Tri Dao
|
5db330519a
|
[LayerNorm] Support taking subset of input or subset of output
|
2022-12-12 22:16:14 -08:00 |
|
Tri Dao
|
ae137ed17a
|
[LayerNorm] Fuse LayerScale
|
2022-12-10 23:28:23 -08:00 |
|
Tri Dao
|
8c6609ae1a
|
[LayerNorm] Support all dimensions up to 6k (if divisible by 8)
|
2022-12-09 02:06:22 -08:00 |
|
Tri Dao
|
0bf5e50038
|
Release training code
|
2022-11-28 17:34:40 -08:00 |
|
Tri Dao
|
39ed597b28
|
[LayerNorm] Compile for both sm70 and sm80
|
2022-11-17 11:45:11 -08:00 |
|
Tri Dao
|
43ab0b5205
|
Mention that some CUDA extensions have only been tested on A100s
|
2022-11-15 07:10:25 -08:00 |
|
Tri Dao
|
e4d3013e15
|
[LayerNorm] Check cuda error after querying ctas_per_sm
|
2022-11-15 07:05:13 -08:00 |
|
Tri Dao
|
2e33fc8e36
|
Add GPT and ViT models
|
2022-11-13 22:30:23 -08:00 |
|
Tri Dao
|
fa6d1ce44f
|
Add fused_dense and dropout_add_layernorm CUDA extensions
|
2022-11-13 21:59:20 -08:00 |
|