Author | Commit | Message | Date
Chirag Jain | 50896ec574 | Make nvcc threads configurable via environment variable (#885) | 2024-03-13 20:46:57 -07:00
Tri Dao | abbc131173 | [LayerNorm] Switch from CUDA to Triton implementation | 2024-01-05 00:31:17 -08:00
Joel Lamy-Poirier | 767b71ccf0 | Fix random state for dropout_layer_norm (#315) | 2023-07-23 15:05:13 -07:00
Ikko Eltociear Ashimine | dfc60f6b7d | [LayerNorm] Fix typo in ln_api.cpp (unintialized -> uninitialized) | 2023-07-20 01:16:16 +09:00
Tri Dao | 393882bc08 | [LayerNorm] Implement LN with parallel residual, support dim 8k | 2023-03-31 14:23:45 -07:00
Tri Dao | dc08ea1c33 | Support H100 for other CUDA extensions | 2023-03-15 16:59:27 -07:00
Tri Dao | eb33e587e9 | [LayerNorm] Rename x1 -> residual | 2023-01-19 13:07:27 -08:00
Tri Dao | 6738d9477d | [LayerNorm] Implement RMS Norm | 2023-01-06 17:34:22 -08:00
Tri Dao | a8cfe51551 | Implement Tensor Parallel for transformer Block | 2022-12-25 14:08:21 -08:00
Tri Dao | 5db330519a | [LayerNorm] Support taking subset of input or subset of output | 2022-12-12 22:16:14 -08:00
Tri Dao | ae137ed17a | [LayerNorm] Fuse LayerScale | 2022-12-10 23:28:23 -08:00
Tri Dao | 8c6609ae1a | [LayerNorm] Support all dimensions up to 6k (if divisible by 8) | 2022-12-09 02:06:22 -08:00
Tri Dao | 0bf5e50038 | Release training code | 2022-11-28 17:34:40 -08:00
Tri Dao | 39ed597b28 | [LayerNorm] Compile for both sm70 and sm80 | 2022-11-17 11:45:11 -08:00
Tri Dao | 43ab0b5205 | Mention that some CUDA extensions have only been tested on A100s | 2022-11-15 07:10:25 -08:00
Tri Dao | e4d3013e15 | [LayerNorm] Check cuda error after querying ctas_per_sm | 2022-11-15 07:05:13 -08:00
Tri Dao | 2e33fc8e36 | Add GPT and ViT models | 2022-11-13 22:30:23 -08:00
Tri Dao | fa6d1ce44f | Add fused_dense and dropout_add_layernorm CUDA extensions | 2022-11-13 21:59:20 -08:00