| Author | Commit | Message | Date |
|---|---|---|---|
| Tri Dao | 43617deab9 | Remove template for (IsEvenMN=T, IsEvenK=F) to speed up compilation | 2023-09-18 12:21:36 -07:00 |
| Tri Dao | 799f56fa90 | Don't compile for Pytorch 2.1 on CUDA 12.1 due to nvcc segfaults | 2023-09-17 22:15:38 -07:00 |
| Tri Dao | c984208ddb | Set block size to 64 x 64 for kvcache to avoid nvcc segfaults | 2023-09-17 16:14:58 -07:00 |
| Tri Dao | 8c8b4d36e1 | Bump to v2.2.3 | 2023-09-16 01:47:01 -07:00 |
| Tri Dao | 08c295c043 | Bump to v2.2.2 | 2023-09-10 23:48:12 -07:00 |
| Tri Dao | a1576ad1e8 | Bump to v2.2.1 | 2023-09-06 02:19:55 -07:00 |
| Tri Dao | 6d673cd961 | Bump to v2.2.0 | 2023-09-05 11:34:13 -07:00 |
| Tri Dao | 4976650f74 | Set single threaded compilation for CUDA 12.2 so CI doesn't OOM | 2023-09-03 23:42:55 -07:00 |
| Tri Dao | 6a89b2f121 | Remove constexpr in launch template to fix CI compilation | 2023-09-03 22:59:41 -07:00 |
| Tri Dao | 97ba7a62e9 | Try switching back to Cutlass 3.2.0 | 2023-09-03 22:45:35 -07:00 |
| Tri Dao | 1dc1b6c8f2 | Bump to v2.1.2 | 2023-09-03 22:23:05 -07:00 |
| Tri Dao | 757058d4d3 | Update Cutlass to v3.2.0 | 2023-08-27 23:47:28 -07:00 |
| Tri Dao | 9e5e8bc91e | Change causal mask to be aligned to bottom-right instead of top-left | 2023-08-24 23:41:07 -07:00 |
| Tri Dao | 6711b3bc40 | Bump version to 2.0.9 | 2023-08-22 00:21:14 -07:00 |
| Tri Dao | c5e87b11e9 | Bump to v2.0.5 | 2023-08-13 13:55:04 -07:00 |
| Tri Dao | d30f2e1cd5 | Bump to v2.0.4 | 2023-08-01 09:01:07 -07:00 |
| Tri Dao | a4e5d1eddd | Bump to v2.0.3 | 2023-07-31 17:49:23 -07:00 |
| Kirthi Shankar Sivamani | 32a953f486 | Request for v2.0.2 (#388). Bump version to 2.0.2; Update version in Dockerfile. Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> | 2023-07-28 02:46:03 -07:00 |
| Tri Dao | b252072409 | Bump to v2.0.1 | 2023-07-23 12:33:42 -10:00 |
| chuanli11 | 30fd8c17d8 | remove checkout v2.0.0.post1 from dockerfile | 2023-07-20 16:40:15 +00:00 |
| Tri Dao | 4f285b3547 | FlashAttention-2 release | 2023-07-17 06:21:34 -07:00 |
| Tri Dao | 6d48e14a6c | Bump to v1.0.9 | 2023-07-17 03:16:40 -07:00 |
| Tri Dao | 9610114ce8 | Bump to v1.0.8 | 2023-07-02 17:04:54 -07:00 |
| Tri Dao | 85b51d61ee | Bump version to 1.0.7 | 2023-05-30 14:18:44 -07:00 |
| Kirthi Shankar Sivamani | dd9c3a1fc2 | bump to v1.0.6. Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com> | 2023-05-26 17:44:10 -07:00 |
| Tri Dao | eff9fe6b80 | Add ninja to pyproject.toml build-system, bump to v1.0.5 | 2023-05-12 14:20:31 -07:00 |
| Tri Dao | ad113948a6 | [Docs] Clearer error message for bwd d > 64, bump to v1.0.4 | 2023-04-26 09:19:48 -07:00 |
| Tri Dao | fbbb107848 | Bump version to v1.0.3.post0 | 2023-04-21 13:37:23 -07:00 |
| Tri Dao | 67ef5d28df | Bump version to 1.0.3 | 2023-04-21 12:04:53 -07:00 |
| Tri Dao | df1344f866 | Bump to v1.0.2 | 2023-04-15 22:19:31 -07:00 |
| Tri Dao | 853ff72963 | Bump version to v1.0.1, fix Cutlass version | 2023-04-12 10:05:01 -07:00 |
| Tri Dao | 74af023316 | Bump version to 1.0.0 | 2023-04-11 23:32:35 -07:00 |
| Tri Dao | 009a3e71ec | [Training] Fix lightning _PATH import | 2023-03-29 01:43:39 -07:00 |
| Ikko Eltociear Ashimine | 419ea45b64 | fix typo in default.yaml (additionaly -> additionally) | 2023-01-21 00:47:12 +09:00 |
| Tri Dao | 33e0860c9c | Bump to v0.2.8 | 2023-01-19 13:17:19 -08:00 |
| Tri Dao | 88173a1aaf | [FusedDense] Support relu, rename FusedDenseGeluDense -> FusedMLP | 2023-01-17 18:12:27 -08:00 |
| Tri Dao | ce26d3d73d | Bump to v0.2.7 | 2023-01-06 17:37:30 -08:00 |
| Tri Dao | 71befc19e1 | [Loss] Use flash_attn.losses.cross_entropy.CrossEntropyLoss | 2022-12-31 22:43:28 -08:00 |
| Tri Dao | cadfa396b8 | [Docker] Set torchmetrics==0.10.3 | 2022-12-30 02:42:28 -08:00 |
| Tri Dao | 43798966cf | [Docs] Fix formatting | 2022-12-30 00:01:55 -08:00 |
| Tri Dao | 3c7cbfc195 | [Docs] Mention that dropout_layer_norm supports all dims up to 6k | 2022-12-29 23:55:33 -08:00 |
| Tri Dao | 984d5204e2 | Update training Dockerfile to use flash-attn==0.2.6 | 2022-12-29 15:12:33 -08:00 |
| Tri Dao | b4018a5028 | Implement Tensor Parallel for GPT model | 2022-12-26 16:22:43 -08:00 |
| Tri Dao | dff68c2b22 | Add smoothing for CrossEntropyParallel, rename to CrossEntropyLoss | 2022-12-23 14:51:08 -08:00 |
| Tri Dao | c2407dec96 | Fix typo in config: train.gpu -> train.gpu_mem | 2022-12-21 13:42:30 -08:00 |
| Tri Dao | 4a6eaa9f27 | Update configs, add results | 2022-11-29 04:46:43 -08:00 |
| Tri Dao | 0bf5e50038 | Release training code | 2022-11-28 17:34:40 -08:00 |
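
Commit 9e5e8bc91e in the history above changes the causal mask from top-left to bottom-right alignment. The two alignments only differ when the query and key sequence lengths are unequal. The sketch below is a plain-Python illustration of that difference, not code from the repository; the function name `causal_mask` and its `align` parameter are hypothetical, and it assumes the usual convention that query `i` may attend to key `j` when `j <= i + offset`, with `offset = seqlen_k - seqlen_q` for bottom-right alignment and `offset = 0` for top-left.

```python
def causal_mask(seqlen_q, seqlen_k, align="bottom-right"):
    """Return a seqlen_q x seqlen_k matrix of 0/1 flags (1 = keep, 0 = masked out).

    Hypothetical helper for illustration only; not part of the flash_attn API.
    """
    if align not in ("top-left", "bottom-right"):
        raise ValueError("align must be 'top-left' or 'bottom-right'")
    # Bottom-right alignment shifts the diagonal so that the last query row
    # can see every key; top-left alignment starts the diagonal at (0, 0).
    offset = seqlen_k - seqlen_q if align == "bottom-right" else 0
    return [
        [1 if j <= i + offset else 0 for j in range(seqlen_k)]
        for i in range(seqlen_q)
    ]


if __name__ == "__main__":
    for align in ("top-left", "bottom-right"):
        print(align)
        for row in causal_mask(2, 5, align):
            print(" ", row)
```

With `seqlen_q == seqlen_k` both alignments produce the same lower-triangular mask, so the change only matters for cases such as decoding against a longer key/value cache.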