Lei Mao
e5f3caf145
Fix README ( #1658 )
* Fix README
* Improve README
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2024-10-23 12:52:43 -04:00
Yujia Zhai
cc3c29a81a
CUTLASS 3.6.0 ( #1850 )
* v3.6
* update changelog
* update readme
* fix typo
* fixing typos
* hopper gemm with weight prefetch
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-10-09 15:33:27 -04:00
Haicheng Wu
8d8cfdf375
update 3.5.1 readme/changelog
2024-08-14 21:12:44 -07:00
dePaul Miller
4e5a8f6853
3.5.1 plots and updated readme ( #1708 )
Co-authored-by: dePaul Miller <23461061+depaulmillz@users.noreply.github.com>
2024-08-12 18:55:55 -04:00
Vijay Thakkar
be60a0b272
CUTLASS 3.5.1 ( #1623 )
* CUTLASS 3.5.1
* updates, optimizations, fixes
2024-07-29 08:46:24 -04:00
Vijay Thakkar
7d49e6c7e2
Updates for CUTLASS 3.5.0 ( #1468 )
2024-04-11 21:33:40 -04:00
Vijay Thakkar
629f4653c3
CUTLASS 3.5.0 ( #1411 )
2024-03-19 17:51:04 -04:00
ANIKET SHIVAM
bbe579a9e3
Updates for CUTLASS 3.4.1 ( #1346 )
* Updates for CUTLASS 3.4.1
* minor epi change
2024-02-15 15:48:34 -05:00
ANIKET SHIVAM
751eb9a885
Update license year ( #1306 )
2024-01-16 14:37:22 -05:00
ANIKET SHIVAM
2f589ffa76
Updates for 3.4 release. ( #1305 )
2024-01-16 13:42:51 -05:00
Pradeep Ramani
8236f30675
CUTLASS 3.4.0 ( #1286 )
* CUTLASS 3.4.0
* Update CHANGELOG.md
---------
Co-authored-by: Pradeep Ramani <prramani@nvidia.com>
2023-12-29 15:21:31 -05:00
Pradeep Ramani
e9e30c2304
Updates and Bug fixes to CUTLASS 3.3 ( #1232 )
2023-12-05 09:50:49 -05:00
Manish Gupta
5ae8133cfa
Doc only change changelog 3.3 ( #1180 )
2023-11-13 13:29:22 -05:00
Pradeep Ramani
c008b4aea8
CUTLASS 3.3.0 ( #1167 )
* Release 3.3.0
Adds support for mixed precision GEMMs On Hopper and Ampere
Adds support for < 16B aligned GEMMs on Hopper
Enhancements to EVT
Enhancements to Python interface
Enhancements to Sub-byte type handling in CuTe
Several other bug-fixes and performance improvements.
* minor doc update
2023-11-02 11:09:05 -04:00
ANIKET SHIVAM
90d3b0fb18
CUTLASS 3.2.1 ( #1113 )
* Updates for 3.2.1 release.
* Minor fix in gemm op profiler for raster order.
* Add scheduler mapping for raster order in the kernels.
2023-09-26 17:24:26 -04:00
ANIKET SHIVAM
4575443d44
CUTLASS 3.2 ( #1024 )
* CUTLASS 3.2
2023-08-07 20:50:32 -04:00
Vijay Thakkar
fde824af21
Update Hopper performance plot for CUTLASS 3.1 + CTK 12.1 ( #967 )
2023-06-01 14:52:40 -04:00
Haicheng Wu
6f47420213
Update README.md
2023-05-24 12:40:31 -04:00
ANIKET SHIVAM
f079619f5e
More updates for 3.1 ( #958 )
* Updates for 3.1
* Minor change
* doc link fix
* Minor updates
2023-05-24 10:17:16 -04:00
ANIKET SHIVAM
d572cc1aab
CUTLASS 3.1 ( #915 )
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-04-14 23:19:34 -04:00
Alexander Pivovarov
7e370c9637
Fix typos 2 ( #842 )
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2023-03-09 23:22:56 -05:00
ANIKET SHIVAM
c4f6b8c6bc
Updates for 3.0 ( #857 )
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-03-09 15:27:40 -05:00
Vijay Thakkar
277bd6e537
CUTLASS 3.0.0 ( #786 )
* CUTLASS 3.0.0
2023-01-23 20:55:28 -05:00
ANIKET SHIVAM
66d9cddc83
New updates for 2.11 ( #775 )
* New updates.
* Minor profiler updates
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-01-20 16:32:57 -05:00
Haicheng Wu
764b840d6f
streamk example and performance tuning ( #760 )
* streamk example and performance tuning
* one missing file
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-01-10 16:10:02 -05:00
Matthew Nicely
78b30d3191
Update README.md
2022-12-21 11:58:19 -05:00
Matthew Nicely
59de82688b
Update README.md
2022-12-21 11:57:55 -05:00
Haicheng Wu
9f1f37aa21
misc ( #719 )
* misc
* minor
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-12-05 12:07:20 -05:00
Aditya Atluri
c975e2ccbb
releaase 2.11 ( #703 )
2022-11-19 09:02:15 -05:00
ANIKET SHIVAM
50ceed7154
Minor README fix ( #623 )
* minor fix
* Minor fix
2022-09-12 22:40:25 -04:00
ANIKET SHIVAM
e773429f7e
CUTLASS 2.10 updates ( #622 )
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-09-12 21:26:30 -04:00
Yujia Zhai
beae168f90
fix broken link ( #620 )
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-09-06 16:32:44 -04:00
ANIKET SHIVAM
b72cbf957d
CUTLASS 2.10 ( #615 )
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-09-03 18:48:46 -04:00
Haicheng Wu
ba18ea9c32
Update README.md
2022-06-27 23:25:26 -04:00
Haicheng Wu
cc2ea4c3fc
Update README.md
2022-04-28 10:50:11 -04:00
Jack Kosaian
167ac54c65
Fix link to Python example ( #469 )
2022-04-23 15:37:38 -04:00
Andrew Kerr
12f4108ac2
CUTLASS 2.9 ( #468 )
2022-04-23 15:02:38 -04:00
Andrew Kerr
4e666e1dfd
Updated README and added issue templates. ( #382 )
2021-12-17 09:26:20 -05:00
Andrew Kerr
5fe09c2d67
Updated GEMM performance plot with CUTLASS 2.8 compiled with CUDA 11.5 Toolkit ( #375 )
Updated GEMM performance plot with CUTLASS 2.8 compiled using CUDA 11.5 Toolkit.
GPUs under test:
NVIDIA A100
NVIDIA A2
NVIDIA TitanV
NVIDIA GeForce 2080 Ti
2021-12-06 14:21:33 -05:00
Andrew Kerr
62e438f450
Listed Matthew Nicely as the CUTLASS product manager.. ( #364 )
2021-11-19 17:51:21 -08:00
Manish Gupta
808c25337a
CUTLASS 2.8 ( #363 )
CUTLASS 2.8
2021-11-19 13:26:35 -08:00
Manish Gupta
2e07c4cc2f
CUTLASS 2.7 ( #318 )
CUTLASS 2.7
Mainloop fusion for GEMM: summation over A or B
Strided DGRAD (optimized iterators)
Half-precision GELU_taylor activation functions
    Use these when accumulation and epilogue compute types are all cutlass::half_t
Tuning and bug fixes to fused GEMM + GEMM example
Support for smaller than 128b aligned Convolutions: see examples
Caching of results to accelerate Convolution unit tests
    Can be enabled or disabled by running cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF
Corrections and bug fixes reported by the CUTLASS community
    Thank you for filing these issues!
authored-by: Haicheng Wu <haichengw@nvidia.com>, Manish Gupta <manigupta@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2021-09-20 11:02:22 -07:00
Manish Gupta
6c2f8f2fb8
CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning
* cutlass 2.6 update
* remove debug prints
* cutlass 2.6.1 (minor update)
* Updated CHANGELOG.
* Minor edit to readme to indicate patch version.
* Minor edit to readme.
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2021-09-03 10:26:15 -07:00
Manish Gupta
1ac4559d12
Cutlass 2.6 Update 1 ( #301 )
* cutlass 2.6 update
* remove debug prints
2021-07-27 17:58:30 -07:00
Manish Gupta
e5d51840e8
CUTLASS 2.6 ( #298 )
CUTLASS 2.6
2021-07-23 00:40:53 -04:00
Andrew Kerr
200a5a5146
Enabled reduction unit tests.
2021-02-26 15:46:57 -05:00
Andrew Kerr
abdf16a4d9
Updated release notes.
2021-02-26 13:55:04 -05:00
Andrew Kerr
0e13748649
CUTLASS 2.5
2021-02-26 09:58:26 -05:00
Manish Gupta
ccb697bac7
cutlass 2.4 documentation only update
2020-11-23 06:59:45 -06:00
Manish Gupta
6615010cd0
CUTLASS 2.4 (Implicit GEMM convolution) ( #147 )
CUTLASS 2.4 (Implicit GEMM Convolution)
Co-authored-by: Manish Gupta <manigupta@nvidia.com>, Haicheng Wu <haichengw@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2020-11-19 21:25:25 -08:00