Sergey Klevtsov
d65266a868
Add all supported GMMA shapes ( #1890 )
2024-10-22 18:13:36 -04:00
Tri Dao
5b50a8faaf
Add GMMA shape m64n40k16 ( #1864 )
2024-10-21 20:41:47 -04:00
Yujia Zhai
cc3c29a81a
CUTLASS 3.6.0 ( #1850 )
* v3.6
* update changelog
* update readme
* fix typo
* fixing typos
* hopper gemm with weight prefetch
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-10-09 15:33:27 -04:00
Tri Dao
7192f4ab23
Add CLayout_64x208 ( #1680 )
Without this I get a compilation error when the extended shapes are enabled
2024-08-08 14:00:24 -04:00
Sergey Klevtsov
36cbfcf483
Add extended wgmma shapes for all data types ( #1666 )
2024-07-31 18:33:14 -04:00
Tri Dao
5b283c872c
Add more GMMA shapes ( #1630 )
* Add more GMMA shapes
* Add more shapes for BF16
2024-07-29 19:09:51 -04:00
Vijay Thakkar
be60a0b272
CUTLASS 3.5.1 ( #1623 )
* CUTLASS 3.5.1
* updates, optimizations, fixes
2024-07-29 08:46:24 -04:00
Vijay Thakkar
7d49e6c7e2
Updates for CUTLASS 3.5.0 ( #1468 )
2024-04-11 21:33:40 -04:00
ANIKET SHIVAM
751eb9a885
Update license year ( #1306 )
2024-01-16 14:37:22 -05:00
ANIKET SHIVAM
2f589ffa76
Updates for 3.4 release. ( #1305 )
2024-01-16 13:42:51 -05:00
Pradeep Ramani
c008b4aea8
CUTLASS 3.3.0 ( #1167 )
* Release 3.3.0
Adds support for mixed precision GEMMs on Hopper and Ampere
Adds support for < 16B aligned GEMMs on Hopper
Enhancements to EVT
Enhancements to Python interface
Enhancements to Sub-byte type handling in CuTe
Several other bug-fixes and performance improvements.
* minor doc update
2023-11-02 11:09:05 -04:00
ANIKET SHIVAM
4575443d44
CUTLASS 3.2 ( #1024 )
* CUTLASS 3.2
2023-08-07 20:50:32 -04:00
ANIKET SHIVAM
d572cc1aab
CUTLASS 3.1 ( #915 )
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-04-14 23:19:34 -04:00
Vijay Thakkar
277bd6e537
CUTLASS 3.0.0 ( #786 )
* CUTLASS 3.0.0
2023-01-23 20:55:28 -05:00