dePaul Miller
|
06b21349bc
|
1x1x1 cluster launch (#1673)
|
2024-08-01 12:20:28 -04:00 |
|
Vijay Thakkar
|
be60a0b272
|
CUTLASS 3.5.1 (#1623)
* CUTLASS 3.5.1
* updates, optimizations, fixes
|
2024-07-29 08:46:24 -04:00 |
|
Vijay Thakkar
|
7d49e6c7e2
|
Updates for CUTLASS 3.5.0 (#1468)
|
2024-04-11 21:33:40 -04:00 |
|
Vijay Thakkar
|
629f4653c3
|
CUTLASS 3.5.0 (#1411)
|
2024-03-19 17:51:04 -04:00 |
|
ANIKET SHIVAM
|
751eb9a885
|
Update license year (#1306)
|
2024-01-16 14:37:22 -05:00 |
|
ANIKET SHIVAM
|
4575443d44
|
CUTLASS 3.2 (#1024)
* CUTLASS 3.2
|
2023-08-07 20:50:32 -04:00 |
|
ANIKET SHIVAM
|
66d9cddc83
|
New updates for 2.11 (#775)
* New updates.
* Minor profiler updates
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
|
2023-01-20 16:32:57 -05:00 |
|
Aditya Atluri
|
c975e2ccbb
|
release 2.11 (#703)
|
2022-11-19 09:02:15 -05:00 |
|
ANIKET SHIVAM
|
b72cbf957d
|
CUTLASS 2.10 (#615)
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
|
2022-09-03 18:48:46 -04:00 |
|
Andrew Kerr
|
12f4108ac2
|
CUTLASS 2.9 (#468)
|
2022-04-23 15:02:38 -04:00 |
|
Manish Gupta
|
808c25337a
|
CUTLASS 2.8 (#363)
CUTLASS 2.8
|
2021-11-19 13:26:35 -08:00 |
|
Haicheng Wu
|
59e2aa505a
|
refine the implementation
|
2021-09-08 13:14:08 +00:00 |
|
Manish Gupta
|
6c2f8f2fb8
|
CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning
* cutlass 2.6 update
* remove debug prints
* cutlass 2.6.1 (minor update)
* Updated CHANGELOG.
* Minor edit to readme to indicate patch version.
* Minor edit to readme.
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
|
2021-09-03 10:26:15 -07:00 |
|
Manish Gupta
|
1ac4559d12
|
Cutlass 2.6 Update 1 (#301)
* cutlass 2.6 update
* remove debug prints
|
2021-07-27 17:58:30 -07:00 |
|
Manish Gupta
|
e5d51840e8
|
CUTLASS 2.6 (#298)
CUTLASS 2.6
|
2021-07-23 00:40:53 -04:00 |
|
Manikandan Ananth
|
75a4737cfe
|
Fix for public issue #211
- Add a slice-K tile size to the profiler
- fix num warps calculations in implicit gemm header
|
2021-04-01 14:42:00 -07:00 |
|
Peter Han
|
92393b2676
|
Bugfix: memsetAsync uses wrong default stream
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
|
2021-03-23 21:11:42 +08:00 |
|
Andrew Kerr
|
0e13748649
|
CUTLASS 2.5
|
2021-02-26 09:58:26 -05:00 |
|
Manish Gupta
|
6615010cd0
|
CUTLASS 2.4 (Implicit GEMM convolution) (#147)
CUTLASS 2.4 (Implicit GEMM Convolution)
Co-authored-by: Manish Gupta <manigupta@nvidia.com>, Haicheng Wu <haichengw@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
|
2020-11-19 21:25:25 -08:00 |
|