Haicheng Wu | 4e8af93da1 | 2021-09-07 20:39:38 +00:00
Merge remote-tracking branch 'origin/master' into small_alignment

Manish Gupta | 6c2f8f2fb8 | 2021-09-03 10:26:15 -07:00
CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning
* cutlass 2.6 update
* remove debug prints
* cutlass 2.6.1 (minor update)
* Updated CHANGELOG.
* Minor edit to readme to indicate patch version.
* Minor edit to readme.
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com>

Haicheng Wu | 598e35401c | 2021-08-16 07:49:08 -07:00
Merge remote-tracking branch 'origin/master' into small_alignment

Manish Gupta | 1ac4559d12 | 2021-07-27 17:58:30 -07:00
Cutlass 2.6 Update 1 (#301)
* cutlass 2.6 update
* remove debug prints

Manish Gupta | e5d51840e8 | 2021-07-23 00:40:53 -04:00
CUTLASS 2.6 (#298)
CUTLASS 2.6

mengchi.hmc | f4b0a33633 | 2021-04-23 14:33:46 +08:00
add unit test for non int4 load

Manish Gupta | 4cd004ead1 | 2021-03-05 13:32:36 -08:00
fix test name to optimized and instance large tile sizes to speed unit tests

Peter Han | 6c4539e372 | 2021-03-05 10:53:26 +08:00
Make arch tag of test cases more precisely to SM60
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>

Peter Han | a3639ab1a0 | 2021-03-04 18:17:57 +08:00
Append fp16 test case to verify Mma_HFMA2
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>

Andrew Kerr | 0e13748649 | 2021-02-26 09:58:26 -05:00
CUTLASS 2.5

Manish Gupta | 6615010cd0 | 2020-11-19 21:25:25 -08:00
CUTLASS 2.4 (Implicit GEMM convolution) (#147)
CUTLASS 2.4 (Implicit GEMM Convolution)
Co-authored-by: Manish Gupta <manigupta@nvidia.com>, Haicheng Wu <haichengw@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>