Haicheng Wu
6023038bae
add verification of the reduction tensor (#489)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-05-06 10:24:51 -07:00
Andrew Kerr
12f4108ac2
CUTLASS 2.9 (#468)
2022-04-23 15:02:38 -04:00
Andrew Kerr
8a951b2940
Enable convolution with fused epilogue for Volta Tensor Cores (#402)
* Enabled convolution with epilogue fusion for Volta Tensor Cores.
* Compilation fixes
* Disabled testing Volta on Ampere architectures.
2022-01-30 23:24:50 -05:00
masahi
c2ee13a0fe
Add epilogue functor for residual block fusion (#391)
* Add epilogue functor for residual block fusion
* Do not run split-k tests when ActivationOp is not Identity
* explain TestSplitK param
* return early
2021-12-29 22:53:40 -05:00
Andrew Kerr
ec4f7e5194
Updates to fused epilogue (#383)
* Enhancements and fixes to fused GEMM and Convolution epilogue.
* Need to explicitly list cudart as unit test library dependency.
2021-12-17 16:04:43 -05:00
Manish Gupta
2e07c4cc2f
CUTLASS 2.7 (#318)
CUTLASS 2.7
* Mainloop fusion for GEMM: summation over A or B
* Strided DGRAD (optimized iterators)
* Half-precision GELU_taylor activation functions
  * Use these when accumulation and epilogue compute types are all cutlass::half_t
* Tuning and bug fixes to fused GEMM + GEMM example
* Support for smaller than 128b aligned Convolutions: see examples
* Caching of results to accelerate Convolution unit tests
  * Can be enabled or disabled by running cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF
* Corrections and bug fixes reported by the CUTLASS community
  * Thank you for filing these issues!
authored-by: Haicheng Wu <haichengw@nvidia.com>, Manish Gupta <manigupta@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2021-09-20 11:02:22 -07:00
Haicheng Wu
59e2aa505a
refine the implementation
2021-09-08 13:14:08 +00:00
Haicheng Wu
4e8af93da1
Merge remote-tracking branch 'origin/master' into small_alignment
2021-09-07 20:39:38 +00:00
Manish Gupta
6c2f8f2fb8
CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning
* cutlass 2.6 update
* remove debug prints
* cutlass 2.6.1 (minor update)
* Updated CHANGELOG.
* Minor edit to readme to indicate patch version.
* Minor edit to readme.
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2021-09-03 10:26:15 -07:00
Haicheng Wu
598e35401c
Merge remote-tracking branch 'origin/master' into small_alignment
2021-08-16 07:49:08 -07:00
Manish Gupta
1ac4559d12
Cutlass 2.6 Update 1 (#301)
* cutlass 2.6 update
* remove debug prints
2021-07-27 17:58:30 -07:00
Manish Gupta
e5d51840e8
CUTLASS 2.6 (#298)
CUTLASS 2.6
2021-07-23 00:40:53 -04:00
mengchi.hmc
f4b0a33633
add unit test for non int4 load
2021-04-23 14:33:46 +08:00
Manish Gupta
4cd004ead1
fix test name to optimized and instance large tile sizes to speed unit tests
2021-03-05 13:32:36 -08:00
Peter Han
6c4539e372
Make arch tag of test cases more precisely to SM60
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-05 10:53:26 +08:00
Peter Han
a3639ab1a0
Append fp16 test case to verify Mma_HFMA2
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-04 18:17:57 +08:00
Andrew Kerr
0e13748649
CUTLASS 2.5
2021-02-26 09:58:26 -05:00
Manish Gupta
6615010cd0
CUTLASS 2.4 (Implicit GEMM convolution) (#147)
CUTLASS 2.4 (Implicit GEMM Convolution)
Co-authored-by: Manish Gupta <manigupta@nvidia.com>, Haicheng Wu <haichengw@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2020-11-19 21:25:25 -08:00