ANIKET SHIVAM
751eb9a885
Update license year ( #1306 )
2024-01-16 14:37:22 -05:00
ANIKET SHIVAM
d572cc1aab
CUTLASS 3.1 ( #915 )
...
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-04-14 23:19:34 -04:00
Edward Rees
86cae03cea
expose StoreT parameter for potential speed ( #838 )
...
* expose StoreT parameter for potential speed
* add storeT to more elementwise
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-03-10 12:58:17 -05:00
Shuai Shao
ce8597dc14
Fix type bug in conv2d/gemm with broadcast ( #796 )
...
add ElementVector
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-02-09 20:53:25 -05:00
ANIKET SHIVAM
66d9cddc83
New updates for 2.11 ( #775 )
...
* New updates.
* Minor profiler updates
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-01-20 16:32:57 -05:00
Aditya Atluri
c975e2ccbb
releaase 2.11 ( #703 )
2022-11-19 09:02:15 -05:00
Andrew Kerr
12f4108ac2
CUTLASS 2.9 ( #468 )
2022-04-23 15:02:38 -04:00
masahi
c2ee13a0fe
Add epilogue functor for residual block fusion ( #391 )
...
* Add epilogue functor for residual block fusion
* Do not run split-k tests when ActivationOp is not Identity
* explain TestSplitK param
* return early
2021-12-29 22:53:40 -05:00