Haicheng Wu
f58b843951
Merge pull request #239 from KeDengMS/kedeng/gelu
...
Fixes to Gelu for half and fusion
2021-05-08 12:51:42 -04:00
Haicheng Wu
5fc142296f
Merge pull request #237 from Peter9606/issue_236_typo
...
Typo fix issue#236
2021-05-08 07:51:19 -04:00
Haicheng Wu
233d69aa6d
Merge pull request #235 from Peter9606/issue_233_tranpose_update
...
transpose.h update based on issue#233
2021-05-07 07:14:30 -04:00
Haicheng Wu
9840d25269
Merge pull request #256 from zheng95z/patch-2
...
Fixes some typos in utilities.md
2021-05-06 11:02:49 -04:00
Zheng Zeng
b878c96421
Fixes some typos in utilities.md
2021-05-06 22:37:37 +08:00
Haicheng Wu
8f8a80cad5
Merge pull request #251 from zheng95z/patch-1
...
add a missing 'device_memory::' before a function
2021-04-25 22:09:44 -04:00
Zheng Zeng
a8f6f8eb07
add a missing 'device_memory::' before a function
2021-04-25 20:05:39 +08:00
Haicheng Wu
7c783adf53
Merge pull request #247 from xue-fc/patch-1
...
fix a wrong description
2021-04-22 09:27:40 -04:00
xue-fc
4000df9567
fix a wrong description
2021-04-22 20:28:28 +08:00
KeDengMS
0b74c8f473
Address CR
2021-04-19 23:36:06 +00:00
KeDengMS
83036ed646
More clean up
2021-04-18 04:29:20 +00:00
KeDengMS
b7e43f5eb9
Clean up
2021-04-18 04:24:25 +00:00
KeDengMS
5c62d892fa
Add test
2021-04-18 04:09:34 +00:00
KeDengMS
41a31b404b
Fixes to Gelu for half and fusion
2021-04-17 22:10:19 +00:00
Peter Han
7320aee17d
Typo fix issue#236
...
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-04-15 15:08:35 +08:00
Peter Han
2142a05d9d
transpose.h update based on issue#233
...
1. Add 'pragma once' preprocessor directive
2. Replace prmt PTX with __byte_perm intrinsic
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-04-14 19:58:00 +08:00
Haicheng Wu
c77a524459
Merge pull request #230 from mani-ananth/master
...
Fix for issue #221
2021-04-09 14:45:55 -04:00
Manikandan Ananth
fac6680f31
Merge branch 'master' of github.com:NVIDIA/cutlass
2021-04-09 11:36:31 -07:00
Manikandan Ananth
08993707da
fixing functional bug in fused epilogue
2021-04-09 11:36:03 -07:00
Haicheng Wu
c805593ebe
Merge pull request #228 from mani-ananth/master
...
Fix for issue#224 and issue#225
2021-04-08 10:08:13 -04:00
Manikandan Ananth
26556d7206
fix a broken sparse gemm example. found by the community.
2021-04-07 13:32:55 -07:00
Manikandan Ananth
4839b6cb61
add 2stage fprop 3d into default file
2021-04-07 13:29:32 -07:00
Haicheng Wu
d97214987a
Merge pull request #220 from Peter9606/wrong-stride-array-definition
...
Bugfix: typo, make reduction device cases pass
2021-04-02 08:43:52 -04:00
Haicheng Wu
b0bbc6d548
Merge pull request #219 from mani-ananth/master
...
Fix for issue #211
2021-04-02 08:42:09 -04:00
Peter Han
7074047a54
Bugfix: typo, make reduction device cases pass
...
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-04-02 09:35:23 +08:00
Manikandan Ananth
75a4737cfe
Fix for public issue #211
...
- Add a slice-K tile size to the profiler
- fix num warps calculations in implicit gemm header
2021-04-01 14:42:00 -07:00
Haicheng Wu
8a3e4b8d02
Merge pull request #214 from Peter9606/separate-stream-error
...
Bugfix: memsetAsync uses wrong default stream
2021-03-24 12:09:01 -04:00
Peter Han
6a6b4028bd
Revert wrong fix of params.update in GemmUniversalBase
...
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-23 23:20:40 +08:00
Peter Han
92393b2676
Bugfix: memsetAsync uses wrong default stream
...
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-23 21:11:42 +08:00
Haicheng Wu
50bf00e5f2
Merge pull request #193 from Peter9606/public_shape_type_from_Mma_HFMA2
...
HFMA2 Convolutions for SM60 onwards
2021-03-05 21:38:59 -05:00
Manish Gupta
4cd004ead1
fix test name to optimized and instance large tile sizes to speed unit tests
2021-03-05 13:32:36 -08:00
Peter Han
6c4539e372
Make arch tag of test cases more precisely to SM60
...
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-05 10:53:26 +08:00
Peter Han
a3639ab1a0
Append fp16 test case to verify Mma_HFMA2
...
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-04 18:17:57 +08:00
Peter Han
169181f30f
Make Shape public from Mma_HFMA2.
...
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-04 11:05:16 +08:00
Haicheng Wu
0f1056390d
Create PUBLICATIONS.md (#189)
2021-03-03 11:17:40 -08:00
Haicheng Wu
34a42e5620
Update generator.py (#192)
2021-03-02 12:21:48 -08:00
Dustyn Blasig
8f09b82b12
Merge pull request #187 from NVIDIA/cutlass_2.5
...
CUTLASS 2.5.0
2021-02-26 23:56:04 -06:00
Andrew Kerr
200a5a5146
Enabled reduction unit tests.
2021-02-26 15:46:57 -05:00
Andrew Kerr
746b7b3247
Enabled tensor reduction kernels.
2021-02-26 15:32:19 -05:00
Andrew Kerr
abdf16a4d9
Updated release notes.
2021-02-26 13:55:04 -05:00
Andrew Kerr
0e13748649
CUTLASS 2.5
2021-02-26 09:58:26 -05:00
Manish Gupta
ccb697bac7
cutlass 2.4 documentation only update
2020-11-23 06:59:45 -06:00
Yang Wang
e6bcdc60cf
fix broken links (#148)
2020-11-19 21:46:54 -08:00
Manish Gupta
6615010cd0
CUTLASS 2.4 (Implicit GEMM convolution) (#147)
...
CUTLASS 2.4 (Implicit GEMM Convolution)
Co-authored-by: Manish Gupta <manigupta@nvidia.com>, Haicheng Wu <haichengw@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2020-11-19 21:25:25 -08:00
Dustyn Blasig
c2b80ad4e4
Merge pull request #135 from NVIDIA/cutlass_2.3_final
...
CUTLASS 2.3.0
2020-09-25 13:25:26 -05:00
akerr
37a8f9e598
CUTLASS 2.3.0 final.
2020-09-25 10:34:46 -07:00
Andrew Kerr
c53f3339bb
CUTLASS 2.3 initial commit (#134)
...
CUTLASS 2.3 adds GEMMs targeting Sparse Tensor Cores on the NVIDIA Ampere Architecture, fast SGEMM, small matrix classes, bug fixes, and performance enhancements.
2020-09-23 14:00:58 -07:00
hwu36
4dac7490e6
Typos (#107)
...
* Update splitk_gemm.cu
* Update gemm_bias_relu.cu
* Update mma_sm75.h
2020-07-13 14:25:52 -07:00
Andrew Kerr
fd7e058d0c
Added examples to enable the unity build (#102)
...
* Updated documentation of fused GEMM example and removed UNITY BUILD batch size. The default batch size when unity build is enabled tends to be favorable.
2020-06-17 07:09:18 -07:00
Andrew Kerr
1ab1027954
Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (#100)
...
- Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>.
- Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out
- Added test_examples target to build and test all CUTLASS examples
- Minor edits to documentation to point to GTC 2020 webinar
2020-06-15 10:47:01 -07:00