Commit Graph

149 Commits

Author SHA1 Message Date
Manish Gupta
1ac4559d12
Cutlass 2.6 Update 1 (#301)
* cutlass 2.6 update

* remove debug prints
2021-07-27 17:58:30 -07:00
Manish Gupta
e5d51840e8
CUTLASS 2.6 (#298)
CUTLASS 2.6
2021-07-23 00:40:53 -04:00
Haicheng Wu
6c29fe20ba
Merge pull request #285 from tjingrant/patch-1
Typo Fixes
2021-07-05 22:51:19 -04:00
Tian Jin
e3c56b0d6b
Update predicated_tile_iterator.h
2021-07-05 12:11:53 -04:00
Tian Jin
4647c57243
Update predicated_tile_iterator.h
2021-07-05 12:06:41 -04:00
Haicheng Wu
856d4db3fb
Update basic_gemm.cu
fix the matrix malloc size
2021-06-15 09:08:36 -04:00
Haicheng Wu
6a1064093f
Merge pull request #274 from mani-ananth/master
Some pending Bug fixes
2021-06-02 13:17:39 -04:00
Manikandan Ananth
c5f1ef4dff
update contributors
2021-06-02 10:11:42 -07:00
Manikandan Ananth
47ebfccbec
bug fixes
2021-06-02 10:08:25 -07:00
Haicheng Wu
ad9486684f
Merge pull request #272 from BernardoCovas/master
Bug in reference conv3d
2021-05-28 17:18:27 -04:00
Bernardo Covas
1d8372a8e2
fix typo in reference conv3d
2021-05-28 21:06:59 +01:00
Haicheng Wu
9cb7d63424
Merge pull request #266 from mani-ananth/master
Fixes for public issue #265
2021-05-19 15:15:22 -04:00
Manikandan Ananth
da2f110906
Fixes for public issue #265
2021-05-19 10:16:52 -07:00
Haicheng Wu
b68113f5be
Merge pull request #264 from zheng95z/patch-3
Adds `NoBetaScaling` for `LinearCombination`
2021-05-17 10:03:30 -04:00
Zheng Zeng
a68d7cd6f1
Adds NoBetaScaling for LinearCombination
2021-05-12 22:23:55 +08:00
Haicheng Wu
38e8b29f56
Merge pull request #259 from hzfan/ignore_pr
Add gitignore
2021-05-10 20:07:53 -04:00
Haozheng Fan
ee7349c94f
fix
2021-05-10 16:39:04 +08:00
Haozheng Fan
8cdd4293d4
add gitignore
2021-05-10 16:37:59 +08:00
Haicheng Wu
f58b843951
Merge pull request #239 from KeDengMS/kedeng/gelu
Fixes to Gelu for half and fusion
2021-05-08 12:51:42 -04:00
Haicheng Wu
5fc142296f
Merge pull request #237 from Peter9606/issue_236_typo
Typo fix issue#236
2021-05-08 07:51:19 -04:00
Haicheng Wu
233d69aa6d
Merge pull request #235 from Peter9606/issue_233_tranpose_update
tranpose.h update based on issue#233
2021-05-07 07:14:30 -04:00
Haicheng Wu
9840d25269
Merge pull request #256 from zheng95z/patch-2
Fixes some typos in utilities.md
2021-05-06 11:02:49 -04:00
Zheng Zeng
b878c96421
Fixes some typos in utilities.md
2021-05-06 22:37:37 +08:00
Haicheng Wu
8f8a80cad5
Merge pull request #251 from zheng95z/patch-1
add a missing 'device_memory::' before a function
2021-04-25 22:09:44 -04:00
Zheng Zeng
a8f6f8eb07
add a missing 'device_memory::' before a function
2021-04-25 20:05:39 +08:00
Haicheng Wu
7c783adf53
Merge pull request #247 from xue-fc/patch-1
fix a wrong description
2021-04-22 09:27:40 -04:00
xue-fc
4000df9567
fix a wrong description
2021-04-22 20:28:28 +08:00
KeDengMS
0b74c8f473
Address CR
2021-04-19 23:36:06 +00:00
KeDengMS
83036ed646
More clean up
2021-04-18 04:29:20 +00:00
KeDengMS
b7e43f5eb9
Clean up
2021-04-18 04:24:25 +00:00
KeDengMS
5c62d892fa
Add test
2021-04-18 04:09:34 +00:00
KeDengMS
41a31b404b
Fixes to Gelu for half and fusion
2021-04-17 22:10:19 +00:00
Peter Han
7320aee17d
Typo fix issue#236
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-04-15 15:08:35 +08:00
Peter Han
2142a05d9d
tranpose.h update based on issue#233
1. Add 'pragma once' preprocess directive
2. Replace prmt PTX with __byte_perm intrinsic

Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-04-14 19:58:00 +08:00
Haicheng Wu
c77a524459
Merge pull request #230 from mani-ananth/master
Fix for issue #221
2021-04-09 14:45:55 -04:00
Manikandan Ananth
fac6680f31
Merge branch 'master' of github.com:NVIDIA/cutlass
2021-04-09 11:36:31 -07:00
Manikandan Ananth
08993707da
fixing functional bug in fused epilogue
2021-04-09 11:36:03 -07:00
Haicheng Wu
c805593ebe
Merge pull request #228 from mani-ananth/master
Fix for issue#224 and issue#225
2021-04-08 10:08:13 -04:00
Manikandan Ananth
26556d7206
fix a broken sparse gemm example. found by the community.
2021-04-07 13:32:55 -07:00
Manikandan Ananth
4839b6cb61
add 2stage fprop 3d into default file
2021-04-07 13:29:32 -07:00
Haicheng Wu
d97214987a
Merge pull request #220 from Peter9606/wrong-stride-array-definition
Bugfix: typo, make reduction device cases passed
2021-04-02 08:43:52 -04:00
Haicheng Wu
b0bbc6d548
Merge pull request #219 from mani-ananth/master
Fix for issue #211
2021-04-02 08:42:09 -04:00
Peter Han
7074047a54
Bugfix: typo, make reduction device cases passed
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-04-02 09:35:23 +08:00
Manikandan Ananth
75a4737cfe
Fix for public issue #211
- Add a slice-K tile size to the profiler
- fix num warps calculations in implicit gemm header
2021-04-01 14:42:00 -07:00
Haicheng Wu
8a3e4b8d02
Merge pull request #214 from Peter9606/separate-stream-error
Bugfix: memsetAsync uses wrong default stream
2021-03-24 12:09:01 -04:00
Peter Han
6a6b4028bd
Revert wrong fix of params.update in GemmUniversalBase
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-23 23:20:40 +08:00
Peter Han
92393b2676
Bugfix: memsetAsync uses wrong default stream
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-23 21:11:42 +08:00
Haicheng Wu
50bf00e5f2
Merge pull request #193 from Peter9606/public_shape_type_from_Mma_HFMA2
HFMA2 Convolutions for SM60 onwards
2021-03-05 21:38:59 -05:00
Manish Gupta
4cd004ead1
fix test name to optimized and instance large tile sizes to speed unit tests
2021-03-05 13:32:36 -08:00
Peter Han
6c4539e372
Make arch tag of test cases more precisely to SM60
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-05 10:53:26 +08:00