Haicheng Wu
a3bcc6981d
Merge pull request #331 from reed-lau/feature/fix-wmma-shape-typo
...
fix wmma shape typo
2021-09-28 10:20:29 -04:00
reed-lau
3b28642801
fix wmma shape typo
2021-09-28 19:04:09 +08:00
Manish Gupta
538592dea4
example 23 gemm operand reduction fusion ( #325 )
2021-09-20 13:34:47 -07:00
Manish Gupta
2e07c4cc2f
CUTLASS 2.7 ( #318 )
...
CUTLASS 2.7
* Mainloop fusion for GEMM: summation over A or B
* Strided DGRAD (optimized iterators)
* Half-precision GELU_taylor activation functions
  * Use these when accumulation and epilogue compute types are all cutlass::half_t
* Tuning and bug fixes to fused GEMM + GEMM example
* Support for smaller than 128b aligned Convolutions: see examples
* Caching of results to accelerate Convolution unit tests
  * Can be enabled or disabled by running cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF
* Corrections and bug fixes reported by the CUTLASS community
  * Thank you for filing these issues!
authored-by: Haicheng Wu <haichengw@nvidia.com>, Manish Gupta <manigupta@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2021-09-20 11:02:22 -07:00
Haicheng Wu
9ac255863f
Merge pull request #246 from mengchihe/master
...
support unalignment input for conv2d fprop stage=2 Fix for issue #242
2021-09-08 11:40:53 -04:00
Haicheng Wu
59e2aa505a
refine the implementation
2021-09-08 13:14:08 +00:00
Haicheng Wu
4e8af93da1
Merge remote-tracking branch 'origin/master' into small_alignment
2021-09-07 20:39:38 +00:00
Manish Gupta
6c2f8f2fb8
CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning
...
* cutlass 2.6 update
* remove debug prints
* cutlass 2.6.1 (minor update)
* Updated CHANGELOG.
* Minor edit to readme to indicate patch version.
* Minor edit to readme.
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2021-09-03 10:26:15 -07:00
Haicheng Wu
598e35401c
Merge remote-tracking branch 'origin/master' into small_alignment
2021-08-16 07:49:08 -07:00
Manish Gupta
a01feb93d9
Merge pull request #308 from dongxiao92/patch-1
...
fix typo in doc
2021-08-08 11:54:42 -07:00
dongxiao
d36f331b44
fix typo in doc
...
fix typo
2021-08-08 16:44:22 +08:00
Haicheng Wu
69abafb85a
Merge pull request #306 from NVIDIA/fix-profiler-cmd-doc
...
Fix profiler cmd doc
2021-07-30 14:36:54 -04:00
Haicheng Wu
68a078fbbf
cleanup
2021-07-30 11:27:21 -07:00
Haicheng Wu
10709dbb64
clean profiler cmd and doc
2021-07-30 11:02:17 -07:00
Manish Gupta
1227351079
Merge pull request #305 from NVIDIA/fix_epilogue_spill
...
fix epilogue register spill
2021-07-29 14:30:11 -07:00
Haicheng Wu
a77c658439
fix epilogue register spill
2021-07-29 14:25:48 -07:00
Haicheng Wu
4516b833ce
Merge pull request #303 from Peter9606/doc_typo
...
Doc typo
2021-07-28 20:49:06 -04:00
Peter Han
64dd1e1915
Doc typo
...
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-07-29 08:45:59 +08:00
Manish Gupta
1ac4559d12
Cutlass 2.6 Update 1 ( #301 )
...
* cutlass 2.6 update
* remove debug prints
2021-07-27 17:58:30 -07:00
Manish Gupta
e5d51840e8
CUTLASS 2.6 ( #298 )
...
CUTLASS 2.6
2021-07-23 00:40:53 -04:00
Haicheng Wu
6c29fe20ba
Merge pull request #285 from tjingrant/patch-1
...
Typo Fixes
2021-07-05 22:51:19 -04:00
Tian Jin
e3c56b0d6b
Update predicated_tile_iterator.h
2021-07-05 12:11:53 -04:00
Tian Jin
4647c57243
Update predicated_tile_iterator.h
2021-07-05 12:06:41 -04:00
Haicheng Wu
856d4db3fb
Update basic_gemm.cu
...
fix the matrix malloc size
2021-06-15 09:08:36 -04:00
Haicheng Wu
6a1064093f
Merge pull request #274 from mani-ananth/master
...
Some pending Bug fixes
2021-06-02 13:17:39 -04:00
Manikandan Ananth
c5f1ef4dff
update contributors
2021-06-02 10:11:42 -07:00
Manikandan Ananth
47ebfccbec
bug fixes
2021-06-02 10:08:25 -07:00
Haicheng Wu
ad9486684f
Merge pull request #272 from BernardoCovas/master
...
Bug in reference conv3d
2021-05-28 17:18:27 -04:00
Bernardo Covas
1d8372a8e2
fix typo in reference conv3d
2021-05-28 21:06:59 +01:00
Haicheng Wu
9cb7d63424
Merge pull request #266 from mani-ananth/master
...
Fixes for public issue #265
2021-05-19 15:15:22 -04:00
Manikandan Ananth
da2f110906
Fixes for public issue #265
2021-05-19 10:16:52 -07:00
Haicheng Wu
b68113f5be
Merge pull request #264 from zheng95z/patch-3
...
Adds `NoBetaScaling` for `LinearCombination`
2021-05-17 10:03:30 -04:00
Zheng Zeng
a68d7cd6f1
Adds `NoBetaScaling` for `LinearCombination`
2021-05-12 22:23:55 +08:00
Haicheng Wu
38e8b29f56
Merge pull request #259 from hzfan/ignore_pr
...
Add gitignore
2021-05-10 20:07:53 -04:00
Haozheng Fan
ee7349c94f
fix
2021-05-10 16:39:04 +08:00
Haozheng Fan
8cdd4293d4
add gitignore
2021-05-10 16:37:59 +08:00
Haicheng Wu
f58b843951
Merge pull request #239 from KeDengMS/kedeng/gelu
...
Fixes to Gelu for half and fusion
2021-05-08 12:51:42 -04:00
Haicheng Wu
5fc142296f
Merge pull request #237 from Peter9606/issue_236_typo
...
Typo fix issue#236
2021-05-08 07:51:19 -04:00
Haicheng Wu
233d69aa6d
Merge pull request #235 from Peter9606/issue_233_tranpose_update
...
tranpose.h update based on issue#233
2021-05-07 07:14:30 -04:00
Haicheng Wu
9840d25269
Merge pull request #256 from zheng95z/patch-2
...
Fixes some typos in utilities.md
2021-05-06 11:02:49 -04:00
Zheng Zeng
b878c96421
Fixes some typos in utilities.md
2021-05-06 22:37:37 +08:00
Haicheng Wu
8f8a80cad5
Merge pull request #251 from zheng95z/patch-1
...
add a missing 'device_memory::' before a function
2021-04-25 22:09:44 -04:00
Zheng Zeng
a8f6f8eb07
add a missing 'device_memory::' before a function
2021-04-25 20:05:39 +08:00
mengchi.hmc
f4b0a33633
add unit test for non int4 load
2021-04-23 14:33:46 +08:00
Haicheng Wu
7c783adf53
Merge pull request #247 from xue-fc/patch-1
...
fix a wrong description
2021-04-22 09:27:40 -04:00
xue-fc
4000df9567
fix a wrong description
2021-04-22 20:28:28 +08:00
mengchi.hmc
bb35a3ba6f
support setting load granularity for conv2d fprop
2021-04-22 15:20:57 +08:00
mengchi.hmc
7ec3a87f22
support unalignment input for conv2d fprop stage=2 Fix for issue #242
2021-04-21 14:40:05 +08:00
KeDengMS
0b74c8f473
Address CR
2021-04-19 23:36:06 +00:00
KeDengMS
83036ed646
More clean up
2021-04-18 04:29:20 +00:00