Andrew Kerr 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ec4f7e5194 
							
						 
					 
					
						
						
							
							Updates to fused epilogue ( #383 )  
						
						
						
						
						* Enhancements and fixes to fused GEMM and Convolution epilogue.
* Need to explicitly list cudart as unit test library dependency. 
						
					 
					
						2021-12-17 16:04:43 -05:00 
						 
				 
			
				
					
						
							
							
								Andrew Kerr 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							4e666e1dfd 
							
						 
					 
					
						
						
							
							Updated README and added issue templates. ( #382 )  
						
						
						
					 
					
						2021-12-17 09:26:20 -05:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							3799e12f25 
							
						 
					 
					
						
						
							
							Merge pull request  #381  from Peter9606/update-makefile-version  
						
						
						
						
						Update project version to 2.8.0 in CMakeLists.txt 
						
					 
					
						2021-12-16 21:54:57 -05:00 
						 
				 
			
				
					
						
							
							
								Peter Han 
							
						 
					 
					
						
						
						
						
							
						
						
							fc3bc85db8 
							
						 
					 
					
						
						
							
							Update project version to 2.8.0 in CMakeLists.txt  
						
						
						
						
						Signed-off-by: Peter Han <fujun.han@iluvatar.ai> 
						
					 
					
						2021-12-17 02:23:31 +00:00 
						 
				 
			
				
					
						
							
							
								Matthew Nicely 
							
						 
					 
					
						
						
						
						
							
						
						
							49c0a58d50 
							
						 
					 
					
						
						
							
							Set theme jekyll-theme-minimal  
						
						
						
					 
					
						2021-12-15 14:51:24 -05:00 
						 
				 
			
				
					
						
							
							
								Andrew Kerr 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5fe09c2d67 
							
						 
					 
					
						
						
							
							Updated GEMM performance plot with CUTLASS 2.8 compiled with CUDA 11.5 Toolkit ( #375 )  
						
						
						
						
						Updated GEMM performance plot with CUTLASS 2.8 compiled using CUDA 11.5 Toolkit.
GPUs under test:
    NVIDIA A100
    NVIDIA A2
    NVIDIA TitanV
    NVIDIA GeForce 2080 Ti 
						
					 
					
						2021-12-06 14:21:33 -05:00 
						 
				 
			
				
					
						
							
							
								Andrew Kerr 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6b69c79ac3 
							
						 
					 
					
						
						
							
							Fixed contributor formatting. ( #365 )  
						
						
						
					 
					
						2021-11-22 11:30:53 -08:00 
						 
				 
			
				
					
						
							
							
								Andrew Kerr 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							62e438f450 
							
						 
					 
					
						
						
							
							Listed Matthew Nicely as the CUTLASS product manager. ( #364 )  
						
						
						
					 
					
						2021-11-19 17:51:21 -08:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							808c25337a 
							
						 
					 
					
						
						
							
							CUTLASS 2.8 ( #363 )  
						
						
						
						
						CUTLASS 2.8 
						
					 
					
						2021-11-19 13:26:35 -08:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6fc5008803 
							
						 
					 
					
						
						
							
							Update quickstart.md  
						
						
						
						
						fix a broken link 
						
					 
					
						2021-11-11 09:53:46 -05:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a3bcc6981d 
							
						 
					 
					
						
						
							
							Merge pull request  #331  from reed-lau/feature/fix-wmma-shape-typo  
						
						
						
						
						fix wmma shape typo 
						
					 
					
						2021-09-28 10:20:29 -04:00 
						 
				 
			
				
					
						
							
							
								reed-lau 
							
						 
					 
					
						
						
						
						
							
						
						
							3b28642801 
							
						 
					 
					
						
						
							
							fix wmma shape typo  
						
						
						
					 
					
						2021-09-28 19:04:09 +08:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							538592dea4 
							
						 
					 
					
						
						
							
							example 23 gemm operand reduction fusion ( #325 )  
						
						
						
					 
					
						2021-09-20 13:34:47 -07:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2e07c4cc2f 
							
						 
					 
					
						
						
							
							CUTLASS 2.7 ( #318 )  
						
						
						
						
						CUTLASS 2.7
* Mainloop fusion for GEMM: summation over A or B
* Strided DGRAD (optimized iterators)
* Half-precision GELU_taylor activation functions
  * Use these when accumulation and epilogue compute types are all cutlass::half_t
* Tuning and bug fixes to fused GEMM + GEMM example
* Support for smaller than 128b aligned Convolutions: see examples
* Caching of results to accelerate Convolution unit tests
  * Can be enabled or disabled by running cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF
* Corrections and bug fixes reported by the CUTLASS community
  * Thank you for filing these issues!
authored-by: Haicheng Wu <haichengw@nvidia.com>, Manish Gupta <manigupta@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com> 
						
					 
					
						2021-09-20 11:02:22 -07:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							9ac255863f 
							
						 
					 
					
						
						
							
							Merge pull request  #246  from mengchihe/master  
						
						
						
						
						Support unaligned input for conv2d fprop stage=2. Fix for issue #242  
						
					 
					
						2021-09-08 11:40:53 -04:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
						
						
							
						
						
							59e2aa505a 
							
						 
					 
					
						
						
							
							refine the implementation  
						
						
						
					 
					
						2021-09-08 13:14:08 +00:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
						
						
							
						
						
							4e8af93da1 
							
						 
					 
					
						
						
							
							Merge remote-tracking branch 'origin/master' into small_alignment  
						
						
						
					 
					
						2021-09-07 20:39:38 +00:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6c2f8f2fb8 
							
						 
					 
					
						
						
							
							CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning  
						
						
						
						
						* cutlass 2.6 update
* remove debug prints
* cutlass 2.6.1 (minor update)
* Updated CHANGELOG.
* Minor edit to readme to indicate patch version.
* Minor edit to readme.
Co-authored-by:  Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com> 
						
					 
					
						2021-09-03 10:26:15 -07:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
						
						
							
						
						
							598e35401c 
							
						 
					 
					
						
						
							
							Merge remote-tracking branch 'origin/master' into small_alignment  
						
						
						
					 
					
						2021-08-16 07:49:08 -07:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a01feb93d9 
							
						 
					 
					
						
						
							
							Merge pull request  #308  from dongxiao92/patch-1  
						
						
						
						
						fix typo in doc 
						
					 
					
						2021-08-08 11:54:42 -07:00 
						 
				 
			
				
					
						
							
							
								dongxiao 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d36f331b44 
							
						 
					 
					
						
						
							
							fix typo in doc  
						
						
						
						
						fix typo 
						
					 
					
						2021-08-08 16:44:22 +08:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							69abafb85a 
							
						 
					 
					
						
						
							
							Merge pull request  #306  from NVIDIA/fix-profiler-cmd-doc  
						
						
						
						
						Fix profiler cmd doc 
						
					 
					
						2021-07-30 14:36:54 -04:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
						
						
							
						
						
							68a078fbbf 
							
						 
					 
					
						
						
							
							cleanup  
						
						
						
					 
					
						2021-07-30 11:27:21 -07:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
						
						
							
						
						
							10709dbb64 
							
						 
					 
					
						
						
							
							clean profiler cmd and doc  
						
						
						
					 
					
						2021-07-30 11:02:17 -07:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1227351079 
							
						 
					 
					
						
						
							
							Merge pull request  #305  from NVIDIA/fix_epilogue_spill  
						
						
						
						
						fix epilogue register spill 
						
					 
					
						2021-07-29 14:30:11 -07:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
						
						
							
						
						
							a77c658439 
							
						 
					 
					
						
						
							
							fix epilogue register spill  
						
						
						
					 
					
						2021-07-29 14:25:48 -07:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							4516b833ce 
							
						 
					 
					
						
						
							
							Merge pull request  #303  from Peter9606/doc_typo  
						
						
						
						
						Doc typo 
						
					 
					
						2021-07-28 20:49:06 -04:00 
						 
				 
			
				
					
						
							
							
								Peter Han 
							
						 
					 
					
						
						
						
						
							
						
						
							64dd1e1915 
							
						 
					 
					
						
						
							
							Doc typo  
						
						
						
						
						Signed-off-by: Peter Han <fujun.han@iluvatar.ai> 
						
					 
					
						2021-07-29 08:45:59 +08:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1ac4559d12 
							
						 
					 
					
						
						
							
							Cutlass 2.6 Update 1 ( #301 )  
						
						
						
						
						* cutlass 2.6 update
* remove debug prints 
						
					 
					
						2021-07-27 17:58:30 -07:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e5d51840e8 
							
						 
					 
					
						
						
							
							CUTLASS 2.6 ( #298 )  
						
						
						
						
						CUTLASS 2.6 
						
					 
					
						2021-07-23 00:40:53 -04:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6c29fe20ba 
							
						 
					 
					
						
						
							
							Merge pull request  #285  from tjingrant/patch-1  
						
						
						
						
						Typo Fixes 
						
					 
					
						2021-07-05 22:51:19 -04:00 
						 
				 
			
				
					
						
							
							
								Tian Jin 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e3c56b0d6b 
							
						 
					 
					
						
						
							
							Update predicated_tile_iterator.h  
						
						
						
					 
					
						2021-07-05 12:11:53 -04:00 
						 
				 
			
				
					
						
							
							
								Tian Jin 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							4647c57243 
							
						 
					 
					
						
						
							
							Update predicated_tile_iterator.h  
						
						
						
					 
					
						2021-07-05 12:06:41 -04:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							856d4db3fb 
							
						 
					 
					
						
						
							
							Update basic_gemm.cu  
						
						
						
						
						fix the matrix malloc size 
						
					 
					
						2021-06-15 09:08:36 -04:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6a1064093f 
							
						 
					 
					
						
						
							
							Merge pull request  #274  from mani-ananth/master  
						
						
						
						
						Some pending Bug fixes 
						
					 
					
						2021-06-02 13:17:39 -04:00 
						 
				 
			
				
					
						
							
							
								Manikandan Ananth 
							
						 
					 
					
						
						
						
						
							
						
						
							c5f1ef4dff 
							
						 
					 
					
						
						
							
							update contributors  
						
						
						
					 
					
						2021-06-02 10:11:42 -07:00 
						 
				 
			
				
					
						
							
							
								Manikandan Ananth 
							
						 
					 
					
						
						
						
						
							
						
						
							47ebfccbec 
							
						 
					 
					
						
						
							
							bug fixes  
						
						
						
					 
					
						2021-06-02 10:08:25 -07:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ad9486684f 
							
						 
					 
					
						
						
							
							Merge pull request  #272  from BernardoCovas/master  
						
						
						
						
						Bug in reference conv3d 
						
					 
					
						2021-05-28 17:18:27 -04:00 
						 
				 
			
				
					
						
							
							
								Bernardo Covas 
							
						 
					 
					
						
						
						
						
							
						
						
							1d8372a8e2 
							
						 
					 
					
						
						
							
							fix typo in reference conv3d  
						
						
						
					 
					
						2021-05-28 21:06:59 +01:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							9cb7d63424 
							
						 
					 
					
						
						
							
							Merge pull request  #266  from mani-ananth/master  
						
						
						
						
						Fixes for public issue #265  
						
					 
					
						2021-05-19 15:15:22 -04:00 
						 
				 
			
				
					
						
							
							
								Manikandan Ananth 
							
						 
					 
					
						
						
						
						
							
						
						
							da2f110906 
							
						 
					 
					
						
						
							
							Fixes for public issue  #265  
						
						
						
					 
					
						2021-05-19 10:16:52 -07:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b68113f5be 
							
						 
					 
					
						
						
							
							Merge pull request  #264  from zheng95z/patch-3  
						
						
						
						
						Adds `NoBetaScaling` for `LinearCombination` 
						
					 
					
						2021-05-17 10:03:30 -04:00 
						 
				 
			
				
					
						
							
							
								Zheng Zeng 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a68d7cd6f1 
							
						 
					 
					
						
						
							
							Adds NoBetaScaling for LinearCombination  
						
						
						
					 
					
						2021-05-12 22:23:55 +08:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							38e8b29f56 
							
						 
					 
					
						
						
							
							Merge pull request  #259  from hzfan/ignore_pr  
						
						
						
						
						Add gitignore 
						
					 
					
						2021-05-10 20:07:53 -04:00 
						 
				 
			
				
					
						
							
							
								Haozheng Fan 
							
						 
					 
					
						
						
						
						
							
						
						
							ee7349c94f 
							
						 
					 
					
						
						
							
							fix  
						
						
						
					 
					
						2021-05-10 16:39:04 +08:00 
						 
				 
			
				
					
						
							
							
								Haozheng Fan 
							
						 
					 
					
						
						
						
						
							
						
						
							8cdd4293d4 
							
						 
					 
					
						
						
							
							add gitignore  
						
						
						
					 
					
						2021-05-10 16:37:59 +08:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							f58b843951 
							
						 
					 
					
						
						
							
							Merge pull request  #239  from KeDengMS/kedeng/gelu  
						
						
						
						
						Fixes to Gelu for half and fusion 
						
					 
					
						2021-05-08 12:51:42 -04:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							5fc142296f 
							
						 
					 
					
						
						
							
							Merge pull request  #237  from Peter9606/issue_236_typo  
						
						
						
						
						Typo fix issue#236 
						
					 
					
						2021-05-08 07:51:19 -04:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							233d69aa6d 
							
						 
					 
					
						
						
							
							Merge pull request  #235  from Peter9606/issue_233_tranpose_update  
						
						
						
						
						tranpose.h update based on issue#233 
						
					 
					
						2021-05-07 07:14:30 -04:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							9840d25269 
							
						 
					 
					
						
						
							
							Merge pull request  #256  from zheng95z/patch-2  
						
						
						
						
						Fixes some typos in utilities.md 
						
					 
					
						2021-05-06 11:02:49 -04:00