Pradeep Ramani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8236f30675 
							
						 
					 
					
						
						
							
							CUTLASS 3.4.0 ( #1286 )  
						
						
						
						
						* CUTLASS 3.4.0
* Update CHANGELOG.md
---------
Co-authored-by: Pradeep Ramani <prramani@nvidia.com> 
						
					 
					
						2023-12-29 15:21:31 -05:00 
						 
				 
			
				
					
						
							
							
								Pradeep Ramani 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e9e30c2304 
							
						 
					 
					
						
						
							
							Updates and Bug fixes to CUTLASS 3.3 ( #1232 )  
						
						
						
					 
					
						2023-12-05 09:50:49 -05:00 
						 
				 
			
				
					
						
							
							
								wang-y-z 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							557be3ab0e 
							
						 
					 
					
						
						
							
							Fix several typos ( #1169 )  
						
						
						
						
						Co-authored-by: isaacw <isaacw@nvidia.com> 
						
					 
					
						2023-11-02 23:54:46 -04:00 
						 
				 
			
				
					
						
							
							
								ANIKET SHIVAM 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							90d3b0fb18 
							
						 
					 
					
						
						
							
							CUTLASS 3.2.1 ( #1113 )  
						
						
						
						
						* Updates for 3.2.1 release.
* Minor fix in gemm op profiler for raster order.
* Add scheduler mapping for raster order in the kernels. 
						
					 
					
						2023-09-26 17:24:26 -04:00 
						 
				 
			
				
					
						
							
							
								ANIKET SHIVAM 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							4575443d44 
							
						 
					 
					
						
						
							
							CUTLASS 3.2 ( #1024 )  
						
						
						
						
						* CUTLASS 3.2 
						
					 
					
						2023-08-07 20:50:32 -04:00 
						 
				 
			
				
					
						
							
							
								ANIKET SHIVAM 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							d572cc1aab 
							
						 
					 
					
						
						
							
							CUTLASS 3.1 ( #915 )  
						
						
						
						
						Co-authored-by: Aniket Shivam <ashivam@nvidia.com> 
						
					 
					
						2023-04-14 23:19:34 -04:00 
						 
				 
			
				
					
						
							
							
								Alexander Pivovarov 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							7e370c9637 
							
						 
					 
					
						
						
							
							Fix typos 2 ( #842 )  
						
						
						
						
						Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com> 
						
					 
					
						2023-03-09 23:22:56 -05:00 
						 
				 
			
				
					
						
							
							
								Shuai Shao 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ce8597dc14 
							
						 
					 
					
						
						
							
							Fix type bug in conv2d/gemm with broadcast ( #796 )  
						
						
						
						
						add ElementVector
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com> 
						
					 
					
						2023-02-09 20:53:25 -05:00 
						 
				 
			
				
					
						
							
							
								Vijay Thakkar 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							277bd6e537 
							
						 
					 
					
						
						
							
							CUTLASS 3.0.0 ( #786 )  
						
						
						
						
						* CUTLASS 3.0.0 
						
					 
					
						2023-01-23 20:55:28 -05:00 
						 
				 
			
				
					
						
							
							
								ANIKET SHIVAM 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							66d9cddc83 
							
						 
					 
					
						
						
							
							New updates for 2.11 ( #775 )  
						
						
						
						
						* New updates.
* Minor profiler updates
Co-authored-by: Aniket Shivam <ashivam@nvidia.com> 
						
					 
					
						2023-01-20 16:32:57 -05:00 
						 
				 
			
				
					
						
							
							
								Aditya Atluri 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c975e2ccbb 
							
						 
					 
					
						
						
							
							releaase 2.11 ( #703 )  
						
						
						
					 
					
						2022-11-19 09:02:15 -05:00 
						 
				 
			
				
					
						
							
							
								Andrew Kerr 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							fc9ebc645b 
							
						 
					 
					
						
						
							
							CUTLASS 2.10 bug fixes and minor updates. ( #626 )  
						
						
						
					 
					
						2022-09-15 16:20:33 -04:00 
						 
				 
			
				
					
						
							
							
								ANIKET SHIVAM 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							b72cbf957d 
							
						 
					 
					
						
						
							
							CUTLASS 2.10 ( #615 )  
						
						
						
						
						Co-authored-by: Aniket Shivam <ashivam@nvidia.com> 
						
					 
					
						2022-09-03 18:48:46 -04:00 
						 
				 
			
				
					
						
							
							
								Ivan Komarov 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							0b8cacd6f1 
							
						 
					 
					
						
						
							
							Remove redundant <fstream> includes ( #563 )  
						
						
						
						
						* Remove redundant <fstream> includes
* Fix fstream in examples/
* Fix <fstream> in test/
* Use consistent order for <fstream> (always after <iostream>)
* Remove an unneeded include in a file where std::ofstream usage is commented out
Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru> 
						
					 
					
						2022-07-19 15:23:54 -04:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6023038bae 
							
						 
					 
					
						
						
							
							add verification of the reduction tensor ( #489 )  
						
						
						
						
						Co-authored-by: Haicheng Wu <haichengw@nvidia.com> 
						
					 
					
						2022-05-06 10:24:51 -07:00 
						 
				 
			
				
					
						
							
							
								Andrew Kerr 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							12f4108ac2 
							
						 
					 
					
						
						
							
							CUTLASS 2.9 ( #468 )  
						
						
						
					 
					
						2022-04-23 15:02:38 -04:00 
						 
				 
			
				
					
						
							
							
								Andrew Kerr 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8a951b2940 
							
						 
					 
					
						
						
							
							Enable convolution with fused epilogue for Volta Tensor Cores ( #402 )  
						
						
						
						
						* Enabled convolution with epilogue fusion for Volta Tensor Cores.
* Compilation fixes
* Disabled testing Volta on Ampere architectures. 
						
					 
					
						2022-01-30 23:24:50 -05:00 
						 
				 
			
				
					
						
							
							
								masahi 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							c2ee13a0fe 
							
						 
					 
					
						
						
							
							Add epilogue functor for residual block fusion ( #391 )  
						
						
						
						
						* Add epilogue functor for residual block fusion
* Do not run split-k tests when ActivationOp is not Identity
* explain TestSplitK param
* return early 
						
					 
					
						2021-12-29 22:53:40 -05:00 
						 
				 
			
				
					
						
							
							
								Andrew Kerr 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ec4f7e5194 
							
						 
					 
					
						
						
							
							Updates to fused epilogue ( #383 )  
						
						
						
						
						* Enhancements and fixes to fused GEMM and Convolution epilogue.
* Need to explicitly list cudart as unit test library dependency. 
						
					 
					
						2021-12-17 16:04:43 -05:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							2e07c4cc2f 
							
						 
					 
					
						
						
							
							CUTLASS 2.7 ( #318 )  
						
						
						
						
						CUTLASS 2.7
* Mainloop fusion for GEMM: summation over A or B
* Strided DGRAD (optimized iterators)
* Half-precision GELU_taylor activation functions
  * Use these when accumulation and epilogue compute types are all cutlass::half_t
* Tuning and bug fixes to fused GEMM + GEMM example
* Support for smaller than 128b aligned Convolutions: see examples
* Caching of results to accelerate Convolution unit tests
  * Can be enabled or disabled by running cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF
* Corrections and bug fixes reported by the CUTLASS community
  * Thank you for filing these issues!
authored-by: Haicheng Wu <haichengw@nvidia.com>, Manish Gupta <manigupta@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com> 
						
					 
					
						2021-09-20 11:02:22 -07:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
						
						
							
						
						
							59e2aa505a 
							
						 
					 
					
						
						
							
							refine the implementation  
						
						
						
					 
					
						2021-09-08 13:14:08 +00:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
						
						
							
						
						
							4e8af93da1 
							
						 
					 
					
						
						
							
							Merge remote-tracking branch 'origin/master' into small_alignment  
						
						
						
					 
					
						2021-09-07 20:39:38 +00:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6c2f8f2fb8 
							
						 
					 
					
						
						
							
							CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning  
						
						
						
						
						* cutlass 2.6 update
* remove debug prints
* cutlass 2.6.1 (minor update)
* Updated CHANGELOG.
* Minor edit to readme to indicate patch version.
* Minor edit to readme.
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com> 
						
					 
					
						2021-09-03 10:26:15 -07:00 
						 
				 
			
				
					
						
							
							
								Haicheng Wu 
							
						 
					 
					
						
						
						
						
							
						
						
							598e35401c 
							
						 
					 
					
						
						
							
							Merge remote-tracking branch 'origin/master' into small_alignment  
						
						
						
					 
					
						2021-08-16 07:49:08 -07:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							1ac4559d12 
							
						 
					 
					
						
						
							
							Cutlass 2.6 Update 1 ( #301 )  
						
						
						
						
						* cutlass 2.6 update
* remove debug prints 
						
					 
					
						2021-07-27 17:58:30 -07:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							e5d51840e8 
							
						 
					 
					
						
						
							
							CUTLASS 2.6 ( #298 )  
						
						
						
						
						CUTLASS 2.6 
						
					 
					
						2021-07-23 00:40:53 -04:00 
						 
				 
			
				
					
						
							
							
								mengchi.hmc 
							
						 
					 
					
						
						
						
						
							
						
						
							f4b0a33633 
							
						 
					 
					
						
						
							
							add unit test for non int4 load  
						
						
						
					 
					
						2021-04-23 14:33:46 +08:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
						
						
							
						
						
							4cd004ead1 
							
						 
					 
					
						
						
							
							fix test name to optimized and instance large tile sizes to speed unit tests  
						
						
						
					 
					
						2021-03-05 13:32:36 -08:00 
						 
				 
			
				
					
						
							
							
								Peter Han 
							
						 
					 
					
						
						
						
						
							
						
						
							6c4539e372 
							
						 
					 
					
						
						
							
							Make arch tag of test cases more precisely to SM60  
						
						
						
						
						Signed-off-by: Peter Han <fujun.han@iluvatar.ai> 
						
					 
					
						2021-03-05 10:53:26 +08:00 
						 
				 
			
				
					
						
							
							
								Peter Han 
							
						 
					 
					
						
						
						
						
							
						
						
							a3639ab1a0 
							
						 
					 
					
						
						
							
							Append fp16 test case to verify Mma_HFMA2  
						
						
						
						
						Signed-off-by: Peter Han <fujun.han@iluvatar.ai> 
						
					 
					
						2021-03-04 18:17:57 +08:00 
						 
				 
			
				
					
						
							
							
								Andrew Kerr 
							
						 
					 
					
						
						
						
						
							
						
						
							0e13748649 
							
						 
					 
					
						
						
							
							CUTLASS 2.5  
						
						
						
					 
					
						2021-02-26 09:58:26 -05:00 
						 
				 
			
				
					
						
							
							
								Manish Gupta 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							6615010cd0 
							
						 
					 
					
						
						
							
							CUTLASS 2.4 (Implicit GEMM convolution) ( #147 )  
						
						
						
						
						CUTLASS 2.4 (Implicit GEMM Convolution)
Co-authored-by: Manish Gupta <manigupta@nvidia.com>, Haicheng Wu <haichengw@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com> 
						
					 
					
						2020-11-19 21:25:25 -08:00