Vijay Thakkar
629f4653c3
CUTLASS 3.5.0 (#1411)
2024-03-19 17:51:04 -04:00
							
ANIKET SHIVAM
751eb9a885
Update license year (#1306)
2024-01-16 14:37:22 -05:00
							
Vadim Markovtsev
8783c41851
Replace 0x1f with 0xffffffff in __shfl_sync (#1097)
This fixes compatibility with H100 and resolves #1094
2023-09-18 19:58:19 -04:00
							
ANIKET SHIVAM
4575443d44
CUTLASS 3.2 (#1024)
* CUTLASS 3.2
2023-08-07 20:50:32 -04:00
							
Jack Kosaian
87349d3496
Add grouped b2b GEMM (#970)
2023-06-05 17:16:57 -04:00
							
Aleksandr Pivovar
4a68cf748e
added support of b2b bmm (#849)
* added support of b2b bmm
* fixed arguments and params structures
* added batch_count argument
* removed SplitKSerial and added new test case with b2b bmm
* fixed support of Kbatched and added new test case with batch stride
* added batch support for bias and scale
* make test
* small changes
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-04-14 23:20:02 -04:00
							
ANIKET SHIVAM
66d9cddc83
New updates for 2.11 (#775)
* New updates.
* Minor profiler updates
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-01-20 16:32:57 -05:00
							
Haicheng Wu
497b499d9d
Add residual support for shmem staging iterator used in back-to-back GEMM fusion. This allows support of problem_size_0_n that is not multiple of 32. (#590)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-08-15 11:19:24 -04:00
							
Haicheng Wu
ec2b4fd85d
b2b bias vector support (#482)
* b2b bias vector support
* add files
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-04-30 04:16:15 -07:00
							
Andrew Kerr
12f4108ac2
CUTLASS 2.9 (#468)
2022-04-23 15:02:38 -04:00
							
Manish Gupta
808c25337a
CUTLASS 2.8 (#363)
CUTLASS 2.8
2021-11-19 13:26:35 -08:00
							
Manish Gupta
1ac4559d12
Cutlass 2.6 Update 1 (#301)
* cutlass 2.6 update
* remove debug prints
2021-07-27 17:58:30 -07:00
							
Manish Gupta
e5d51840e8
CUTLASS 2.6 (#298)
CUTLASS 2.6
2021-07-23 00:40:53 -04:00
							
Andrew Kerr
0e13748649
CUTLASS 2.5
2021-02-26 09:58:26 -05:00