Tianqi Zhang (张天启)
9f2e3faa69
fix call of GELU_Taylor in LinearCombinationGeneric (#634)
2022-09-20 21:00:55 -04:00

Ying Zhang
a821280dc7
Gemm broadcast (#632)
* gemm_universal_with_broadcast, +2 sources.
* Revert "gemm_universal_with_broadcast, +2 sources."
  This reverts commit fb063251f2144a091f12c9abfce7e1713f2d1c9e.
* gemm_universal_with_broadcast separated version.
* Update copyright banner.
* update banner
2022-09-20 10:37:12 -04:00

Wenzhuo Liu
f73374a1eb
fix: comment typo in example 23 (#633)
2022-09-19 09:54:14 -04:00

Yujia Zhai
faab7536fc
add comment (#628)
2022-09-17 21:40:30 -04:00

Andrew Kerr
fc9ebc645b
CUTLASS 2.10 bug fixes and minor updates. (#626)
2022-09-15 16:20:33 -04:00

alexfreudenberg
2cc2c7ba1f
Add set_k_partition function (#624)
A member function set_k_partition is required for the instantiation of cutlass::gemm::kernel::Gemm, even though SplitKSerial is false.
2022-09-13 22:34:20 -04:00

ANIKET SHIVAM
50ceed7154
Minor README fix (#623)
* minor fix
* Minor fix
2022-09-12 22:40:25 -04:00

ANIKET SHIVAM
e773429f7e
CUTLASS 2.10 updates (#622)
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-09-12 21:26:30 -04:00

Yujia Zhai
beae168f90
fix broken link (#620)
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-09-06 16:32:44 -04:00

Jack Kosaian
f29d8f7ca9
Include vector in base_grouped.h (#618)
2022-09-06 13:21:23 -04:00

Yujia Zhai
b1d3f9b2fd
upstream internal updates (#616)
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-09-04 23:05:09 -04:00

ANIKET SHIVAM
b72cbf957d
CUTLASS 2.10 (#615)
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-09-03 18:48:46 -04:00

Cliff Burdick
ca23ff7924
Fixed typo in class name (#608)
2022-08-29 20:51:52 -04:00

Cliff Burdick
1c3d400b14
Added value_type trait to complex to make it an easier drop-in replacement for std::complex. (#607)
2022-08-28 01:12:40 -04:00

Cliff Burdick
abafbf2afd
Missing comma in trmm header (#604)
2022-08-25 16:07:33 -04:00

Cliff Burdick
536b20763e
Fixed typo in profiler README (#603)
2022-08-24 21:55:13 -04:00

Haicheng Wu
497b499d9d
Add residual support for shmem staging iterator used in back-to-back GEMM fusion. This allows support of problem_size_0_n that is not a multiple of 32. (#590)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-08-15 11:19:24 -04:00

Jack Kosaian
e66bfcb1f8
Fix for #596 (typo in example 03) (#597)
* [examples] Fix typos in SYRK and TRMM examples
* Fix typo in example 03
2022-08-09 09:58:36 -04:00

Michaël Benesty
1617685a77
fix: fix types in example 06 (#587)
2022-07-29 12:46:06 -04:00

dan_the_3rd
25ebf15d02
Ensure all arch::Mma specializations have ElementC set (#576)
Co-authored-by: danthe3rd <danthe3rd@users.noreply.github.com>
2022-07-22 23:53:03 -04:00

Shang Zhang
5d05808072
fix gather example (#574)
2022-07-19 16:18:17 -04:00

Ivan Komarov
0b8cacd6f1
Remove redundant <fstream> includes (#563)
* Remove redundant <fstream> includes
* Fix fstream in examples/
* Fix <fstream> in test/
* Use consistent order for <fstream> (always after <iostream>)
* Remove an unneeded include in a file where std::ofstream usage is commented out
Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru>
2022-07-19 15:23:54 -04:00

Haicheng Wu
e7a61c761a
fix race condition when h < stride_h or w < stride_w (#562)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-07-12 16:37:08 -04:00

seventh
fb379eaa5b
epilogue leaky relu support ScaleType (#564)
Co-authored-by: xuweiqi <xuweiqi117@gmail.com>
2022-07-11 17:30:55 -04:00

Jacob He
8a766804ad
Fix doc in testbed_gemm_with_broadcast (#559)
2022-07-07 09:56:16 -04:00

Bing Xu
1eb6355182
[activation] tanh (#550)
Co-authored-by: Bing Xu <bingxu@fb.com>
2022-07-02 08:00:45 -04:00

Yujia Zhai
04a9777b87
Softmax (#546)
* add test layernorm g-mem version
* Delete include/configure directory
* Delete examples/test_layernorm directory
* Update gemm_with_softmax.h
* Update gemm_softmax.cu
* Update linear_combination.h
* Update fast_math.h
* remove redundant vars
Co-authored-by: yujia.zhai <yujia.zhai@bytedance.com>
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-07-02 01:19:18 -04:00

Haicheng Wu
e45e773436
Update linear_combination_generic.h (#472)
Add `skip_elementwise_` to support serial split-K in `linear_combination_generic.h`.
2022-06-28 07:29:38 -04:00

Haicheng Wu
dae6b6893b
Update CHANGELOG.md
2022-06-27 23:30:49 -04:00

Haicheng Wu
ba18ea9c32
Update README.md
2022-06-27 23:25:26 -04:00

Haicheng Wu
9ab9110168
add leaky relu (#542)
Authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-06-26 10:07:50 -04:00

Jinze (Richard) Xue
e5d4669f16
Update CHANGELOG.md (#543)
2022-06-25 13:23:49 -04:00

Haicheng Wu
94f01f19d5
Add implicit gemm perf
Plot from @manishucsd, presented in the GTC'22 CUTLASS talk.
2022-06-23 22:47:11 -04:00

Jack Kosaian
fa56763c25
Fix occupancy calculation for grouped GEMM (#532)
2022-06-18 19:53:59 -04:00

LiuWei
25e26a6e51
fix bug in linear_combination_generic.h: missing include of cutlass/epilogue/thread/scale_type.h (#531)
2022-06-17 23:35:14 -04:00

Haicheng Wu
f248e9bdb4
Create CITATION.cff
Add initial CITATION.cff
2022-06-07 21:25:16 -04:00

Pei Sun
dceefe4f64
Increment stride correctly in warp iterator. (#516)
Co-authored-by: peisun1115 <peis@google.com>
2022-06-06 12:33:36 -04:00

Pei Sun
c3881d097e
Fix a comment about LDSM layout. (#514)
Co-authored-by: peisun1115 <peis@google.com>
2022-06-04 23:04:00 -04:00

Pei Sun
a29dfb1c63
Fix a bug to increment stride tile correctly (#503)
* Fix a bug to increment stride tile correctly
* Update regular_tile_access_iterator_tensor_op.h
Co-authored-by: peisun1115 <peis@google.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2022-06-03 22:54:52 -04:00

Jack Kosaian
0abaac84ea
[examples] Fix typos in SYRK and TRMM examples (#507)
2022-06-03 22:52:41 -04:00

Haicheng Wu
858c735856
Update gather_scatter_fusion.cu
Correct the reference code in the gather/scatter example so that the bias add is applied in the correct place.
2022-05-18 13:15:25 -04:00

Haicheng Wu
d6f58b2d14
Update functionality.md
2022-05-11 09:34:24 -04:00

Mike Iovine
c4cf0dad82
Fix init-self compiler warnings (#493)
Fix a few warnings caused by trying to initialize a class member with itself. These warnings, reported under `-Winit-self`, can turn into errors when warnings are treated as errors.
2022-05-11 00:35:28 -04:00

Haicheng Wu
57551902d0
Update functionality.md
add some explanations to the functionality table.
2022-05-11 00:01:19 -04:00

Haicheng Wu
1604ebaf10
Update generator.py
Stop generating analytical conv kernels to reduce the number of kernels.
2022-05-08 21:47:15 -04:00

Haicheng Wu
6023038bae
add verification of the reduction tensor (#489)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-05-06 10:24:51 -07:00

TonyZhao
ddd8f9cf41
update float < int32_t * 4 (#488)
Co-authored-by: 赵俊涛 <zhaojuntao@zhaojuntaos-MacBook-Pro.local>
2022-05-04 13:36:05 -04:00

Haicheng Wu
ec2b4fd85d
b2b bias vector support (#482)
* b2b bias vector support
* add files
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-04-30 04:16:15 -07:00

Stepan Tezyunichev
86ce09aed1
2.9 fixes for nvrtc (#480)
* Use platform::is_same instead of std::is_same
* Don't hide cuComplex include from nvrtc
* Typo fixed
* Remove comment rename
2022-04-29 09:06:52 -04:00

Haicheng Wu
21c1fa3849
add .github (#479)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-04-28 12:36:59 -07:00