cutlass

Author	SHA1	Message	Date
Vijay Thakkar	7d49e6c7e2	Updates for CUTLASS 3.5.0 (#1468 )	2024-04-11 21:33:40 -04:00
Vijay Thakkar	629f4653c3	CUTLASS 3.5.0 (#1411 )	2024-03-19 17:51:04 -04:00
ANIKET SHIVAM	bbe579a9e3	Updates for CUTLASS 3.4.1 (#1346 ) * Updates for CUTLASS 3.4.1 * minor epi change	2024-02-15 15:48:34 -05:00
Aleksandar Samardžić	ca37d632c9	Remove sparse GEMM with row broadcasted bias vector (#1302 ) This reverts commit `d3e72719b4`. Co-authored-by: Aleksandar Samardžić <asamardzic@matf.bg.ac.rs>	2024-01-17 14:06:27 -05:00
ANIKET SHIVAM	751eb9a885	Update license year (#1306 )	2024-01-16 14:37:22 -05:00
ANIKET SHIVAM	2f589ffa76	Updates for 3.4 release. (#1305 )	2024-01-16 13:42:51 -05:00
Aleksandar Samardžić	5c756eb774	Add support for sparse GEMM with visitor epilogue (#1189 ) * Add support for sparse GEMM with visitor epilogue * Refactor changes at the kernel level	2024-01-04 12:38:11 -05:00
Pradeep Ramani	8236f30675	CUTLASS 3.4.0 (#1286 ) * CUTLASS 3.4.0 * Update CHANGELOG.md --------- Co-authored-by: Pradeep Ramani <prramani@nvidia.com>	2023-12-29 15:21:31 -05:00
Christian Sigg	e1483d5fa0	Collection of changes to fix clang build. (#1200 ) * Remove unused variables * Qualify calls to make_fragment_? from templated base class. Fixes clang build error. * Add missing `#include <cstdio>` * Various changes to fix clang compile errors. * More changes to fix clang build. Remaining issues: - `params` initializer of `CollectiveEpilogue`. - `ops` initializer of `Sm90VisitorImplBase`. - `__usAtomicCAS` needs to be added to clang upstream. * Fix remaining clang build issues. * Qualify `cute::rank()` calls. * Qualify some more calls that are otherwise ambiguous between `cute` and `std` namespace. * Double-escape special registers in inline asm. * small change --------- Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2023-12-08 14:42:12 -05:00
Ali Hassani	f4a0216601	Fix bug in single source GEMM with residual + streamk (#1249 ) Followup to #1224. A change in the stream-k threadblock swizzle ctor since 3.3 breaks single source GEMM with fused epilogue and stream-k. Multi-source was already corrected. Co-authored-by: Ali Hassani <ahassanijr@gmail.com>	2023-12-07 11:12:02 -05:00
Pradeep Ramani	e9e30c2304	Updates and Bug fixes to CUTLASS 3.3 (#1232 )	2023-12-05 09:50:49 -05:00
Christian Sigg	99c4eebe3b	Explicitly cast `blockIdx` to `uint3` (#1192 ) This works around a clang issue where blockIdx is of a different type.	2023-11-30 00:26:23 -05:00
Sergey Klevtsov	b5d8a5d9cc	Allow SM90 pingpong kernel to use custom tile schedulers (#1194 ) Co-authored-by: Sergey Klevtsov <sklevtsov@nvidia.com>	2023-11-15 13:45:17 -05:00
wang-y-z	557be3ab0e	Fix several typos (#1169 ) Co-authored-by: isaacw <isaacw@nvidia.com>	2023-11-02 23:54:46 -04:00
Pradeep Ramani	c008b4aea8	CUTLASS 3.3.0 (#1167 ) * Release 3.3.0 Adds support for mixed precision GEMMs On Hopper and Ampere Adds support for < 16B aligned GEMMs on Hopper Enhancements to EVT Enhancements to Python interface Enhancements to Sub-byte type handling in CuTe Several other bug-fixes and performance improvements. * minor doc update	2023-11-02 11:09:05 -04:00
ANIKET SHIVAM	90d3b0fb18	CUTLASS 3.2.1 (#1113 ) * Updates for 3.2.1 release. * Minor fix in gemm op profiler for raster order. * Add scheduler mapping for raster order in the kernels.	2023-09-26 17:24:26 -04:00
Driss Guessous	88c0d7c726	make only visible on device (#1071 )	2023-09-07 13:00:46 -04:00
ANIKET SHIVAM	a88c41cf8d	Updates for 3.2 release (#1065 )	2023-08-25 23:05:46 -04:00
Haibin Lin	7e5ee8b7bf	[doc] fix: fix typos in the comment (#1049 )	2023-08-16 11:39:25 -04:00
ANIKET SHIVAM	4575443d44	CUTLASS 3.2 (#1024 ) * CUTLASS 3.2	2023-08-07 20:50:32 -04:00
Aleksandar Samardžić	d3e72719b4	Add support for sparse GEMM with row broadcasted bias vector (#951 )	2023-05-24 10:25:05 -04:00
ANIKET SHIVAM	f079619f5e	More updates for 3.1 (#958 ) * Updates for 3.1 * Minor change * doc link fix * Minor updates	2023-05-24 10:17:16 -04:00
Ali Hassani	13f413493a	Stream-K with broadcast (#892 ) * [WIP] GEMM StreamK w/ Fused Epilogue * Adds Gemm Streamk with Fused Epilogue kernel level struct. * Mostly based on Gemm with Fused Epilogue, * Requires a new epilogue * Work in progress * [WIP] StreamK support for GemmUniversalWithBroadcast * Just based off of how StreamK is allowed in GemmUniversal * Untested and a work in progress * Minor fixes * [WIP] It compiles! It is almost certainly incorrect, but we're past getting the templates to match, so checkpointing. * Correction to reference kernel * Fix typo * Added MSE measurement * Switch back to reference kernel + host for loop Still WIP. Now we're getting even a larger MSE, but it's both on basic Split-K and Stream-K. * Fix typos * Fix broadcast vector + requested changes * Comment typo * Small int option and more * Fix incorrect condition on source needed * Requested changes * I think I got it? * Bias vector should be stride 0 * Two source added! * Typos * Merge examples * Bring back vector row offset Just to ensure consistency with universal gemm with fused epilogue * Base arguments and params structs for StreamK * StreamK epilogue with broadcast now inherits the original * undo params_streamk_base.h --------- Co-authored-by: Ali Hassani <ahassanijr@gmail.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2023-05-22 19:05:06 -04:00
ANIKET SHIVAM	7c04f95415	Updates for 3.1 (#932 )	2023-04-29 09:34:27 -04:00
ANIKET SHIVAM	d572cc1aab	CUTLASS 3.1 (#915 ) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2023-04-14 23:19:34 -04:00
Alexander Zinoviev	42290f5d1c	Fix for dangling pointers (#885 )	2023-03-25 01:15:14 -04:00
Jack Kosaian	6116706c96	Set batch_strides on Params::update (#883 )	2023-03-20 17:07:47 -04:00
Alexander Pivovarov	7e370c9637	Fix typos 2 (#842 ) Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2023-03-09 23:22:56 -05:00
Haicheng Wu	65688c2a87	streamk fix (#836 ) Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2023-02-23 16:35:08 -05:00
Haicheng Wu	91b8de8d32	streamk fix (#830 ) Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2023-02-20 11:03:16 -05:00
Shuai Shao	ce8597dc14	Fix type bug in conv2d/gemm with broadcast (#796 ) add ElementVector --------- Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2023-02-09 20:53:25 -05:00
Vijay Thakkar	277bd6e537	CUTLASS 3.0.0 (#786 ) * CUTLASS 3.0.0	2023-01-23 20:55:28 -05:00
ANIKET SHIVAM	66d9cddc83	New updates for 2.11 (#775 ) * New updates. * Minor profiler updates Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2023-01-20 16:32:57 -05:00
Haicheng Wu	764b840d6f	streamk example and performance tuning (#760 ) * streamk example and performance tuning * one missing file Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2023-01-10 16:10:02 -05:00
Haicheng Wu	ff6e733fe1	restore the old epilogue for everything except streamk (#749 ) Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2023-01-04 11:02:55 -05:00
ANIKET SHIVAM	38193d76e3	Updates for stream-k (#728 ) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2022-12-08 23:48:10 -05:00
Mike Iovine	d6117ca362	Relax stream K gemm alignment constraints (#717 ) * Relax stream K gemm alignment constraints The current alignment requirements are too strict. Make them identical to the checks for the regular universal gemm. * Revert "Relax stream K gemm alignment constraints" This reverts commit 31e80a250e2b0ac4bda2e4b437b39dc5bcd5e845. * Relax stream K gemm alignment constraints The current alignment requirements are too strict. Make them identical to the checks for the regular universal gemm. Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-12-07 11:17:49 -05:00
Aditya Atluri	c975e2ccbb	releaase 2.11 (#703 )	2022-11-19 09:02:15 -05:00
Haicheng Wu	012c62c748	bug fixes and enharcement to gemm reductionK fusion (#682 ) * add two missing files * fix bunch of bugs of gemm-reducek fusion and add a device interface * small changes Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-11-03 11:07:50 -04:00
Ying Zhang	dadc881a96	Bug fix for gemm broadcast (#650 ) * gemm_universal_with_broadcast, +2 sources. * Revert "gemm_universal_with_broadcast, +2 sources." This reverts commit fb063251f2144a091f12c9abfce7e1713f2d1c9e. * gemm broadcast bug fix	2022-09-30 10:00:38 -04:00
Ying Zhang	a821280dc7	Gemm broadcast (#632 ) * gemm_universal_with_broadcast, +2 sources. * Revert "gemm_universal_with_broadcast, +2 sources." This reverts commit fb063251f2144a091f12c9abfce7e1713f2d1c9e. * gemm_universal_with_broadcast separated version. * Update copyright banner. * update banner	2022-09-20 10:37:12 -04:00
ANIKET SHIVAM	b72cbf957d	CUTLASS 2.10 (#615 ) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2022-09-03 18:48:46 -04:00
Mike Iovine	c4cf0dad82	Fix init-self compiler warnings (#493 ) Fix a few errors caused by trying to initialize a class member with itself. These errors can turn into errors if you compile with `-Winit-self`.	2022-05-11 00:35:28 -04:00
Stepan Tezyunichev	86ce09aed1	2.9 fixes for nvrtc (#480 ) * Use platform::is_same instead of std::is_same * Don't hide cuComplex include from nvrtc * Typo fixed * Remove comment rename	2022-04-29 09:06:52 -04:00
Andrew Kerr	12f4108ac2	CUTLASS 2.9 (#468 )	2022-04-23 15:02:38 -04:00
Feng Shijie	dd571f0edb	[style] fix code indentation (#449 ) * [docs] fix typo in media/docs/layout.md * [docs] fix comment error * fix typo in include/cutlass/arch/simd_61.h * fix stride comment errors in TensorLayout * fix indentation	2022-04-03 21:13:17 -04:00
HouQiming	96a11a1ef3	Removed trivial copy constructors on parameter classes to enable devi… (#366 ) * Removed trivial copy constructors on parameter classes to enable device-side launch of CUTLASS kernels * Added SFINAE to the `TensorRef(NonConstTensorRef const&)` constructor to avoid making it a copy-constructor for device code * std => platform * fix affine2 * really fix affine2 Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-02-28 21:34:02 -05:00
Jongsoo Park	1db6971a8d	Remove unused gemm_k_iterations in GemmKernel::Params (#406 ) Otherwise we get gemm_k_iterations is uninitialized warnings.	2022-02-16 09:52:45 -05:00
Andrew Kerr	288af365db	Added missing synchronization to avoid WAR hazards between tiles. (#386 )	2021-12-20 08:34:08 -08:00
Manish Gupta	808c25337a	CUTLASS 2.8 (#363 ) CUTLASS 2.8	2021-11-19 13:26:35 -08:00

61 Commits