cutlass

Author	SHA1	Message	Date
wang-y-z	557be3ab0e	Fix several typos (#1169 ) Co-authored-by: isaacw <isaacw@nvidia.com>	2023-11-02 23:54:46 -04:00
Pradeep Ramani	c008b4aea8	CUTLASS 3.3.0 (#1167 ) * Release 3.3.0 Adds support for mixed precision GEMMs On Hopper and Ampere Adds support for < 16B aligned GEMMs on Hopper Enhancements to EVT Enhancements to Python interface Enhancements to Sub-byte type handling in CuTe Several other bug-fixes and performance improvements. * minor doc update	2023-11-02 11:09:05 -04:00
milesvant	fb10fa5308	Fix broken pipeline link in docs (#1143 )	2023-10-18 12:55:46 -04:00
ANIKET SHIVAM	90d3b0fb18	CUTLASS 3.2.1 (#1113 ) * Updates for 3.2.1 release. * Minor fix in gemm op profiler for raster order. * Add scheduler mapping for raster order in the kernels.	2023-09-26 17:24:26 -04:00
lorenzo chelini	3930f709ce	Fix typo in `0x_gemm_tutorial.md` (#1035 )	2023-08-17 10:52:20 -04:00
ANIKET SHIVAM	4575443d44	CUTLASS 3.2 (#1024 ) * CUTLASS 3.2	2023-08-07 20:50:32 -04:00
Nathan Wang	9b923dd4c4	fix minor typos (#984 )	2023-07-05 09:23:01 -04:00
ANIKET SHIVAM	f079619f5e	More updates for 3.1 (#958 ) * Updates for 3.1 * Minor change * doc link fix * Minor updates	2023-05-24 10:17:16 -04:00
Haicheng Wu	6fbc0d3380	Update layout.md	2023-05-17 20:12:58 -04:00
Haicheng Wu	e2953d47c5	Update gemm_api.md	2023-05-12 15:37:31 -04:00
ANIKET SHIVAM	7c04f95415	Updates for 3.1 (#932 )	2023-04-29 09:34:27 -04:00
Adnan Akhundov	54bebe417d	Fix some typos in CuTe tutorials (#912 )	2023-04-17 16:00:51 -04:00
ANIKET SHIVAM	d572cc1aab	CUTLASS 3.1 (#915 ) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2023-04-14 23:19:34 -04:00
Adnios	0964bdb64c	update gemm and conv2d cmdline --help output (#878 )	2023-04-01 11:38:13 -04:00
Alexander Pivovarov	7e370c9637	Fix typos 2 (#842 ) Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2023-03-09 23:22:56 -05:00
ANIKET SHIVAM	c4f6b8c6bc	Updates for 3.0 (#857 ) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2023-03-09 15:27:40 -05:00
ZZK	a101ac283f	Fix some typos (#791 ) * fix typo * fix a deadlink to code	2023-02-16 15:56:55 -05:00
Vijay Thakkar	277bd6e537	CUTLASS 3.0.0 (#786 ) * CUTLASS 3.0.0	2023-01-23 20:55:28 -05:00
ANIKET SHIVAM	66d9cddc83	New updates for 2.11 (#775 ) * New updates. * Minor profiler updates Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2023-01-20 16:32:57 -05:00
tpoisonooo	8567b87d65	Update quickstart.md (#704 ) * Update quickstart.md * Update doxygen_mainpage.md * Update doxygen_mainpage.md * Update terminology.md	2022-11-29 21:43:03 -05:00
Aditya Atluri	c975e2ccbb	releaase 2.11 (#703 )	2022-11-19 09:02:15 -05:00
FZC	cc85b64cf6	fix typo (#677 )	2022-11-01 14:07:33 -04:00
ANIKET SHIVAM	b72cbf957d	CUTLASS 2.10 (#615 ) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2022-09-03 18:48:46 -04:00
Cliff Burdick	536b20763e	Fixed typo in profiler README (#603 )	2022-08-24 21:55:13 -04:00
Haicheng Wu	d6f58b2d14	Update functionality.md	2022-05-11 09:34:24 -04:00
Haicheng Wu	57551902d0	Update functionality.md add some explanations to the functionality table.	2022-05-11 00:01:19 -04:00
Masahiro Masuda	70f3ba57f5	Fix typo in shared memory layout description (#471 )	2022-04-24 18:32:13 -04:00
Andrew Kerr	12f4108ac2	CUTLASS 2.9 (#468 )	2022-04-23 15:02:38 -04:00
Feng Shijie	cd39c75e25	Fix typo in docs, code comments (#429 ) * [docs] fix typo in media/docs/layout.md * [docs] fix comment error * fix typo in include/cutlass/arch/simd_61.h * fix stride comment errors in TensorLayout	2022-03-15 21:54:36 -04:00
Ivan Komarov	e96f00586c	Make cutlass::gemm::device::GemmArray usable (#295 ) * Fix the build of cutlass/gemm/device/gemm_array.h and add a demo for GemmArray * Add a reference to GemmArray to the docs Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru>	2022-02-17 20:01:05 -05:00
Manish Gupta	808c25337a	CUTLASS 2.8 (#363 ) CUTLASS 2.8	2021-11-19 13:26:35 -08:00
Haicheng Wu	6fc5008803	Update quickstart.md fix a broken link	2021-11-11 09:53:46 -05:00
Manish Gupta	6c2f8f2fb8	CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning * cutlass 2.6 update * remove debug prints * cutlass 2.6.1 (minor update) * Updated CHANGELOG. * Minor edit to readme to indicate patch version. * Minor edit to readme. Co-authored-by: Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com>	2021-09-03 10:26:15 -07:00
dongxiao	d36f331b44	fix typo in doc fix typo	2021-08-08 16:44:22 +08:00
Haicheng Wu	10709dbb64	clean profiler cmd and doc	2021-07-30 11:02:17 -07:00
Peter Han	64dd1e1915	Doc typo Signed-off-by: Peter Han <fujun.han@iluvatar.ai>	2021-07-29 08:45:59 +08:00
Manish Gupta	1ac4559d12	Cutlass 2.6 Update 1 (#301 ) * cutlass 2.6 update * remove debug prints	2021-07-27 17:58:30 -07:00
Manish Gupta	e5d51840e8	CUTLASS 2.6 (#298 ) CUTLASS 2.6	2021-07-23 00:40:53 -04:00
Zheng Zeng	b878c96421	Fixes some typos in utilities.md	2021-05-06 22:37:37 +08:00
Andrew Kerr	0e13748649	CUTLASS 2.5	2021-02-26 09:58:26 -05:00
Manish Gupta	ccb697bac7	cutlass 2.4 documentation only update	2020-11-23 06:59:45 -06:00
Yang Wang	e6bcdc60cf	fix broken links (#148 )	2020-11-19 21:46:54 -08:00
Manish Gupta	6615010cd0	CUTLASS 2.4 (Implicit GEMM convolution) (#147 ) CUTLASS 2.4 (Implicit GEMM Convolution) Co-authored-by: Manish Gupta <manigupta@nvidia.com>, Haicheng Wu <haichengw@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>	2020-11-19 21:25:25 -08:00
Andrew Kerr	c53f3339bb	CUTLASS 2.3 initial commit (#134 ) CUTLASS 2.3 adds GEMMs targeting Sparse Tensor Cores on the NVIDIA Ampere Architecture, fast SGEMM, and small matrix classes, bug fixes, and performance enhancements.	2020-09-23 14:00:58 -07:00
Andrew Kerr	fd7e058d0c	Added examples to enable the unity build (#102 ) * Updated documentation of fused GEMM example and removed UNITY BUILD batch size. The default batch size when unity build is enabled tends to be favorable.	2020-06-17 07:09:18 -07:00
Andrew Kerr	1ab1027954	Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (#100 ) - Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. - Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out - Added test_examples target to build and test all CUTLASS examples - Minor edits to documentation to point to GTC 2020 webinar	2020-06-15 10:47:01 -07:00
Andrew Kerr	86931fef85	CUTLASS 2.2 (#96 ) Adds support for NVIDIA Ampere Architecture features. CUDA 11 Toolkit recommended.	2020-06-08 16:17:35 -07:00
Andrew Kerr	96dab34ad9	CUTLASS 2.1 (#83 ) CUTLASS 2.1 contributes: - BLAS-style host-side API added to CUTLASS Library - Planar Complex GEMM kernels targeting Volta and Turing Tensor Cores - Minor enhancements and bug fixes	2020-04-07 13:51:25 -07:00
Andrew Kerr	7c0cd26d13	Need Python 3.6 to use enum.auto() (#70 )	2019-11-22 09:39:12 -08:00
Andrew Kerr	8aca98f9a7	Improved formatting, clarity, and content of several documents. (#64 ) * Improved formatting, clarity, and content of several documents.	2019-11-20 10:42:15 -08:00

51 Commits