cutlass

Author	SHA1	Message	Date
Manish Gupta	ccb697bac7	cutlass 2.4 documentation only update	2020-11-23 06:59:45 -06:00
Yang Wang	e6bcdc60cf	fix broken links (#148)	2020-11-19 21:46:54 -08:00
Manish Gupta	6615010cd0	CUTLASS 2.4 (Implicit GEMM convolution) (#147) CUTLASS 2.4 (Implicit GEMM Convolution) Co-authored-by: Manish Gupta <manigupta@nvidia.com>, Haicheng Wu <haichengw@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>	2020-11-19 21:25:25 -08:00
Dustyn Blasig	c2b80ad4e4	Merge pull request #135 from NVIDIA/cutlass_2.3_final CUTLASS 2.3.0	2020-09-25 13:25:26 -05:00
akerr	37a8f9e598	CUTLASS 2.3.0 final.	2020-09-25 10:34:46 -07:00
Andrew Kerr	c53f3339bb	CUTLASS 2.3 initial commit (#134) CUTLASS 2.3 adds GEMMs targeting Sparse Tensor Cores on the NVIDIA Ampere Architecture, fast SGEMM, and small matrix classes, bug fixes, and performance enhancements.	2020-09-23 14:00:58 -07:00
hwu36	4dac7490e6	Typoes (#107) * Update splitk_gemm.cu * Update gemm_bias_relu.cu * Update mma_sm75.h	2020-07-13 14:25:52 -07:00
Andrew Kerr	fd7e058d0c	Added examples to enable the unity build (#102) * Updated documentation of fused GEMM example and removed UNITY BUILD batch size. The default batch size when unity build is enabled tends to be favorable.	2020-06-17 07:09:18 -07:00
Andrew Kerr	1ab1027954	Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (#100) - Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. - Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out - Added test_examples target to build and test all CUTLASS examples - Minor edits to documentation to point to GTC 2020 webinar	2020-06-15 10:47:01 -07:00
Andrew Kerr	86931fef85	CUTLASS 2.2 (#96) Adds support for NVIDIA Ampere Architecture features. CUDA 11 Toolkit recommended.	2020-06-08 16:17:35 -07:00
Vijay Thakkar	e33d90b361	update tools/library/CMakeLists to require python 3.6 according to #70 (#82) #70 only updates the documentation. This commit reflects this bump in python version to the CMake configuration as well.	2020-04-08 10:54:36 -07:00
Andrew Kerr	96dab34ad9	CUTLASS 2.1 (#83) CUTLASS 2.1 contributes: - BLAS-style host-side API added to CUTLASS Library - Planar Complex GEMM kernels targeting Volta and Turing Tensor Cores - Minor enhancements and bug fixes	2020-04-07 13:51:25 -07:00
Andrew Kerr	7c0cd26d13	Need Python 3.6 to use enum.auto() (#70)	2019-11-22 09:39:12 -08:00
Andrew Kerr	45ecbc885b	Removed redundant conjugation operations from matrix_traits. (#65)	2019-11-20 11:27:13 -08:00
Andrew Kerr	8aca98f9a7	Improved formatting, clarity, and content of several documents. (#64) * Improved formatting, clarity, and content of several documents.	2019-11-20 10:42:15 -08:00
Dustyn Blasig	f4d9c8f755	Clang GPU compilation requires explicit CUDACC version flags (#63)	2019-11-20 09:52:11 -08:00
Andrew Kerr	fb335f6a5f	CUTLASS 2.0 (#62) CUTLASS 2.0 Substantially refactored for - Better performance, particularly for native Turing Tensor Cores - Robust and durable templates spanning the design space - Encapsulated functionality embodying modern C++11 programming techniques - Optimized containers and data types for efficient, generic, portable device code Updates to: - Quick start guide - Documentation - Utilities - CUTLASS Profiler Native Turing Tensor Cores - Efficient GEMM kernels targeting Turing Tensor Cores - Mixed-precision floating point, 8-bit integer, 4-bit integer, and binarized operands Coverage of existing CUTLASS functionality: - GEMM kernels targeting CUDA and Tensor Cores in NVIDIA GPUs - Volta Tensor Cores through native mma.sync and through WMMA API - Optimizations such as parallel reductions, threadblock rasterization, and intra-threadblock reductions - Batched GEMM operations - Complex-valued GEMMs Note: this commit and all that follow require a host compiler supporting C++11 or greater.	2019-11-19 16:55:34 -08:00
Andrew Kerr	b5cab177a9	Performance enhancement for Volta Tensor Cores TN layout (#53) * Fixed performance defect with indirect access to pointer array for Volta TensorCores TN arrangement. * Updated patch version and changelog. * Updated patch version and changelog. * Added link to changelog in readme. * Fixed markdown link	2019-07-10 10:54:12 -07:00
Timmy	eb41735933	Merge pull request #47 from Artem-B/cutlass-1.3-clang Make CUTLASS compileable with Clang.	2019-05-13 10:52:45 -07:00
Artem Belevich	fb8b3a98b7	Addressed code review comments.	2019-05-10 10:24:52 -07:00
gthomascollignon	d9d357877f	Added missing file (#48)	2019-05-09 14:07:52 -07:00
Artem Belevich	e18292db46	Make CUTLASS compileable with Clang. Requires a recent clang build (r359248 or newer). Enable compilation with clang with these options: cmake -DCUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=/path/to/clang++	2019-05-02 11:00:22 -07:00
Timmy	fe3438a3c1	cutlass 1.3.1 (#46) CUTLASS 1.3.1 patch resolves failing text with NVRTC.	2019-04-19 16:54:52 -07:00
Andrew Kerr	877bdcace6	Cutlass 1.3 Release (#42) CUTLASS 1.3 Release - Efficient GEMM kernel targeting Volta Tensor Cores via mma.sync instruction added in CUDA 10.1.	2019-03-20 10:49:17 -07:00
Andrew Kerr	19a9d64e3c	Removed patch version from README.	2018-12-19 15:20:43 -08:00
Andrew Kerr	80e6f7c860	Merge pull request #38 from NVIDIA/resolve_maxwell Resolved issue for incorrect SGEMM on Maxwell architecture.	2018-12-19 15:17:41 -08:00
akerr	822b0952cd	Resolved issue for incorrect SGEMM on Maxwell architecture.	2018-12-19 15:07:16 -08:00
Andrew Kerr	ed2ed4d667	Merge pull request #33 from NVIDIA/cutlass_1.2 CUTLASS 1.2	2018-10-26 14:59:50 -07:00
Andrew Kerr	4db423c40f	Minor edit to CHANGELOG.	2018-10-26 14:58:31 -07:00
Andrew Kerr	b2bc0d3b79	Updating Doxygen docs	2018-10-26 14:54:58 -07:00
akerr	74df0331f2	CUTLASS 1.2	2018-10-26 14:38:46 -07:00
Andrew Kerr	2332df492e	Merge pull request #30 from NVIDIA/fix_utilities_example Fixed cutlass_utilities example.	2018-09-29 15:09:18 -07:00
akerr	cfe4b933ef	CUDA 9 lacks host-side conversions from float=>half. Instead, we must reinterpret_cast<> from cutlass::half_t => half.	2018-09-29 15:04:20 -07:00
Andrew Kerr	6877595a5e	Merge pull request #28 from NVIDIA/cutlass_1.1 Fixed typeo	2018-09-28 12:59:49 -07:00
Andrew Kerr	69e3709da4	Fixed typeo	2018-09-28 12:59:20 -07:00
Andrew Kerr	d419094c28	Merge pull request #26 from NVIDIA/cutlass_1.1 Clarification to README	2018-09-21 11:44:47 -07:00
akerr	1a7ac522f8	Clarification to README	2018-09-20 11:04:03 -07:00
Andrew Kerr	bf6eec53eb	Merge pull request #25 from NVIDIA/cutlass_1.1 Updated CUTLASS.md	2018-09-19 21:33:04 -07:00
akerr	206e38dac5	Updated copyright of CUTLASS.md	2018-09-19 21:31:12 -07:00
Andrew Kerr	d85f6a1cec	Merge pull request #24 from NVIDIA/cutlass_1.1 Cutlass 1.1	2018-09-19 21:16:53 -07:00
akerr	0826572c4c	Reduced range of random values to avoid bit-level inconsistencies for large matrices.	2018-09-19 21:11:48 -07:00
akerr	77d1e0ca81	Updated README and CHANGELOG.	2018-09-19 20:42:51 -07:00
akerr	d7137f9c0a	Updated doxygen	2018-09-19 14:02:08 -07:00
akerr	461f417b9d	Checkpointing CUTLASS 1.1 release.	2018-09-18 16:58:03 -07:00
Andrew Kerr	cf0301e00f	Merge pull request #15 from NVIDIA/release_1.0.1_edits Minor edits to README and changelog pursuant CUTLASS 1.0.1 patch.	2018-06-26 13:59:01 -07:00
akerr	b9bb0d1a49	Edits to README and changelog pursuant CUTLASS 1.0.1 patch.	2018-06-26 13:57:39 -07:00
Andrew Kerr	e1c4ba501b	Merge pull request #13 from NVIDIA/cutlass_v1.0.1 Cutlass v1.0.1	2018-06-12 08:25:56 -07:00
akerr	c566e83e6d	Updated changelog.	2018-06-11 14:54:07 -07:00
akerr	374882be53	Replaced GoogleTest copy with submodule. Added updates to support intra-threadblock reductions. Added tests for same.	2018-06-11 11:47:15 -07:00
akerr	2c496c3e9e	Replaced GoogleTest copy with Git submodule.	2018-06-11 11:32:41 -07:00

90 Commits