Andrew Kerr
b5cab177a9
Performance enhancement for Volta Tensor Cores TN layout ( #53 )
* Fixed performance defect with indirect access to pointer array for Volta TensorCores TN arrangement.
* Updated patch version and changelog.
* Added link to changelog in readme.
* Fixed markdown link
2019-07-10 10:54:12 -07:00
Artem Belevich
fb8b3a98b7
Addressed code review comments.
2019-05-10 10:24:52 -07:00
Artem Belevich
e18292db46
Make CUTLASS compilable with Clang.
Requires a recent clang build (r359248 or newer).
To compile with clang, pass these options to CMake:
cmake -DCUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=/path/to/clang++
2019-05-02 11:00:22 -07:00
Timmy
fe3438a3c1
cutlass 1.3.1 ( #46 )
CUTLASS 1.3.1 patch resolves a failing test with NVRTC.
2019-04-19 16:54:52 -07:00
Andrew Kerr
877bdcace6
Cutlass 1.3 Release ( #42 )
CUTLASS 1.3 Release
- Efficient GEMM kernel targeting Volta Tensor Cores via the mma.sync instruction added in CUDA 10.1.
2019-03-20 10:49:17 -07:00
akerr
74df0331f2
CUTLASS 1.2
2018-10-26 14:38:46 -07:00
akerr
0826572c4c
Reduced range of random values to avoid bit-level inconsistencies for large matrices.
2018-09-19 21:11:48 -07:00
akerr
77d1e0ca81
Updated README and CHANGELOG.
2018-09-19 20:42:51 -07:00
akerr
461f417b9d
Checkpointing CUTLASS 1.1 release.
2018-09-18 16:58:03 -07:00
akerr
374882be53
Replaced GoogleTest copy with submodule. Added updates to support intra-threadblock reductions. Added tests for same.
2018-06-11 11:47:15 -07:00
akerr
2c496c3e9e
Replaced GoogleTest copy with Git submodule.
2018-06-11 11:32:41 -07:00
akerr
480732c2e8
Minor updates to usage and readme.
2018-05-17 15:10:55 -07:00
akerr
acb90e962a
Updated URL to Doxygen documentation and modified usage statement in the performance test program.
2018-05-17 11:11:05 -07:00
akerr
2028ebe120
CUTLASS v1.0 release
2018-05-16 11:44:56 -07:00