* Fixed performance defect with indirect access to pointer array for Volta TensorCores TN arrangement.
* Updated patch version and changelog.
* Added link to changelog in readme.
* Fixed markdown link.
Compiling with clang requires a recent clang build (r359248 or newer).
To enable it, pass these options to CMake:
cmake -DCUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=/path/to/clang++