Commit Graph

81 Commits

Author SHA1 Message Date
akerr
6f6f269a0a Updated README.md 2018-05-16 12:47:07 -07:00
akerr
2028ebe120 CUTLASS v1.0 release 2018-05-16 11:44:56 -07:00
Andrew Kerr
84377249a1 Merge pull request #2 from Artem-B/clang-fixes
Merging "Clang fixes" into master.
2018-01-04 15:52:53 -08:00
akerr
901287175f Merge branch 'Artem-B-clang-fixes' 2018-01-04 15:46:08 -08:00
Artem Belevich
1c9b54df16 Whitespace fix. 2018-01-03 16:42:51 -08:00
Artem Belevich
39616514d0 Reworked CUDA_LOG macro to print the location and the message with one printf.
This relies on the fact that clang allows using device-side features
from __host__/__device__ functions as long as we
don't have to generate code for that. Wrapping thread/blockIdx in
__host__ __device__ function allows using CUDA_LOG everywhere during
host and device compilation.
2018-01-03 16:36:50 -08:00
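The idea in the commit message above can be sketched roughly as follows. This is not the CUTLASS code: every `SKETCH_*` and `sketch_*` name is hypothetical, and the `__CUDA_ARCH__` guard is an assumption added here so the snippet also builds as plain host C++; the commit itself relies on clang's handling of `__host__ __device__` functions instead.

```cpp
#include <stdio.h>

// Hypothetical names throughout; none of this is taken from CUTLASS.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE  // plain C++ build: no execution-space attributes
#endif

// Wrap the device-only built-ins in host/device functions so the logging
// macro can be expanded from either side. On the host they return -1.
// Assumption: the __CUDA_ARCH__ guard is a portability substitute for the
// clang-specific behavior the commit message describes.
SKETCH_HOST_DEVICE inline int sketch_thread_x() {
#if defined(__CUDA_ARCH__)
  return static_cast<int>(threadIdx.x);
#else
  return -1;
#endif
}

SKETCH_HOST_DEVICE inline int sketch_block_x() {
#if defined(__CUDA_ARCH__)
  return static_cast<int>(blockIdx.x);
#else
  return -1;
#endif
}

// A single printf call emits the location, the indices, and the message.
// This simple form needs at least one argument after the format string.
#define SKETCH_LOG(fmt, ...)                                          \
  printf("%s:%d [block %d, thread %d] " fmt "\n", __FILE__, __LINE__, \
         sketch_block_x(), sketch_thread_x(), __VA_ARGS__)
```

From host code, `SKETCH_LOG("value=%d", 42);` prints the file, line, and `-1` for both indices; inside a kernel the same macro would report the real block and thread indices.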
Artem Belevich
df4b4e4bb6 Added _cuda_ to the name of the executable to indicate that it's not clang's version. 2017-12-11 16:34:10 -08:00
Artem Belevich
81957b3a3d Force inlining of a few functions that rely on that for performance.
Clang is less aggressive than nvcc, so a number of functions did not get
inlined into the kernel by default. That prevented SROA from eliminating
loads/stores to temporary buffers and resulted in abysmal performance.

Replaced inline with __forceinline__ to ensure that we do inline the
functions necessary for optimal performance.
2017-12-11 14:52:30 -08:00
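A minimal host-side sketch of the pattern this commit describes, assuming a hypothetical `SKETCH_FORCEINLINE` macro and made-up helper functions (none of them are CUTLASS names). Forcing the small helpers inline puts the temporary buffer in the caller's scope, which is what gives an optimization such as SROA the chance to remove the loads and stores.

```cpp
// Hypothetical portability macro; the commit itself uses __forceinline__.
#if defined(__CUDACC__)
#define SKETCH_FORCEINLINE __forceinline__
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCEINLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define SKETCH_FORCEINLINE __forceinline
#else
#define SKETCH_FORCEINLINE inline
#endif

// Copies four values into a caller-provided temporary buffer.
SKETCH_FORCEINLINE void load_fragment(const float* src, float (&frag)[4]) {
  for (int i = 0; i < 4; ++i) {
    frag[i] = src[i];
  }
}

// Sums the four values held in the temporary buffer.
SKETCH_FORCEINLINE float sum_fragment(const float (&frag)[4]) {
  float total = 0.0f;
  for (int i = 0; i < 4; ++i) {
    total += frag[i];
  }
  return total;
}

// Once both helpers are inlined here, `frag` is a local array whose
// elements the compiler can promote to registers.
float reduce4(const float* src) {
  float frag[4];
  load_fragment(src, frag);
  return sum_fragment(frag);
}
```

Calling `reduce4` on `{1.0f, 2.0f, 3.0f, 4.0f}` returns `10.0f`. Whether the buffer is actually eliminated depends on the compiler and optimization level.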
Artem Belevich
ce2b3f695d Fixed debug macros for clang.
Unlike nvcc, clang always sees both host and device-side code during
compilation. CUDA_LOG macro is used in both host and device code, so when it
expanded to contain device-only code, that resulted in errors when it was used
from the host-side functions.

In order to make CUDA_LOG work with clang it was split into two parts -- a pair
of target-attribute-based overloaded functions that perform host or device
specific parts of logging, and a printf which works on both sides.
2017-12-11 14:52:30 -08:00
Artem Belevich
e9e7cd4d44 Make cutlass compilable with clang.
E.g:
PATH=/nvcc/path/bin:/clang/path/bin:$PATH make sm=35,60 compiler=clang all
2017-12-11 14:52:30 -08:00
Duane Merrill
95b0578d34 Update license info 2017-12-06 10:00:59 -05:00
Duane Merrill
f4b48c7669 Update README.md 2017-12-05 22:58:46 -05:00
Duane Merrill
6cb88d53eb Update README.md 2017-12-05 22:58:12 -05:00
Duane Merrill
537a4bcedf Update README.md 2017-12-05 22:54:49 -05:00
Duane Merrill
5bd3f09312 Update README.md 2017-12-05 22:53:11 -05:00
Duane Merrill
6f091f5620 Update README.md 2017-12-05 22:44:01 -05:00
dumerrill
0428c89fd5 Updating readme with relative perf chart 2017-12-05 22:40:47 -05:00
Duane Merrill
e2bf51c3fe Update README.md 2017-12-05 22:25:42 -05:00
Duane Merrill
57747e382e Update README.md 2017-12-05 21:32:06 -05:00
Duane Merrill
dd4dd4cebf Update README.md 2017-12-05 20:58:01 -05:00
Duane Merrill
6565b48747 Update README.md 2017-12-05 20:56:49 -05:00
Duane Merrill
73211bbb88 Update README.md 2017-12-05 20:55:54 -05:00
Duane Merrill
9dcb2b4c7d Update README.md 2017-12-05 20:55:03 -05:00
Duane Merrill
f30abfc00a Update README.md 2017-12-05 20:50:15 -05:00
dumerrill
8ebd6b06d0 Replace svg with png+text 2017-12-05 20:20:25 -05:00
dumerrill
04ffa156e8 Adding figure to readme.md 2017-12-05 20:15:33 -05:00
Duane Merrill
24d0ba65c5 Update code formatting 2017-12-05 15:51:01 -05:00
akerr
4276e46e61 Improved formatting of Makefile 2017-12-05 12:45:06 -08:00
akerr
d08ba8ac46 Committing CUTLASS for release. 2017-12-04 21:12:52 -08:00
akerr
bbb3178126 Initial commit 2017-12-04 08:07:48 -08:00
Andrew Kerr
8f5033f371 Initial commit 2017-11-29 16:11:25 -08:00