cutlass

Author	SHA1	Message	Date
Artem Belevich	df4b4e4bb6	Added _cuda_ to the name of the executable to indicate that it's not clang's version.	2017-12-11 16:34:10 -08:00
Artem Belevich	81957b3a3d	Force inlining of a few functions that rely on it for performance. Clang is less aggressive than nvcc, so a number of functions did not get inlined into the kernel by default. That prevented SROA from eliminating loads/stores to temporary buffers and resulted in abysmal performance. Replaced inline with __forceinline__ to ensure that we do inline the functions necessary for optimal performance.	2017-12-11 14:52:30 -08:00
Artem Belevich	ce2b3f695d	Fixed debug macros for clang. Unlike nvcc, clang always sees both host and device-side code during compilation. The CUDA_LOG macro is used in both host and device code, so when it expanded to contain device-only code, that resulted in errors when it was used from host-side functions. To make CUDA_LOG work with clang, it was split into two parts: a pair of target-attribute-based overloaded functions that perform the host- or device-specific parts of logging, and a printf which works on both sides.	2017-12-11 14:52:30 -08:00
Artem Belevich	e9e7cd4d44	Make cutlass compilable with clang. E.g: PATH=/nvcc/path/bin:/clang/path/bin:$PATH make sm=35,60 compiler=clang all	2017-12-11 14:52:30 -08:00
Duane Merrill	95b0578d34	Update license info	2017-12-06 10:00:59 -05:00
Duane Merrill	f4b48c7669	Update README.md	2017-12-05 22:58:46 -05:00
Duane Merrill	6cb88d53eb	Update README.md	2017-12-05 22:58:12 -05:00
Duane Merrill	537a4bcedf	Update README.md	2017-12-05 22:54:49 -05:00
Duane Merrill	5bd3f09312	Update README.md	2017-12-05 22:53:11 -05:00
Duane Merrill	6f091f5620	Update README.md	2017-12-05 22:44:01 -05:00
dumerrill	0428c89fd5	Updating readme with relative perf chart	2017-12-05 22:40:47 -05:00
Duane Merrill	e2bf51c3fe	Update README.md	2017-12-05 22:25:42 -05:00
Duane Merrill	57747e382e	Update README.md	2017-12-05 21:32:06 -05:00
Duane Merrill	dd4dd4cebf	Update README.md	2017-12-05 20:58:01 -05:00
Duane Merrill	6565b48747	Update README.md	2017-12-05 20:56:49 -05:00
Duane Merrill	73211bbb88	Update README.md	2017-12-05 20:55:54 -05:00
Duane Merrill	9dcb2b4c7d	Update README.md	2017-12-05 20:55:03 -05:00
Duane Merrill	f30abfc00a	Update README.md	2017-12-05 20:50:15 -05:00
dumerrill	8ebd6b06d0	Replace svg with png+text	2017-12-05 20:20:25 -05:00
dumerrill	04ffa156e8	Adding figure to readme.md	2017-12-05 20:15:33 -05:00
Duane Merrill	24d0ba65c5	Update code formatting	2017-12-05 15:51:01 -05:00
akerr	4276e46e61	Improved formatting of Makefile	2017-12-05 12:45:06 -08:00
akerr	d08ba8ac46	Committing CUTLASS for release.	2017-12-04 21:12:52 -08:00
akerr	bbb3178126	Initial commit	2017-12-04 08:07:48 -08:00
Andrew Kerr	8f5033f371	Initial commit	2017-11-29 16:11:25 -08:00

75 Commits