cutlass

Author	SHA1	Message	Date
Andrew Kerr	68aaee8773	Merge pull request #9 from NVIDIA/cutlass_v1.0_rel Updated URL to Doxygen and modified usage statement	2018-05-17 11:12:37 -07:00
akerr	acb90e962a	Updated url to Doxygen and modified usage statement in performance test program.	2018-05-17 11:11:05 -07:00
Andrew Kerr	96bc3f227f	Merge pull request #8 from NVIDIA/cutlass_v1.0_rel Configured Github Pages	2018-05-16 15:26:55 -07:00
akerr	25ff282403	Moved Doxygen documents.	2018-05-16 15:25:24 -07:00
Andrew Kerr	9d5726a568	Set theme jekyll-theme-minimal	2018-05-16 13:49:06 -07:00
Andrew Kerr	6f0d271d8d	CUTLASS v1.0 released.	2018-05-16 13:47:13 -07:00
akerr	923dfb42ce	Updated README.md	2018-05-16 12:50:10 -07:00
akerr	6f6f269a0a	Updated README.md	2018-05-16 12:47:07 -07:00
akerr	2028ebe120	CUTLASS v1.0 release	2018-05-16 11:44:56 -07:00
Andrew Kerr	84377249a1	Merge pull request #2 from Artem-B/clang-fixes Merging "Clang fixes" into master.	2018-01-04 15:52:53 -08:00
akerr	901287175f	Merge branch 'Artem-B-clang-fixes'	2018-01-04 15:46:08 -08:00
Artem Belevich	1c9b54df16	Whitespace fix.	2018-01-03 16:42:51 -08:00
Artem Belevich	39616514d0	Reworked CUDA_LOG macro to print location and the message with one printf. This relies on the fact that clang allows using device-side features from __host__/__device__ functions from __host__ ones as long as we don't have to generate code for that. Wrapping thread/blockIdx in __host__ __device__ function allows using CUDA_LOG everywhere during host and device compilation.	2018-01-03 16:36:50 -08:00
Artem Belevich	df4b4e4bb6	Added _cuda_ to the name of the executable to indicate that it's not clang's version.	2017-12-11 16:34:10 -08:00
Artem Belevich	81957b3a3d	Force inlining of a few functions that rely on that for performance. Clang is less aggressive than nvcc, so a number of functions did not get inlined into the kernel by default. That prevented SROA from eliminating loads/stores to temporary buffers and resulted in abysmal performance. Replaced inline with __forceinline__ to ensure that we do inline the functions necessary for optimal performance.	2017-12-11 14:52:30 -08:00
Artem Belevich	ce2b3f695d	Fixed debug macros for clang. Unlike nvcc, clang always sees both host and device-side code during compilation. CUDA_LOG macro is used in both host and device code, so when it expanded to contain device-only code, that resulted in errors when it was used from the host-side functions. In order to make CUDA_LOG work with clang it was split into two parts -- a pair of target-attribute-based overloaded functions that perform host or device specific parts of logging, and a printf which works on both sides.	2017-12-11 14:52:30 -08:00
Artem Belevich	e9e7cd4d44	Make cutlass compilable with clang. E.g: PATH=/nvcc/path/bin:/clang/path/bin:$PATH make sm=35,60 compiler=clang all	2017-12-11 14:52:30 -08:00
Duane Merrill	95b0578d34	Update license info	2017-12-06 10:00:59 -05:00
Duane Merrill	f4b48c7669	Update README.md	2017-12-05 22:58:46 -05:00
Duane Merrill	6cb88d53eb	Update README.md	2017-12-05 22:58:12 -05:00
Duane Merrill	537a4bcedf	Update README.md	2017-12-05 22:54:49 -05:00
Duane Merrill	5bd3f09312	Update README.md	2017-12-05 22:53:11 -05:00
Duane Merrill	6f091f5620	Update README.md	2017-12-05 22:44:01 -05:00
dumerrill	0428c89fd5	Updating readme with relative perf chart	2017-12-05 22:40:47 -05:00
Duane Merrill	e2bf51c3fe	Update README.md	2017-12-05 22:25:42 -05:00
Duane Merrill	57747e382e	Update README.md	2017-12-05 21:32:06 -05:00
Duane Merrill	dd4dd4cebf	Update README.md	2017-12-05 20:58:01 -05:00
Duane Merrill	6565b48747	Update README.md	2017-12-05 20:56:49 -05:00
Duane Merrill	73211bbb88	Update README.md	2017-12-05 20:55:54 -05:00
Duane Merrill	9dcb2b4c7d	Update README.md	2017-12-05 20:55:03 -05:00
Duane Merrill	f30abfc00a	Update README.md	2017-12-05 20:50:15 -05:00
dumerrill	8ebd6b06d0	Replace svg with png+text	2017-12-05 20:20:25 -05:00
dumerrill	04ffa156e8	Adding figure to readme.md	2017-12-05 20:15:33 -05:00
Duane Merrill	24d0ba65c5	Update code formatting	2017-12-05 15:51:01 -05:00
akerr	4276e46e61	Improved formatting of Makefile	2017-12-05 12:45:06 -08:00
akerr	d08ba8ac46	Committing CUTLASS for release.	2017-12-04 21:12:52 -08:00
akerr	bbb3178126	Initial commit	2017-12-04 08:07:48 -08:00
Andrew Kerr	8f5033f371	Initial commit	2017-11-29 16:11:25 -08:00

488 Commits