cutlass

Author	SHA1	Message	Date
Ivan Komarov	0b8cacd6f1	Remove redundant <fstream> includes (#563 ) * Remove redundant <fstream> includes * Fix fstream in examples/ * Fix <fstream> in test/ * Use consistent order for <fstream> (always after <iostream>) * Remove an unneeded include in a file where std::ofstream usage is commented out Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru>	2022-07-19 15:23:54 -04:00
Haicheng Wu	e7a61c761a	fix race condition when h < stride_h or w < stride_w (#562 ) Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-07-12 16:37:08 -04:00
seventh	fb379eaa5b	epilogue leaky relu support ScaleType (#564 ) Co-authored-by: xuweiqi <xuweiqi117@gmail.com>	2022-07-11 17:30:55 -04:00
Jacob He	8a766804ad	Fix doc in testbed_gemm_with_broadcast (#559 )	2022-07-07 09:56:16 -04:00
Bing Xu	1eb6355182	[activation] tanh (#550 ) Co-authored-by: Bing Xu <bingxu@fb.com>	2022-07-02 08:00:45 -04:00
Yujia Zhai	04a9777b87	Softmax (#546 ) * add test layernorm g-mem version * Delete include/configure directory * Delete examples/test_layernorm directory * Update gemm_with_softmax.h * Update gemm_softmax.cu * Update linear_combination.h * Update fast_math.h * remove redundant vars Co-authored-by: yujia.zhai <yujia.zhai@bytedance.com> Co-authored-by: yuzhai <yuzhai@nvidia.com>	2022-07-02 01:19:18 -04:00
Haicheng Wu	e45e773436	Update linear_combination_generic.h (#472 ) add `skip_elementwise_` to support serial splitk in `linear_combination_generic.h`	2022-06-28 07:29:38 -04:00
Haicheng Wu	dae6b6893b	Update CHANGELOG.md	2022-06-27 23:30:49 -04:00
Haicheng Wu	ba18ea9c32	Update README.md	2022-06-27 23:25:26 -04:00
Haicheng Wu	9ab9110168	add leaky relu (#542 ) Authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-06-26 10:07:50 -04:00
Jinze (Richard) Xue	e5d4669f16	Update CHANGELOG.md (#543 )	2022-06-25 13:23:49 -04:00
Haicheng Wu	94f01f19d5	Add implicit gemm perf plot from @manishucsd, presented in gtc'22 cutlass talk	2022-06-23 22:47:11 -04:00
Jack Kosaian	fa56763c25	Fix occupancy calculation for grouped GEMM (#532 )	2022-06-18 19:53:59 -04:00
LiuWei	25e26a6e51	fix bugs in linear_combination_generic.h missing include cutlass/epilogue/thread/scale_type.h (#531 )	2022-06-17 23:35:14 -04:00
Haicheng Wu	f248e9bdb4	Create CITATION.cff Add initial CITATION.cff	2022-06-07 21:25:16 -04:00
Pei Sun	dceefe4f64	Increment stride correctly in warp iterator. (#516 ) Co-authored-by: peisun1115 <peis@google.com>	2022-06-06 12:33:36 -04:00
Pei Sun	c3881d097e	Fix a comment about LDSM layout. (#514 ) Co-authored-by: peisun1115 <peis@google.com>	2022-06-04 23:04:00 -04:00
Pei Sun	a29dfb1c63	Fix a bug to increment stride tile correctly (#503 ) * Fix a bug to increment stride tile correctly * Update regular_tile_access_iterator_tensor_op.h Co-authored-by: peisun1115 <peis@google.com> Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2022-06-03 22:54:52 -04:00
Jack Kosaian	0abaac84ea	[examples] Fix typos in SYRK and TRMM examples (#507 )	2022-06-03 22:52:41 -04:00
Haicheng Wu	858c735856	Update gather_scatter_fusion.cu Correct the reference code in gather/scatter example to put bias add in the correct place.	2022-05-18 13:15:25 -04:00
Haicheng Wu	d6f58b2d14	Update functionality.md	2022-05-11 09:34:24 -04:00
Mike Iovine	c4cf0dad82	Fix init-self compiler warnings (#493 ) Fix a few warnings caused by trying to initialize a class member with itself. These warnings can turn into errors if you compile with `-Winit-self`.	2022-05-11 00:35:28 -04:00
Haicheng Wu	57551902d0	Update functionality.md add some explanations to the functionality table.	2022-05-11 00:01:19 -04:00
Haicheng Wu	1604ebaf10	Update generator.py stop generating analytical conv kernels to reduce kernel number	2022-05-08 21:47:15 -04:00
Haicheng Wu	6023038bae	add verification of the reduction tensor (#489 ) Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-05-06 10:24:51 -07:00
TonyZhao	ddd8f9cf41	update float < int32_t * 4 (#488 ) Co-authored-by: 赵俊涛 <zhaojuntao@zhaojuntaos-MacBook-Pro.local>	2022-05-04 13:36:05 -04:00
Haicheng Wu	ec2b4fd85d	b2b bias vector support (#482 ) * b2b bias vector support * add files Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-04-30 04:16:15 -07:00
Stepan Tezyunichev	86ce09aed1	2.9 fixes for nvrtc (#480 ) * Use platform::is_same instead of std::is_same * Don't hide cuComplex include from nvrtc * Typo fixed * Remove comment rename	2022-04-29 09:06:52 -04:00
Haicheng Wu	21c1fa3849	add .github (#479 ) Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-04-28 12:36:59 -07:00
Janusz Lisiecki	8c339ac039	Fix compilation in clang (#478 ) - adds missing commas - adjusts misaligned usage of CUTLASS_DEVICE between template declaration and specializations Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>	2022-04-28 14:22:06 -04:00
Haicheng Wu	e49f690fd7	Update linear_combination_generic.h	2022-04-28 14:04:53 -04:00
Haicheng Wu	96dad61a75	Update CHANGELOG.md	2022-04-28 10:52:10 -04:00
Haicheng Wu	cc2ea4c3fc	Update README.md	2022-04-28 10:50:11 -04:00
Andrew Kerr	a0de301283	Used relative paths for includes (#477 )	2022-04-27 12:04:23 -07:00
Haicheng Wu	319a389f42	Update CMakeLists.txt (#473 ) * Update CMakeLists.txt Add 128bit int support if using nvc++ to solve #310 @jeffhammond, would you please give it a try? * Update CMakeLists.txt correct copy paste error	2022-04-27 07:02:26 -07:00
Stepan Tezyunichev	71def2f084	Use platform:: instead of std::abs and std::conditional (#452 ) * Fixed template struct/class mismatch * Use platform implementation instead of std::abs and std::conditional during nvrtc compilation * Use platform implementation instead of std::abs and std::conditional during nvrtc compilation * Revert absolute_value() usage	2022-04-25 14:40:22 -04:00
Masahiro Masuda	70f3ba57f5	Fix typo in shared memory layout description (#471 )	2022-04-24 18:32:13 -04:00
Fujun Han	dd77fadc70	Remove redundant offset def and init in shared_load_iterator.h (#456 ) Signed-off-by: Fujun Han <fujun.han@iluvatar.ai>	2022-04-24 16:31:00 -04:00
Stepan Tezyunichev	be4578d517	Fixed template struct/class mismatch (#453 )	2022-04-24 16:30:21 -04:00
Andrei Alexandrescu	d7b499deff	Fix CUDA_PERROR_EXIT and print failing expression (#446 ) `CUDA_PERROR_EXIT` can lead to incorrect usage (see e.g. [this description](https://www.cs.technion.ac.il/users/yechiel/c++-faq/macros-with-if.html)) because it contains an incomplete `if` expression. Consider: ``` if (condition) CUDA_PERROR_EXIT(cudaFree(x)) else free(x); ``` The author of the code forgot to add a semicolon after the macro. In that case, the `else` will bind to the `if` inside the macro definition, leading to code that the author did not intend or expect. If the author does use a semicolon, the code will not compile, which is awkward. The change adds a `do while` around the `if`, which always requires a semicolon. This PR also adds the text of the failing expression to the printed error message.	2022-04-24 16:29:43 -04:00
Exusial	310ed81ac3	fix description in example 12. (#444 ) Co-authored-by: Exusial <Exusial>	2022-04-24 16:29:06 -04:00
Fujun Han	4c0d6e1eb4	[BUGFIX]: Force-unrolling a loop that doesn't have a compile-time-constant trip count is dangerous (#441 ) Signed-off-by: Peter Han <fujun.han@iluvatar.ai>	2022-04-24 16:28:32 -04:00
Jack Kosaian	167ac54c65	Fix link to Python example (#469 )	2022-04-23 15:37:38 -04:00
Andrew Kerr	12f4108ac2	CUTLASS 2.9 (#468 )	2022-04-23 15:02:38 -04:00
Feng Shijie	dd571f0edb	[style] fix code indentation (#449 ) * [docs] fix typo in media/docs/layout.md * [docs] fix comment error * fix typo in include/cutlass/arch/simd_61.h * fix stride comment errors in TensorLayout * fix indentation	2022-04-03 21:13:17 -04:00
Jianyu Huang	6d0d265047	Update PUBLICATIONS.md (#447 )	2022-04-03 21:03:28 -04:00
Haicheng Wu	f11fa975a5	Update PUBLICATIONS.md @tsuki	2022-03-23 21:04:43 -04:00
Masahiro Masuda	0e71d9b450	Transposed conv2d and wgrad split k examples (#413 ) * add split k wgrad example * wgrad done * begin transposed conv2d example * update transposed conv2d example and add ref check * update doc for conv2d transpose example * add license * add wgrad doc * more clarification on GEMM output type * typo fix * clean up indent * address comments * rename example numbers to 34 and 35 * GEMM -> Implicit GEMM * Revert "rename example numbers to 34 and 35" This reverts commit 551a808c227216e9e38d4472ba8ff020557b8500. * transposed_conv2d is 34 * add compiler and device version check to exit gracefully Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-03-23 14:52:54 -04:00
Minmin Sun (孙敏敏)	eb0d4c9213	[library] pass pointer of arguments to get_host_workspace_size() in gemm_universal() (#412 ) Otherwise GemmUniversalOperation::get_host_workspace_size() will fail with a segmentation fault.	2022-03-22 12:36:34 -04:00
Haojin Yang	bc45e2c023	fixed datatype error of numeric_limit for uint1b_t (#419 ) Co-authored-by: Haojin Yang <haojin.yang@.hpi.uni-potsdam.de>	2022-03-22 12:30:30 -04:00

403 Commits
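
The macro pitfall described in commit d7b499deff can be reproduced without CUDA. The sketch below is not the CUTLASS macro: `CHECK_BARE_IF`, `CHECK_DO_WHILE`, `trace`, and both helper functions are hypothetical names, and the macros append to a string instead of printing and exiting so that the control flow can be observed. It only illustrates why a bare `if` inside a macro captures a following `else`, and how a `do { ... } while (0)` wrapper avoids that.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only; not taken from CUTLASS.
// They record what ran in `trace` instead of printing an error and exiting.
static std::string trace;

// Unsafe shape: the macro expands to an `if` with no `else`.
#define CHECK_BARE_IF(status) \
  if ((status) != 0) { trace += "error;"; }

// Safer shape: the macro expands to one complete statement that
// needs a trailing semicolon at the call site.
#define CHECK_DO_WHILE(status) \
  do { if ((status) != 0) { trace += "error;"; } } while (0)

std::string with_bare_if(bool condition) {
  trace.clear();
  if (condition)
    CHECK_BARE_IF(0)    // semicolon forgotten, yet this still compiles
  else
    trace += "else;";   // binds to the `if` hidden inside the macro
  return trace;
}

std::string with_do_while(bool condition) {
  trace.clear();
  if (condition)
    CHECK_DO_WHILE(0);  // semicolon required; `else` pairs with `if (condition)`
  else
    trace += "else;";
  return trace;
}
```

Calling `with_bare_if(true)` returns `"else;"` even though the condition holds, because the `else` attaches to the macro's inner `if`; `with_do_while(true)` returns an empty string and `with_do_while(false)` returns `"else;"`, which is the behavior the call site appears to ask for. Compilers may emit a dangling-else warning for the first function, which is expected here.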