ANIKET SHIVAM
751eb9a885
Update license year (#1306)
2024-01-16 14:37:22 -05:00
tpoisonooo
a77b2c9cb8
style(examples): typo (#1080)
* Update ampere_tensorop_conv2dfprop.cu
learning cutlass, PR a typo.
* Update ampere_gemm_operand_reduction_fusion.cu
2023-09-11 10:13:22 -04:00
ANIKET SHIVAM
d572cc1aab
CUTLASS 3.1 (#915)
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-04-14 23:19:34 -04:00
Alexander Pivovarov
7e370c9637
Fix typos 2 (#842)
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2023-03-09 23:22:56 -05:00
ANIKET SHIVAM
66d9cddc83
New updates for 2.11 (#775)
* New updates.
* Minor profiler updates
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-01-20 16:32:57 -05:00
ANIKET SHIVAM
b72cbf957d
CUTLASS 2.10 (#615)
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-09-03 18:48:46 -04:00
Ivan Komarov
0b8cacd6f1
Remove redundant <fstream> includes (#563)
* Remove redundant <fstream> includes
* Fix fstream in examples/
* Fix <fstream> in test/
* Use consistent order for <fstream> (always after <iostream>)
* Remove an unneeded include in a file where std::ofstream usage is commented out
Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru>
2022-07-19 15:23:54 -04:00
Andrew Kerr
12f4108ac2
CUTLASS 2.9 (#468)
2022-04-23 15:02:38 -04:00
Bing Xu
d0d941efc7
[hardswish] correct implementation (#403)
* [hardswish] correct implementation
* seems working
* hardswish fp32/fp16x2 optimization
* [relu] half2 support
* add relu0; add multiply_add_relu0;
* cleanup
Co-authored-by: Bing Xu <bingxu@fb.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-02-09 14:28:53 -05:00
Manish Gupta
1ac4559d12
Cutlass 2.6 Update 1 (#301)
* cutlass 2.6 update
* remove debug prints
2021-07-27 17:58:30 -07:00
Manish Gupta
e5d51840e8
CUTLASS 2.6 (#298)
CUTLASS 2.6
2021-07-23 00:40:53 -04:00
Andrew Kerr
0e13748649
CUTLASS 2.5
2021-02-26 09:58:26 -05:00