cutlass/examples/16_ampere_tensorop_conv2dfprop
Bing Xu d0d941efc7
[hardswish] correct implmentation (#403)
* [hardswish] correct implmentation

* seems working

* hardswish fp32/fp16x2 optimization

* [relu] half2 support

* add relu0; add multiply_add_relu0;

* cleanup

Co-authored-by: Bing Xu <bingxu@fb.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-02-09 14:28:53 -05:00
ampere_tensorop_conv2dfprop.cu [hardswish] correct implmentation (#403) 2022-02-09 14:28:53 -05:00
CMakeLists.txt Cutlass 2.6 Update 1 (#301) 2021-07-27 17:58:30 -07:00