* [hardswish] correct implementation * seems working * hardswish fp32/fp16x2 optimization * [relu] half2 support * add relu0; add multiply_add_relu0; * cleanup Co-authored-by: Bing Xu <bingxu@fb.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com> |
||
---|---|---|
ampere_tensorop_conv2dfprop.cu | ||
CMakeLists.txt |