cutlass/examples/13_two_tensor_op_fusion/reference/device
Aleksandr Pivovar 4a68cf748e
added support of b2b bmm (#849)
* added support of b2b bmm

* fixed arguments and params structures

* added batch_count argument

* removed SplitKSerial and added new test case with b2b bmm

* fixed support of Kbatched and added new test case with batch stride

* added batch support for bias and scale

* make test

* small changes

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-04-14 23:20:02 -04:00
tensor_scale_bias.h added support of b2b bmm (#849) 2023-04-14 23:20:02 -04:00