cutlass/include
dan_the_3rd 1b4e24470a
Example 43 - DualGemm (#670)
* Ex50 wip

* IS_PROFILING mode

* MultiStage2 - but is slower

* Add SwiGLU

* Support SplitKSerial reduction
Support not storing D0/D1
Cleanup code

* Option to disable bias

* Renumber example

* Fix build

* Remove references to pb_size_0 / pb_size_1

* Add support for bf16 inputs with float accum

* small changes

Co-authored-by: danthe3rd <danthe3rd>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-10-26 14:04:42 -04:00
..
cutlass Example 43 - DualGemm (#670) 2022-10-26 14:04:42 -04:00