dan_the_3rd
|
1b4e24470a
|
Example 43 - DualGemm (#670)
* Ex50 wip
* IS_PROFILING mode
* MultiStage2 - but is slower
* Add SwiGLU
* Support SplitKSerial reduction
* Support not storing D0/D1
* Cleanup code
* Option to disable bias
* Renumber example
* Fix build
* Remove references to pb_size_0 / pb_size_1
* Add support for bf16 inputs with float accum
* small changes
Co-authored-by: danthe3rd <danthe3rd>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
|
2022-10-26 14:04:42 -04:00 |
|