* add two missing files
* fix bunch of bugs of gemm-reducek fusion and add a device interface
* small changes
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
* Remove redundant <fstream> includes
* Fix fstream in examples/
* Fix <fstream> in test/
* Use consistent order for <fstream> (always after <iostream>)
* Remove an unneeded include in a file where std::ofstream usage is commented out
Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru>
When split-k is enabled, we should set alpha to 1 and beta to 0 for the
split-k gemm kernel.
The fix was from hwu36. I only did fixed some minor typos along with his
fix.