* Add couple configs into generator.py for mixed input MM
* change one unit test name; reenable 128x32 in the profiler
* Added U8/BF16 tests.
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>