cutlass/examples/43_dual_gemm
dan_the_3rd 1b4e24470a
Example 43 - DualGemm (#670)
* Ex50 wip

* IS_PROFILING mode

* MultiStage2 - but is slower

* Add SwiGLU

* Support SplitKSerial reduction
Support not storing D0/D1
Cleanup code

* Option to disable bias

* Renumber example

* Fix build

* Remove references to pb_size_0 / pb_size_1

* Add support for bf16 inputs with float accum

* small changes

Co-authored-by: danthe3rd <danthe3rd>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-10-26 14:04:42 -04:00
device           Example 43 - DualGemm (#670)   2022-10-26 14:04:42 -04:00
kernel           Example 43 - DualGemm (#670)   2022-10-26 14:04:42 -04:00
thread           Example 43 - DualGemm (#670)   2022-10-26 14:04:42 -04:00
threadblock      Example 43 - DualGemm (#670)   2022-10-26 14:04:42 -04:00
CMakeLists.txt   Example 43 - DualGemm (#670)   2022-10-26 14:04:42 -04:00
dual_gemm_run.h  Example 43 - DualGemm (#670)   2022-10-26 14:04:42 -04:00
dual_gemm.cu     Example 43 - DualGemm (#670)   2022-10-26 14:04:42 -04:00
test_run.h       Example 43 - DualGemm (#670)   2022-10-26 14:04:42 -04:00