* Ex50 wip
* IS_PROFILING mode
* MultiStage2 - but is slower
* Add SwiGLU
* Support SplitKSerial reduction; support not storing D0/D1; clean up code
* Option to disable bias
* Renumber example
* Fix build
* Remove references to pb_size_0 / pb_size_1
* Add support for bf16 inputs with float accum
* Small changes

Co-authored-by: danthe3rd <danthe3rd>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
cutlass