* Allow per-column bias in EpilogueTensorBroadcast
EpilogueTensorBroadcast currently supports only per-row vector
broadcast, because the bias stride is hardcoded.
Making the bias stride conditional lets it support both per-row and
per-column bias. Per-row remains the default, so existing behavior is
unchanged.
* Add unit test for EpilogueTensorBroadcast with per-column bias
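
A minimal sketch of the conditional-stride idea described above, in plain
C++17 with no CUTLASS dependency. The names `RowBiasStride`,
`ColBiasStride`, `BiasBroadcast`, and `PerColumnBias` are hypothetical
stand-ins chosen for illustration and are not taken from the actual
change; the real epilogue may express its strides differently.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical compile-time stride tags (assumptions, not CUTLASS types).
// Per-row bias: one value per row, so the index advances with the row
// and is broadcast (stride 0) across columns.
struct RowBiasStride {
  static constexpr std::size_t m = 1;
  static constexpr std::size_t n = 0;
};

// Per-column bias: one value per column, broadcast (stride 0) across rows.
struct ColBiasStride {
  static constexpr std::size_t m = 0;
  static constexpr std::size_t n = 1;
};

// The stride is selected by a boolean template parameter that defaults
// to false, so code that never mentions it keeps the per-row behavior.
template <bool PerColumnBias = false>
struct BiasBroadcast {
  using Stride =
      std::conditional_t<PerColumnBias, ColBiasStride, RowBiasStride>;

  // Offset into the bias vector for output element (row, col).
  static constexpr std::size_t offset(std::size_t row, std::size_t col) {
    return row * Stride::m + col * Stride::n;
  }
};
```

With this layout, `BiasBroadcast<>::offset(row, col)` depends only on
`row`, while `BiasBroadcast<true>::offset(row, col)` depends only on
`col`.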
---------
Co-authored-by: Ali Hassani <ahassanijr@gmail.com>
Co-authored-by: Ali Hassani <ali@hippoml.com>