* Split gemm reference templates into multiple TUs for parallel compilation
* Remove old files
* Better balancing of ref kernels across TUs
* Remove 3 newly added refcheck kernels and some unnecessary fp8 library instances to reduce lib size
* Remove auto fp8 kernels
* Remove some redundant kernels

| Name |
|---|
| library |
| profiler |
| util |
| CMakeLists.txt |