Update README.md

2022-04-28 10:50:11 -04:00 · cc2ea4c3fc
commit cc2ea4c3fc
parent a0de301283
1 changed file with 8 additions and 1 deletion
--- a/README.md
+++ b/README.md
@@ -45,7 +45,14 @@ CUTLASS 2.9 is an update to CUTLASS adding:
  - [SYMM](/test/unit/gemm/device/symm_f32n_f32n_tensor_op_fast_f32_ls_sm80.cu), [HEMM](/test/unit/gemm/device/hemm_cf32h_cf32n_tensor_op_fast_f32_ls_sm80.cu)
 - [CUTLASS Python](/examples/40_cutlass_py) demonstrating JIT compilation of CUTLASS kernels and a Python-based runtime using [CUDA Python](https://developer.nvidia.com/cuda-python)
 - [GEMM + Softmax example](/examples/35_gemm_softmax)
+- [Gather and Scatter Fusion with GEMM](/examples/36_gather_scatter_fusion) gathers inputs and scatters outputs based on index vectors within the same GEMM kernel.
+- [Back-to-back GEMM/CONV](examples/13_two_tensor_op_fusion) now fully supports buffering the results of the first GEMM/CONV in shared memory for the second one to consume.
+- [Transposed Convolution](/examples/34_transposed_conv2d) (a.k.a. Deconvolution) support, which reuses the Dgrad implementation.
+- [Utility functions](/tools/util/include/cutlass/util) that pad NHWC tensors and convert between the NCHW and NHWC layouts.
+- [Small alignment implicit GEMM](https://github.com/NVIDIA/cutlass/issues/242) support for Fprop/Dgrad/Wgrad, so that padding is no longer required in order to use Tensor Cores.
+- Epilogue enhancements with improved performance, more activation functions, and more fusion patterns.
 - Optimal performance using [CUDA 11.6u2](https://developer.nvidia.com/cuda-downloads)
+- [Parallel GEMM splitk](https://github.com/NVIDIA/cutlass/pull/277) support in the CUTLASS profiler.
 - Updates and bugfixes from the community (thanks!)
 - **Deprecation announcement:** CUTLASS plans to deprecate the following:
  - Maxwell and Pascal GPU architectures
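
The gather and scatter fusion entry above can be illustrated with a small reference sketch. This is plain Python that only shows the intended semantics, not CUTLASS code. The function name `gather_gemm_scatter`, the choice to gather columns of B, and the reuse of the same index vector for scattering into the output are all assumptions made for illustration; the actual example may gather and scatter along different operands or dimensions.

```python
# Reference semantics only: a pure-Python sketch, not the CUTLASS kernel.
# Assumption: columns of B are gathered and the matching columns of D are
# written (scattered) using the same index vector.

def gather_gemm_scatter(a, b, indices, n_out):
    """For each col in indices, compute D[:, col] = A @ B[:, col].

    a: M x K matrix as a list of rows
    b: K x N matrix as a list of rows
    indices: columns of B to gather, also the output columns to write
    n_out: number of columns in the output D
    """
    m, k = len(a), len(b)
    # Columns that are never selected keep their initial value of 0.0.
    d = [[0.0] * n_out for _ in range(m)]
    for col in indices:
        for i in range(m):
            d[i][col] = sum(a[i][p] * b[p][col] for p in range(k))
    return d


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]
d = gather_gemm_scatter(a, b, indices=[2, 0], n_out=3)
print(d)
```

Doing the gather, the multiply, and the scatter in one pass avoids materializing a compacted copy of the input or output, which is the motivation for fusing them into a single kernel.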