Yujia Zhai
|
cc3c29a81a
|
CUTLASS 3.6.0 (#1850)
* v3.6
* update changelog
* update readme
* fix typo
* fixing typos
* hopper gemm with weight prefetch
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
|
2024-10-09 15:33:27 -04:00 |
|
Vijay Thakkar
|
629f4653c3
|
CUTLASS 3.5.0 (#1411)
|
2024-03-19 17:51:04 -04:00 |
|
ANIKET SHIVAM
|
751eb9a885
|
Update license year (#1306)
|
2024-01-16 14:37:22 -05:00 |
|
Adnan Akhundov
|
2ba1ef10be
|
Increase max dynamic SMEM size in GemmSoftmax (#903)
|
2023-04-03 10:01:12 -04:00 |
|
ANIKET SHIVAM
|
66d9cddc83
|
New updates for 2.11 (#775)
* New updates.
* Minor profiler updates
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
|
2023-01-20 16:32:57 -05:00 |
|
Aditya Atluri
|
c975e2ccbb
|
release 2.11 (#703)
|
2022-11-19 09:02:15 -05:00 |
|
Yujia Zhai
|
b1d3f9b2fd
|
upstream internal updates (#616)
Co-authored-by: yuzhai <yuzhai@nvidia.com>
|
2022-09-04 23:05:09 -04:00 |
|
ANIKET SHIVAM
|
b72cbf957d
|
CUTLASS 2.10 (#615)
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
|
2022-09-03 18:48:46 -04:00 |
|
Yujia Zhai
|
04a9777b87
|
Softmax (#546)
* add test layernorm g-mem version
* Delete include/configure directory
* Delete examples/test_layernorm directory
* Update gemm_with_softmax.h
* Update gemm_softmax.cu
* Update linear_combination.h
* Update fast_math.h
* remove redundant vars
Co-authored-by: yujia.zhai <yujia.zhai@bytedance.com>
Co-authored-by: yuzhai <yuzhai@nvidia.com>
|
2022-07-02 01:19:18 -04:00 |
|
Andrew Kerr
|
12f4108ac2
|
CUTLASS 2.9 (#468)
|
2022-04-23 15:02:38 -04:00 |
|