Commit Graph

251 Commits

Author SHA1 Message Date
Xinyu Yang
b0c09ed077
fix by adding public (#1753) 2024-10-23 12:45:58 -04:00
Sergey Klevtsov
d65266a868
Add all supported GMMA shapes (#1890) 2024-10-22 18:13:36 -04:00
Tri Dao
5b50a8faaf
Add GMMA shape m64n40k16 (#1864) 2024-10-21 20:41:47 -04:00
Sergey Klevtsov
08101d9d0c
Improve sm90 mixed dtype kernel (#1883) 2024-10-17 20:06:38 -04:00
Haicheng Wu
755194a7bd
add is_last_tile 2024-10-17 12:11:02 -07:00
Saagar Jha
53668799b2
Handle MNK Sm90{Row, Col}Reduction problem shapes (#1803) 2024-10-14 19:46:20 -04:00
Yujia Zhai
cc3c29a81a
CUTLASS 3.6.0 (#1850)
* v3.6

* update changelog

* update readme

* fix typo

* fixing typos

* hopper gemm with weight prefetch

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-10-09 15:33:27 -04:00
Feng Shijie
0837a2a00a
Fix typo in comment (#1787) 2024-10-07 12:39:59 -04:00
Junkai-Wu
e2b0789927
Add some can_implement rules for Hopper convolution. (#1835) 2024-09-25 11:28:10 -04:00
reed
2991ce18d3
Add print_svg for mma (#1733)
* add print_svg for mma

* correct the code indentation
2024-09-18 10:37:24 -04:00
Chenggang Zhao
1ebda1ccef
Fix MMA promotion interval assertions (#1641) 2024-09-16 12:38:42 -04:00
John Shumway
3a8c01a18b
Prefix a member template name with the template keyword. (#1796)
Fixes LLVM build error.
2024-09-11 13:33:56 -04:00
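As a minimal sketch of the rule behind the commit above (not the actual CUTLASS change), a member template called through a dependent object must be prefixed with the `template` keyword. Otherwise the `<` is parsed as a less-than operator, and stricter compilers reject the code. The names `Converter`, `scale`, and `scale_by_three` are hypothetical and exist only for illustration.

```cpp
#include <cassert>  // only needed if you check the result with assert

// Hypothetical type with a member template; not taken from CUTLASS.
struct Converter {
  template <int N>
  int scale(int x) const { return x * N; }
};

// T is a template parameter, so `c.scale` is a dependent name.
// The `template` keyword tells the parser that `<` opens a
// template argument list rather than a comparison.
template <class T>
int scale_by_three(T const& c, int x) {
  return c.template scale<3>(x);
}
```

Calling `scale_by_three(Converter{}, 4)` multiplies 4 by the template argument 3.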
Junkai-Wu
dbdae514e0
Support for TMA Epilogue for Group Gemm and add pingpong ptr array & Group Gemm (#1795) 2024-09-11 00:07:31 -04:00
Sean Xiaowen Zhang
21d0534167
fix assertion (#1790) 2024-09-09 14:05:27 -04:00
Tri Dao
323c8170bf
Support ComputeFn where output type differs from input type (#1771)
This is useful for, e.g., a function that takes two float inputs and turns them into a complex value.
2024-09-05 23:25:03 -04:00
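The idea in the commit above can be sketched in plain C++, assuming a compute functor whose result type differs from its argument type. This is not the CUTLASS `ComputeFn` interface: `MakeComplex` and `apply_compute` are hypothetical names used only to show the output type being deduced from the functor rather than assumed equal to the input type.

```cpp
#include <cassert>  // only needed if you check the result with assert
#include <complex>

// Hypothetical functor: two float inputs, one complex output.
struct MakeComplex {
  std::complex<float> operator()(float re, float im) const {
    return std::complex<float>(re, im);
  }
};

// Hypothetical wrapper: the return type comes from the functor,
// so it may differ from the input type In.
template <class ComputeFn, class In>
auto apply_compute(ComputeFn fn, In a, In b) -> decltype(fn(a, b)) {
  return fn(a, b);
}
```

Here `apply_compute(MakeComplex{}, 1.0f, 2.0f)` yields a `std::complex<float>` even though both inputs are `float`.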
Gabriel Wu
82f5075946
set_slice3x3 -> set_slice_3x3 (#1784) 2024-09-05 23:24:10 -04:00
Saagar Jha
06e337758d
Remove extraneous comma in declaration (#1776) 2024-09-05 17:14:15 -04:00
JiayuSun
7369adcaca
Add Sm90LinCombPerColBias (#1774)
Co-authored-by: Jiayu Sun <jiayus@s4124-0071.nvidia.com>
2024-09-04 15:11:24 -04:00
Alchan Kim
6c3044136b
Update barrier.h (#1782) 2024-09-04 14:52:11 -04:00
Aleksandar Samardžić
e1976daacc
Add support for mixed 4-bit/8-bit data types GEMM (#1413)
* Add support for mixed 4-bit/8-bit data types GEMM

* fix ( and )

---------

Co-authored-by: Aleksandar Samardžić <asamardzic@matf.bg.ac.rs>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-08-29 23:11:06 -04:00
shunfan-shao
4dbf5dbed2
Use CUDA runtime API to retrieve function pointer to driver API (#1700)
* Query pfn to driver api

* use default for older toolkits

---------

Co-authored-by: shunfans <shunfans@nvidia.com>
2024-08-19 13:26:09 -04:00
Haicheng Wu
b0296bf682
fix uint128 2024-08-15 21:06:01 -07:00
eqy
fb170439e8
Update half.h (#1709) 2024-08-14 14:59:59 -04:00
Tri Dao
7192f4ab23
Add CLayout_64x208 (#1680)
Without this, I get a compilation error when the extended shapes are enabled.
2024-08-08 14:00:24 -04:00
Mark Hoemmen
19b4c5e065
Fix isnan namespace qualification in cutlass/functional.h (#1679)
* Fix unrelated MSVC build warnings

* Fix use of isnan in functional.h

Correct namespace qualification of isnan in functional.h
so that it invokes cutlass::isnan for half_t, instead of
converting half_t to float and invoking std::isnan (on host,
or ::isnan on device).
2024-08-05 14:28:13 -04:00
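The distinction described in the commit above can be illustrated with a small stand-in, assuming a library namespace that supplies its own `isnan` for a half-precision type. The namespace `demo`, the type `half_like`, and the call counter are hypothetical; the real `cutlass::half_t` and `cutlass::isnan` are not reproduced here.

```cpp
#include <cassert>  // only needed if you check the result with assert
#include <cmath>

namespace demo {  // hypothetical stand-in for the library namespace

// Hypothetical half-like type that stores a float for simplicity.
struct half_like {
  float storage;
  operator float() const { return storage; }
};

// Counts how often the namespace's own overload runs.
inline int& custom_calls() {
  static int n = 0;
  return n;
}

// The library's own NaN test for its half type.
inline bool isnan(half_like h) {
  ++custom_calls();
  return h.storage != h.storage;  // NaN compares unequal to itself
}

// Path the fix avoids: convert to float, then call std::isnan.
inline bool check_via_float(half_like h) {
  return std::isnan(static_cast<float>(h));
}

// Path the fix selects: the namespace-qualified overload.
inline bool check_qualified(half_like h) {
  return demo::isnan(h);
}

}  // namespace demo
```

Both paths agree on the result in this sketch, but only `check_qualified` reaches the type-specific overload, which is the behavior the commit message describes for `half_t`.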
dePaul Miller
06b21349bc
1x1x1 cluster launch (#1673) 2024-08-01 12:20:28 -04:00
Sergey Klevtsov
36cbfcf483
Add extended wgmma shapes for all data types (#1666) 2024-07-31 18:33:14 -04:00
Ali Hassani
1f2b590da6
Skip void-C kernels in the profiler when beta is non zero (#1661)
* Skip void-C kernels in the profiler when beta is non zero

The CUTLASS profiler currently skips only the disposition for void-C kernels
when beta is non zero, when it makes more sense to skip running them in the
first place.

Not all users are aware of void-C kernels (as far as I know they weren't a
thing in 2.X), and not everyone remembers to filter out void-C kernels
when running the profiler with a non zero beta.

The easiest solution (and as far as I can tell the correct way of handling this)
is for `can_implement` to return `false` when beta is non zero (or
whatever argument indicates an epilogue source) but we have a void-C
kernel.

The profiler already includes functionality to skip running kernels that
fail `can_implement`.

* Move checks to collectives instead

---------

Co-authored-by: Ali Hassani <ahassani@nvidia.com>
2024-07-31 18:11:58 -04:00
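The rule proposed in the commit above can be sketched as follows, assuming an epilogue parameterized on the C element type with a scalar `beta` argument. `EpilogueSketch` and its `Arguments` struct are hypothetical; the actual CUTLASS collectives and their argument layout may differ.

```cpp
#include <cassert>  // only needed if you check the result with assert
#include <type_traits>

// Hypothetical epilogue; ElementC = void models a void-C kernel
// that has no source tensor to read.
template <class ElementC>
struct EpilogueSketch {
  struct Arguments {
    float alpha;
    float beta;
  };

  // Reject the problem when a source is requested (beta != 0)
  // but the kernel was built without a C operand.
  static bool can_implement(Arguments const& args) {
    constexpr bool is_void_c = std::is_void<ElementC>::value;
    if (is_void_c && args.beta != 0.0f) {
      return false;
    }
    return true;
  }
};
```

A caller such as a profiler could then skip any kernel for which `can_implement` returns `false` instead of running it and discarding the result.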
eqy
fbd116c0e5
fix build on SM 5.2 (#1664) 2024-07-31 09:54:57 -04:00
Tri Dao
5b283c872c
Add more GMMA shapes (#1630)
* Add more GMMA shapes

* Add more shapes for BF16
2024-07-29 19:09:51 -04:00
Vijay Thakkar
be60a0b272
CUTLASS 3.5.1 (#1623)
* CUTLASS 3.5.1

* updates, optimizations, fixes
2024-07-29 08:46:24 -04:00
Chengquan Jiang
56b46e2d13
Fix grouped gemm invalid memory access to problem shapes (#1543) 2024-07-10 11:55:22 -04:00
Kevin Tong
52fb43f30f
fix mbarrier invalidate (#1494) 2024-07-10 11:35:26 -04:00
Andy Lo
81b06ee0e0
Fix B operand variable name and comments (#1458) 2024-07-10 11:06:29 -04:00
Nick John Eliopoulos
637b159063
Fix C++17 version detection in helper_macros.hpp (#1479)
* It seems that __cplusplus can be inconsistent with _MSVC_LANG when detecting the C++17 version. See https://github.com/NVIDIA/cutlass/issues/1474. Added a check of _MSVC_LANG in addition to __cplusplus

* Fixed typo.

* Oops, another typo.

* Changed incorrect logic, ifndef to ifdef

* Define CUTLASS_CPLUSPLUS for language version testing

---------

Co-authored-by: Mark Hoemmen <mhoemmen@users.noreply.github.com>
2024-05-28 11:00:51 -04:00
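A minimal sketch of the detection approach described in the commit above: MSVC reports `__cplusplus` as `199711L` unless `/Zc:__cplusplus` is passed, so `_MSVC_LANG` is consulted first when it is defined. The macro names `DEMO_CPLUSPLUS` and `DEMO_HAS_CXX17` and the function `demo_has_cxx17` are hypothetical and are not the names used in `helper_macros.hpp`.

```cpp
#include <cassert>  // only needed if you check the result with assert

// Prefer _MSVC_LANG when the compiler defines it, because MSVC's
// __cplusplus may not reflect the selected language standard.
#if defined(_MSVC_LANG)
#  define DEMO_CPLUSPLUS _MSVC_LANG
#else
#  define DEMO_CPLUSPLUS __cplusplus
#endif

// Single place to test for C++17 or newer.
#if DEMO_CPLUSPLUS >= 201703L
#  define DEMO_HAS_CXX17 1
#else
#  define DEMO_HAS_CXX17 0
#endif

constexpr bool demo_has_cxx17() { return DEMO_HAS_CXX17 != 0; }
```

Code that depends on C++17 features can then branch on `DEMO_HAS_CXX17` rather than testing `__cplusplus` directly.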
Vijay Thakkar
7d49e6c7e2
Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
lzw
8e7d9f483d
add missing header for size_t in numeric_types.h (#1420)
* add missing header for size_t in `numeric_types.h`

* make nvrtc happy

* add missing header for int types in `cutlass/arch/memory.h`

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-04-09 14:15:48 -04:00
reed
19f3cc33f1
Fix uint128 operator add (#1400)
* fix uint128 operator add for 64-bit hilo implementation

* add uint128 test for operator add

* make clang happy

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-04-02 13:32:18 -04:00
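For context, 128-bit addition over a pair of 64-bit halves can be sketched as below. This is a generic illustration of the hi/lo technique with carry propagation, not the CUTLASS `uint128_t` code and not a description of the specific bug that was fixed. The names `u128_sketch` and `add` are hypothetical.

```cpp
#include <cassert>  // only needed if you check the result with assert
#include <cstdint>

// Hypothetical 128-bit value stored as two 64-bit halves.
struct u128_sketch {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Add the low halves first. Unsigned overflow wraps around, so a
// carry occurred exactly when the sum is smaller than an operand.
inline u128_sketch add(u128_sketch a, u128_sketch b) {
  u128_sketch r;
  r.lo = a.lo + b.lo;
  std::uint64_t carry = (r.lo < a.lo) ? 1u : 0u;
  r.hi = a.hi + b.hi + carry;
  return r;
}
```

Adding 1 to a value whose low half is all ones wraps the low half to zero and carries into the high half.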
reed
28cbacbf64
fix stride compilation warning (#1415) 2024-03-29 23:50:33 -04:00
seventh
c4e3e122e2
group gemm set stride L = cute::Int<0> (#1416) 2024-03-20 17:31:14 -04:00
Vijay Thakkar
629f4653c3
CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
LiYu Lu
a8f2c80db0
fix tile_size(TiledCopy<Args...> const&) error (#1357) 2024-02-24 00:33:01 -05:00
ANIKET SHIVAM
bbe579a9e3
Updates for CUTLASS 3.4.1 (#1346)
* Updates for CUTLASS 3.4.1

* minor epi change
2024-02-15 15:48:34 -05:00
Driss Guessous
47a3ebbea9
Add a missing platform include (#1328) 2024-02-03 01:30:32 -05:00
reed
8825fbf1ef
fix unrecognized print format specifier for int8/uint8 (#1303)
* fix unrecognized print format specifier for int8/uint8

* use C++ static_cast instead of a C-style cast
2024-01-29 21:22:40 -05:00
reed
092f14db05
fix tile_size_mnk compilation warning (#1294) 2024-01-29 21:21:15 -05:00
Aleksandar Samardžić
ca37d632c9
Remove sparse GEMM with row broadcasted bias vector (#1302)
This reverts commit d3e72719b4.

Co-authored-by: Aleksandar Samardžić <asamardzic@matf.bg.ac.rs>
2024-01-17 14:06:27 -05:00
Chengquan Jiang
362abbf274
Support ElementD to be void for tma (#1153)
* Support void D with AuxStore

* refine get_element_aux
2024-01-16 18:15:42 -05:00
ANIKET SHIVAM
751eb9a885
Update license year (#1306) 2024-01-16 14:37:22 -05:00
ANIKET SHIVAM
2f589ffa76
Updates for 3.4 release. (#1305) 2024-01-16 13:42:51 -05:00