514 Commits

Author SHA1 Message Date
Ali Hassani
1f2b590da6
Skip void-C kernels in the profiler when beta is non zero (#1661)
* Skip void-C kernels in the profiler when beta is non zero

The CUTLASS profiler currently skips only disposition for void-C kernels
when beta is non zero, when it would make more sense to skip running them
in the first place.

Not all users are aware of void-C kernels (as far as I know they weren't a
thing in 2.X), and not everyone remembers to filter out void-C kernels
when running the profiler with a non zero beta.

The easiest solution (and, as far as I can tell, the correct way of handling
this) is for `can_implement` to return `false` when beta is non zero (or
whatever argument indicates an epilogue source) but we have a void-C
kernel.

Profiler already includes functionality to skip running kernels that
fail `can_implement`.

* Move checks to collectives instead

---------

Co-authored-by: Ali Hassani <ahassani@nvidia.com>
2024-07-31 18:11:58 -04:00
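
The check described in commit 1f2b590da6 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `EpilogueArgs`, `SketchEpilogue`, and `is_source_supported` are hypothetical names, and the sketch assumes a void-C kernel is marked by `ElementC = void`.

```cpp
#include <type_traits>

// Hypothetical stand-in for epilogue arguments; not a CUTLASS type.
struct EpilogueArgs {
  float alpha;
  float beta;
};

// Hypothetical epilogue. Assumption: ElementC = void marks a "void-C" kernel,
// i.e. one that has no source tensor C to read.
template <class ElementC>
struct SketchEpilogue {
  static constexpr bool is_source_supported = !std::is_void_v<ElementC>;

  static bool can_implement(EpilogueArgs const& args) {
    // A non zero beta requests beta * C, which a void-C kernel cannot provide,
    // so report the problem as not implementable instead of running it.
    if (!is_source_supported && args.beta != 0.0f) {
      return false;
    }
    return true;
  }
};
```

With a check of this shape, a caller that already skips kernels failing `can_implement` (as the commit message says the profiler does) would never launch a void-C kernel with a non zero beta.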
dePaul Miller
8b2a0408bd
Profiler docs and argument update for raster order (#1667) 2024-07-31 16:40:10 -04:00
eqy
fbd116c0e5
fix build on SM 5.2 (#1664) 2024-07-31 09:54:57 -04:00
Tri Dao
5b283c872c
Add more GMMA shapes (#1630)
* Add more GMMA shapes

* Add more shapes for BF16
2024-07-29 19:09:51 -04:00
Vijay Thakkar
be60a0b272
CUTLASS 3.5.1 (#1623)
* CUTLASS 3.5.1

* updates, optimizations, fixes
2024-07-29 08:46:24 -04:00
Chengquan Jiang
56b46e2d13
Fix grouped gemm invalid memory access to problem shapes (#1543) 2024-07-10 11:55:22 -04:00
Kevin Tong
52fb43f30f
fix mbarrier invalidate (#1494) 2024-07-10 11:35:26 -04:00
Joe Rowell
843adf0408
Fix SMEM index for C in CuTe examples (#1477) 2024-07-10 11:14:15 -04:00
LiYu Lu
e48c7618e4
[bug] fix device thread gemm.h constructor (#1473) 2024-07-10 11:12:36 -04:00
Ali Hassani
c5239d8312
Add Faster Neighborhood Attention to pubs (#1471) 2024-07-10 11:09:13 -04:00
Daniel Richard G
d6580c3dc0
Support use of external/system GTest installation (#1469)
* Support use of system/external GTest installation

* Create working directory for tests explicitly
2024-07-10 11:07:57 -04:00
Andy Lo
81b06ee0e0
Fix B operand variable name and comments (#1458) 2024-07-10 11:06:29 -04:00
Alexander Zinoviev
dbfced05e7
Fix typos in convolution tests (#1433) 2024-07-10 11:00:52 -04:00
Raul
2448bb56e6
Update gemm_api_3x.md (#1386)
Fixed what seems to be an obvious typo.
2024-07-10 10:59:02 -04:00
Nick John Eliopoulos
637b159063
Fix C++17 version detection in helper_macros.hpp (#1479)
* It seems that __cplusplus can be inconsistent with _MSVC_LANG when discerning C++17 version. See https://github.com/NVIDIA/cutlass/issues/1474. Added switch to check _MSVC_LANG in addition to __cplusplus

* Fixed typo.

* Oops, another typo.

* Changed incorrect logic, ifndef to ifdef

* Define CUTLASS_CPLUSPLUS for language version testing

---------

Co-authored-by: Mark Hoemmen <mhoemmen@users.noreply.github.com>
2024-05-28 11:00:51 -04:00
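
The kind of detection commit 637b159063 describes might look like the sketch below. MSVC reports `__cplusplus` as `199711L` unless `/Zc:__cplusplus` is passed, while `_MSVC_LANG` reflects the selected language standard. The `SKETCH_*` macro names and `sketch_has_cxx17` are hypothetical and chosen for illustration; they are not the names used in `helper_macros.hpp`.

```cpp
// Hypothetical macros for illustration only.
// Prefer _MSVC_LANG when the compiler defines it, otherwise use __cplusplus.
#if defined(_MSVC_LANG)
#  define SKETCH_CPLUSPLUS _MSVC_LANG
#else
#  define SKETCH_CPLUSPLUS __cplusplus
#endif

// 201703L is the value both macros take for C++17.
#if SKETCH_CPLUSPLUS >= 201703L
#  define SKETCH_HAS_CXX17 1
#else
#  define SKETCH_HAS_CXX17 0
#endif

// Small helper so the result can be inspected from ordinary code.
constexpr bool sketch_has_cxx17() { return SKETCH_HAS_CXX17 == 1; }
```

Routing every version test through a single macro keeps the compiler-specific branch in one place.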
Manish Gupta
033d9efd2d
[Documentation] Fixes the confusion between concatenated vs. composed layout in CuTe documentation (#1498)
* Update 02_layout_algebra.md

* Update 02_layout_algebra.md
2024-05-02 15:35:12 -04:00
Sin
acc3ee18a1
Fix typos in cute docs (#1486)
* fix typos in 02_layout_algebra.md

* fix typos in 03_tensor.md
2024-05-02 15:34:36 -04:00
djns99
5c447dd84f
Update packed_stride.hpp to add CUTLASS_HOST_DEVICE decorator to new functions (#1495) 2024-04-19 12:07:57 -04:00
Vijay Thakkar
7d49e6c7e2
Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
Mehdi Yazdani
a40e08e9d5
Update 02_layout_algebra.md (#1451)
change line 348 to reflect correct layout.
2024-04-10 10:57:57 -04:00
lzw
8e7d9f483d
add missing header for size_t in numeric_types.h (#1420)
* add missing header for size_t in `numeric_types.h`

* make nvrtc happy

* add missing header for int types in `cutlass/arch/memory.h`

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-04-09 14:15:48 -04:00
reed
19f3cc33f1
Fix uint128 operator add (#1400)
* fix uint128 operator add for 64-bit hilo implementation

* add uint128 test for operator add

* make clang happy

---------

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-04-02 13:32:18 -04:00
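
For context on commit 19f3cc33f1, a 128-bit add built from two 64-bit halves has to propagate the carry from the low half into the high half. The sketch below shows that general technique only. The commit message does not say what the original bug was, and `U128` and `add_u128` are hypothetical names, not the CUTLASS `uint128_t` type.

```cpp
#include <cstdint>

// Hypothetical 128-bit value stored as two 64-bit halves.
struct U128 {
  std::uint64_t lo;
  std::uint64_t hi;
};

// Adds two 128-bit values, wrapping modulo 2^128.
inline U128 add_u128(U128 a, U128 b) {
  U128 r;
  r.lo = a.lo + b.lo;  // unsigned addition wraps on overflow
  // If the low half wrapped, the sum is smaller than either operand.
  std::uint64_t carry = (r.lo < a.lo) ? 1u : 0u;
  r.hi = a.hi + b.hi + carry;
  return r;
}
```

A unit test for such an operator would typically cover the case where the low halves overflow, since that is the only path on which the carry matters.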
jeromeku
f9ece1b42c
Python Gemm tile_descriptions fix (#1439)
* fix python gemm tile descriptions

* fix formatting

* fix math_operation filtering

* fix formatting
2024-03-30 09:00:46 -04:00
reed
28cbacbf64
fix stride compilation warning (#1415) 2024-03-29 23:50:33 -04:00
Tom Tan
8f7d2789b8
[NFC] improve doc: fix typo in mma doc (#1417) 2024-03-27 14:07:20 -04:00
seventh
c4e3e122e2
group gemm set stride L = cute::Int<0> (#1416) 2024-03-20 17:31:14 -04:00
Vijay Thakkar
629f4653c3
CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
lorenzo chelini
ffa34e7075
(NFC) improve doc: Add missing verb to sentence (#1377)
Co-authored-by: lorenzo chelini <lchelini@nvidia.com>
2024-03-04 15:30:10 -05:00
LiYu Lu
a8f2c80db0
fix tile_size(TiledCopy<Args...> const&) error (#1357) 2024-02-24 00:33:01 -05:00
ANIKET SHIVAM
bbe579a9e3
Updates for CUTLASS 3.4.1 (#1346)
* Updates for CUTLASS 3.4.1

* minor epi change
2024-02-15 15:48:34 -05:00
Driss Guessous
47a3ebbea9
Add a missing platform include (#1328) 2024-02-03 01:30:32 -05:00
Chenggang Zhao
57e01e1a6b
Fix missing include file (#1318) 2024-02-03 01:29:32 -05:00
xws117
6e3df975a2
Modify comments in code examples/08_turing_tensorop_gemm/turing_tensorop_gemm.cu (#1325) 2024-01-31 21:41:30 -05:00
reed
8825fbf1ef
fix unrecognized print format specifier for int8/uint8 (#1303)
* fix unrecognized print format specifier for int8/uint8

* use C++ static_cast instead of C-style cast
2024-01-29 21:22:40 -05:00
reed
092f14db05
fix tile_size_mnk compilation warning (#1294) 2024-01-29 21:21:15 -05:00
Haicheng Wu
9385141f19
Update PUBLICATIONS.md
PTQ paper from Google
2024-01-19 14:17:55 -05:00
Haicheng Wu
b4b5b11070
Update PUBLICATIONS.md
add Odyssey LLM paper from Meituan
2024-01-18 10:30:21 -05:00
jayhshah
139b93db61
update publications (#1308) 2024-01-17 14:06:46 -05:00
Aleksandar Samardžić
ca37d632c9
Remove sparse GEMM with row broadcasted bias vector (#1302)
This reverts commit d3e72719b4.

Co-authored-by: Aleksandar Samardžić <asamardzic@matf.bg.ac.rs>
2024-01-17 14:06:27 -05:00
Chengquan Jiang
362abbf274
Support ElementD to be void for tma (#1153)
* Support void D with AuxStore

* refine get_element_aux
2024-01-16 18:15:42 -05:00
ANIKET SHIVAM
751eb9a885
Update license year (#1306) 2024-01-16 14:37:22 -05:00
ANIKET SHIVAM
2f589ffa76
Updates for 3.4 release. (#1305) 2024-01-16 13:42:51 -05:00
Tianao Ge
acba5beee5
Fix flops calculation and tensor b stride calculation in the example 36 (#1278)
* Fix flops calculation and tensor b stride calculation in the example 36

* Fix datatype

* Update gather_scatter_fusion.cu
2024-01-08 17:27:30 -05:00
Eugene Zhulenev
74d1f3e63a
Fix cute::array<T, 0> iterator (#1273) 2024-01-08 17:10:09 -05:00
Kun Wu
8ac2edc810
expose stream API in python kernel call interfaces (#1287)
* expose stream API in python kernel call interfaces

* add stream to ReductionArguments; document stream arg

* add stream argument to GemmGroupedArguments
2024-01-05 08:27:45 -05:00
Ali Hassani
d4be5ab5d7
Allow per-column bias in EpilogueTensorBroadcast (#1275)
* Allow per-column bias in EpilogueTensorBroadcast

EpilogueTensorBroadcast only supports per-row vector broadcast, because
the bias stride is hardcoded.

It can easily support both if the bias stride is made conditional, and
the original behavior is maintained by defaulting to per-row.

* Add unit test for EpilogueTensorBroadcast with per-col bias

---------

Co-authored-by: Ali Hassani <ahassanijr@gmail.com>
Co-authored-by: Ali Hassani <ali@hippoml.com>
2024-01-04 12:48:31 -05:00
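
The idea in commit d4be5ab5d7, making the bias stride conditional while defaulting to per-row, can be illustrated in plain C++ as below. This is a sketch under stated assumptions: `add_bias` and `PerColumnBias` are hypothetical names, the output is assumed to be a row-major M x N matrix, and none of this is the actual `EpilogueTensorBroadcast` code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: adds a broadcast bias vector to a row-major M x N matrix D.
// PerColumnBias = false (the default) keeps the per-row behavior.
template <bool PerColumnBias = false>
void add_bias(std::vector<float>& D, std::vector<float> const& bias,
              std::size_t M, std::size_t N) {
  // Per-row bias: index by row only, stride (1, 0).
  // Per-column bias: index by column only, stride (0, 1).
  constexpr std::size_t stride_m = PerColumnBias ? 0 : 1;
  constexpr std::size_t stride_n = PerColumnBias ? 1 : 0;
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      D[m * N + n] += bias[m * stride_m + n * stride_n];
    }
  }
}
```

Because the flag defaults to `false`, existing callers of `add_bias<>` keep the per-row broadcast, which mirrors how the commit message describes preserving the original behavior.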
Jee Li
c9591a694d
fix typo (#1279) 2024-01-04 12:41:39 -05:00
Aleksandar Samardžić
5c756eb774
Add support for sparse GEMM with visitor epilogue (#1189)
* Add support for sparse GEMM with visitor epilogue

* Refactor changes at the kernel level
2024-01-04 12:38:11 -05:00
Pradeep Ramani
8236f30675
CUTLASS 3.4.0 (#1286)
* CUTLASS 3.4.0

* Update CHANGELOG.md

---------

Co-authored-by: Pradeep Ramani <prramani@nvidia.com>
2023-12-29 15:21:31 -05:00
Christian Sigg
b7508e3379
Fix inline ptx escaping for predicates. (#1264)
* Fix inline ptx escaping for predicates.

Prevents `error: invalid % escape in inline assembly string` when compiling with clang.

* More double-quoting.
2023-12-14 11:16:15 -05:00