侯奇
12626bcfe4
Update gemm_f16n_f16t_f32t_tensor_op_f32_sm80.cu with include "cutlass/gemm/device/gemm_universal.h" ( #1569 )
...
fix compile with `cmake .. -DCUTLASS_ENABLE_TESTS=ON -DCUTLASS_TEST_LEVEL=2`
2024-10-23 12:56:36 -04:00
MaxAkaAltmer
f02913c34e
Include of regular_tile_iterator.h fixed for NVRTC ( #1765 )
...
* Include of regular_tile_iterator.h fixed for NVRTC
* More include fixed for NVRTC
2024-10-23 12:55:59 -04:00
103yiran
03e3bffaec
Adjusting code indentation ( #1639 )
2024-10-23 12:55:02 -04:00
Lei Mao
e5f3caf145
Fix README ( #1658 )
...
* Fix README
* Improve README
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2024-10-23 12:52:43 -04:00
Bogumil Sapinski Mobica
83ae20c740
added mapping for bf16 to torch::kBFloat16 ( #1843 )
...
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2024-10-23 12:48:31 -04:00
Xinyu Yang
b0c09ed077
fix by adding public ( #1753 )
2024-10-23 12:45:58 -04:00
sijialou
ea69cc2849
fix typo ( #1853 )
2024-10-23 12:45:28 -04:00
Xinyu Yang
f3a3bfcbf2
add maximum support ( #1833 )
2024-10-23 12:44:56 -04:00
Sergey Klevtsov
d65266a868
Add all supported GMMA shapes ( #1890 )
2024-10-22 18:13:36 -04:00
Tri Dao
5b50a8faaf
Add GMMA shape m64n40k16 ( #1864 )
2024-10-21 20:41:47 -04:00
Sergey Klevtsov
08101d9d0c
Improve sm90 mixed dtype kernel ( #1883 )
2024-10-17 20:06:38 -04:00
Haicheng Wu
755194a7bd
add is_last_tile
2024-10-17 12:11:02 -07:00
Saagar Jha
53668799b2
Handle MNK Sm90{Row, Col}Reduction problem shapes ( #1803 )
2024-10-14 19:46:20 -04:00
Yujia Zhai
cc3c29a81a
CUTLASS 3.6.0 ( #1850 )
...
* v3.6
* update changelog
* update readme
* fix typo
* fixing typos
* hopper gemm with weight prefetch
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-10-09 15:33:27 -04:00
Feng Shijie
0837a2a00a
Fix typo in comment ( #1787 )
2024-10-07 12:39:59 -04:00
Alexander Zinoviev
477a677317
Fix typos in test/unit/conv/cache_testbed_output.h ( #1652 )
...
Co-authored-by: Alexander Zinoviev <azinoviev@tesla.com>
2024-10-07 12:39:11 -04:00
Wilber
b27c49e84a
Fix cute doc ( #1529 )
2024-10-07 12:38:32 -04:00
Junkai-Wu
e2b0789927
Add some can implement rules of hopper convolution. ( #1835 )
2024-09-25 11:28:10 -04:00
Wenlei Bao
44dae8b90e
Adjust profiler space for SM89 ( #1553 )
2024-09-19 11:40:30 -04:00
reed
2991ce18d3
Add print_svg for mma ( #1733 )
...
* add print_svg for mma
* correct the code indentation
2024-09-18 10:37:24 -04:00
Chenggang Zhao
1ebda1ccef
Fix MMA promotion interval assertions ( #1641 )
2024-09-16 12:38:42 -04:00
reed
9f68995de5
add publication: ‘EVT: Accelerating Deep Learning Training with Epilogue Visitor Tree’ ( #1526 )
...
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2024-09-16 11:55:09 -04:00
John Shumway
3a8c01a18b
Prefix a member template name with the template keyword. ( #1796 )
...
Fixes LLVM build error.
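For context, a minimal sketch of the language rule behind this kind of fix. It is not the code changed in the commit; `Box`, `as`, and `to_int` are hypothetical names used only for illustration. When a member template is called on an object whose type depends on a template parameter, the `template` keyword is needed before the member name so that the `<` is parsed as the start of a template argument list. Some compilers accept the call without it, while Clang rejects it.

```cpp
namespace template_kw_sketch {

// Hypothetical wrapper type with a member function template.
template <typename T>
struct Box {
  T value;

  // Converts the stored value to the requested type U.
  template <typename U>
  U as() const { return static_cast<U>(value); }
};

template <typename T>
int to_int(Box<T> const& box) {
  // `box` has a type that depends on T, so `as` is a dependent name.
  // The `template` keyword tells the parser that `<` opens a template
  // argument list rather than being a less-than operator.
  return box.template as<int>();
}

}  // namespace template_kw_sketch
```

Calling `template_kw_sketch::to_int` on a `Box<double>` holding 3.7 truncates toward zero and yields 3.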
2024-09-11 13:33:56 -04:00
Junkai-Wu
dbdae514e0
Support for TMA Epilogue for Group Gemm and add pingpong ptr array & Group Gemm ( #1795 )
2024-09-11 00:07:31 -04:00
Sean Xiaowen Zhang
21d0534167
fix assertion ( #1790 )
2024-09-09 14:05:27 -04:00
Tri Dao
323c8170bf
Support ComputeFn where output type differs from input type ( #1771 )
...
This is useful, e.g., for a function that takes 2 float inputs and turns them into a complex value.
2024-09-05 23:25:03 -04:00
Gabriel Wu
82f5075946
set_slice3x3 -> set_slice_3x3 ( #1784 )
2024-09-05 23:24:10 -04:00
Saagar Jha
06e337758d
Remove extraneous comma in declaration ( #1776 )
2024-09-05 17:14:15 -04:00
JiayuSun
7369adcaca
Add Sm90LinCombPerColBias ( #1774 )
...
Co-authored-by: Jiayu Sun <jiayus@s4124-0071.nvidia.com>
2024-09-04 15:11:24 -04:00
Alchan Kim
6c3044136b
Update barrier.h ( #1782 )
2024-09-04 14:52:11 -04:00
Aleksandar Samardžić
e1976daacc
Add support for mixed 4-bit/8-bit data types GEMM ( #1413 )
...
* Add support for mixed 4-bit/8-bit data types GEMM
* fix ( and )
---------
Co-authored-by: Aleksandar Samardžić <asamardzic@matf.bg.ac.rs>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-08-29 23:11:06 -04:00
Shreya Gaur
f7b19de32c
minor fix for a double quote in CMakeLists.txt ( #1727 )
2024-08-19 22:21:42 -04:00
shunfan-shao
4dbf5dbed2
Use CUDA runtime API to retrieve function pointer to driver API ( #1700 )
...
* Query pfn to driver api
* use default for older toolkits
---------
Co-authored-by: shunfans <shunfans@nvidia.com>
2024-08-19 13:26:09 -04:00
Dustyn Blasig
f93a69134e
Merge pull request #1714 from NVIDIA/u128_div
...
fix uint128
2024-08-16 07:14:59 -05:00
Aleksandar Samardžić
3f084f7f3c
Add couple configs into generator.py for mixed input MM ( #1350 )
...
* Add couple configs into generator.py for mixed input MM
* change one unit test name; reenable 128x32 in the profiler
* Added U8/BF16 tests.
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2024-08-16 00:59:29 -04:00
Haicheng Wu
b0296bf682
fix uint128
2024-08-15 21:06:01 -07:00
Dustyn Blasig
865be73a97
Merge pull request #1713 from NVIDIA/351_sparse_update
...
update 3.5.1 readme/changelog
2024-08-15 11:44:49 -05:00
Haicheng Wu
8d8cfdf375
update 3.5.1 readme/changelog
2024-08-14 21:12:44 -07:00
eqy
fb170439e8
Update half.h ( #1709 )
2024-08-14 14:59:59 -04:00
dePaul Miller
4e5a8f6853
3.5.1 plots and updated readme ( #1708 )
...
Co-authored-by: dePaul Miller <23461061+depaulmillz@users.noreply.github.com>
2024-08-12 18:55:55 -04:00
Tri Dao
7192f4ab23
Add CLayout_64x208 ( #1680 )
...
Without this I get a compilation error when the extended shapes are enabled.
2024-08-08 14:00:24 -04:00
dePaul Miller
2049c6c5a2
5476 cutlass 3x gemm kernels ( #1695 )
...
Co-authored-by: dePaul Miller <23461061+depaulmillz@users.noreply.github.com>
2024-08-08 13:56:23 -04:00
chenwei
e22ba590cd
support data type w2 used in cutlass_library ( #1517 )
2024-08-06 11:15:18 -04:00
Mark Hoemmen
19b4c5e065
Fix isnan namespace qualification in cutlass/functional.h ( #1679 )
...
* Fix unrelated MSVC build warnings
* Fix use of isnan in functional.h
Correct namespace qualification of isnan in functional.h
so that it invokes cutlass::isnan for half_t, instead of
converting half_t to float and invoking std::isnan (on host,
or ::isnan on device).
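The difference described above can be sketched in a self-contained way. This is an illustration under stated assumptions, not the code in cutlass/functional.h: `demo::half_t` is a hypothetical stand-in that stores raw binary16 bits, its float conversion is deliberately crude, and the `conversions` counter exists only to make the detour through float visible.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

namespace demo {

// Counts how often a half_t is converted to float (illustration only).
int conversions = 0;

// Hypothetical 16-bit float stand-in holding raw IEEE 754 binary16 bits.
struct half_t {
  std::uint16_t bits;

  // Crude conversion: only the NaN / not-NaN distinction is preserved,
  // which is all this sketch needs.
  operator float() const {
    ++conversions;
    bool nan = (bits & 0x7C00u) == 0x7C00u && (bits & 0x03FFu) != 0;
    return nan ? std::numeric_limits<float>::quiet_NaN() : 0.0f;
  }
};

// Type-specific check: exponent bits all ones and a non-zero mantissa.
inline bool isnan(half_t h) {
  return (h.bits & 0x7C00u) == 0x7C00u && (h.bits & 0x03FFu) != 0;
}

// Path to avoid: the value is converted to float, then std::isnan runs.
template <typename T>
bool is_nan_via_float(T x) {
  return std::isnan(static_cast<float>(x));
}

// Preferred path: the namespace-qualified call selects the half_t overload
// and never converts.
template <typename T>
bool is_nan_qualified(T x) {
  return demo::isnan(x);
}

}  // namespace demo
```

With a NaN bit pattern such as 0x7E00, both helpers report NaN, but only `is_nan_via_float` increments `demo::conversions`.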
2024-08-05 14:28:13 -04:00
dePaul Miller
06b21349bc
1x1x1 cluster launch ( #1673 )
2024-08-01 12:20:28 -04:00
Ali Hassani
eee0cab26c
Stamp out 1x1x1 clusters, 128x256 CTA shape ( #1665 )
...
Adds 128x256 tile shapes to FP16/BF16 and FP8 generators.
Also adds 1x1x1 clusters to all existing FP16/BF16/FP8 generators.
NOTE: it is important to set the kernel filter (--kernels /
CUTLASS_LIBRARY_KERNELS) to a non-empty string and skip pruning to get
all of the new configurations.
If profiling exhaustively, both can be set to `*`.
Number of CUTLASS 3.X GEMMs before this commit: 2868
Number of CUTLASS 3.X GEMMs after this commit: 4016
Co-authored-by: Ali Hassani <ahassani@nvidia.com>
2024-07-31 20:22:29 -04:00
Sergey Klevtsov
36cbfcf483
Add extended wgmma shapes for all data types ( #1666 )
2024-07-31 18:33:14 -04:00
Ali Hassani
1f2b590da6
Skip void-C kernels in the profiler when beta is non zero ( #1661 )
...
* Skip void-C kernels in the profiler when beta is non-zero
The CUTLASS profiler will only skip disposition for void-C kernels when
beta is non-zero, when it makes more sense to skip running them in the
first place.
Not all users are aware of void-C kernels (as far as I know they did not
exist in 2.X), and not everyone remembers to filter out void-C kernels
when running the profiler with a non-zero beta.
The easiest solution (and, as far as I can tell, the correct way of
handling this) is for `can_implement` to return `false` when beta is
non-zero (or whatever argument indicates an epilogue source) but the
kernel is void-C.
The profiler already includes functionality to skip running kernels that
fail `can_implement`.
* Move checks to collectives instead
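The rule described in this commit can be sketched as follows. This is a simplified illustration, not the actual CUTLASS collective code: `EpilogueArguments`, `Epilogue`, and `is_source_supported` are hypothetical names, and the sketch assumes that a void-C kernel is modeled by a `void` source element type and that beta is a plain host-side float.

```cpp
#include <type_traits>

namespace voidc_sketch {

// Hypothetical epilogue arguments for D = alpha * accumulator + beta * C.
struct EpilogueArguments {
  float alpha = 1.0f;
  float beta = 0.0f;
};

// Hypothetical epilogue. ElementC = void models a "void-C" kernel,
// i.e. one that has no source tensor C to read.
template <typename ElementC>
struct Epilogue {
  static constexpr bool is_source_supported = !std::is_void<ElementC>::value;

  static bool can_implement(EpilogueArguments const& args) {
    // A void-C kernel cannot load C, so a non-zero beta cannot be honored.
    // Rejecting here lets a caller skip the kernel instead of running it.
    if (!is_source_supported && args.beta != 0.0f) {
      return false;
    }
    return true;
  }
};

}  // namespace voidc_sketch
```

In this sketch, `Epilogue<void>::can_implement` rejects arguments with a non-zero beta and accepts them when beta is zero, while `Epilogue<float>::can_implement` accepts both.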
---------
Co-authored-by: Ali Hassani <ahassani@nvidia.com>
2024-07-31 18:11:58 -04:00
dePaul Miller
8b2a0408bd
Profiler docs and argument update for raster order ( #1667 )
2024-07-31 16:40:10 -04:00
eqy
fbd116c0e5
fix build on SM 5.2 ( #1664 )
2024-07-31 09:54:57 -04:00