Alexander Zinoviev
42290f5d1c
Fix for dangling pointers ( #885 )
2023-03-25 01:15:14 -04:00
Jack Kosaian
6116706c96
Set batch_strides on Params::update ( #883 )
2023-03-20 17:07:47 -04:00
Nikita Shulga
2670b973dd
Fix sign-compare warning in reorder_array ( #869 )
...
`std::vector<T>::size_type` is an unsigned type, so iterate over an unsigned type as well.
Discovered while trying to enable building PyTorch without the `-Wno-sign-compare` warning suppression; see https://github.com/pytorch/pytorch/actions/runs/4418987999/jobs/7746850762#step:10:10532
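A minimal sketch of the pattern this message describes, assuming a hypothetical helper `reorder_copy` (this is not the CUTLASS `reorder_array` code, whose body is not shown here). The loop index uses `std::size_t`, so the comparison against `size()` stays unsigned on both sides:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only: returns a copy of `data`
// permuted so that out[i] == data[perm[i]].
template <typename T>
std::vector<T> reorder_copy(std::vector<T> const &data,
                            std::vector<std::size_t> const &perm) {
  assert(data.size() == perm.size());
  std::vector<T> out(data.size());
  // An `int i` index here would be compared against the unsigned result of
  // size(), which is the kind of comparison -Wsign-compare reports.
  for (std::size_t i = 0; i < perm.size(); ++i) {
    out[i] = data[perm[i]];
  }
  return out;
}
```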
2023-03-20 17:07:24 -04:00
Stepan Tezyunichev
29801e348a
Hide streams and typeinfo from nvrtc ( #853 )
...
* Hide streams and typeinfo from nvrtc
* Use __CUDACC_RTC__ instead of CUDA_ARCH for guard
2023-03-09 23:24:47 -05:00
Alexander Pivovarov
7e370c9637
Fix typos 2 ( #842 )
...
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2023-03-09 23:22:56 -05:00
dan_the_3rd
f396cdd15c
ex24[gemm_grouped]: Allow to change layout/dtype ( #841 )
...
* ex24[gemm_grouped]: Allow to change layout/dtype
* Address suggestion from @jackkosaian
---------
Co-authored-by: danthe3rd <danthe3rd>
2023-03-01 07:13:51 -05:00
Haicheng Wu
65688c2a87
streamk fix ( #836 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-02-23 16:35:08 -05:00
Yuxin Wu
95f673ecf7
Update base_grouped.h ( #832 )
2023-02-21 14:48:30 -05:00
Haicheng Wu
91b8de8d32
streamk fix ( #830 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-02-20 11:03:16 -05:00
Shuai Shao
ce8597dc14
Fix type bug in conv2d/gemm with broadcast ( #796 )
...
add ElementVector
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-02-09 20:53:25 -05:00
Vijay Thakkar
277bd6e537
CUTLASS 3.0.0 ( #786 )
...
* CUTLASS 3.0.0
2023-01-23 20:55:28 -05:00
ANIKET SHIVAM
66d9cddc83
New updates for 2.11 ( #775 )
...
* New updates.
* Minor profiler updates
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-01-20 16:32:57 -05:00
Haicheng Wu
764b840d6f
streamk example and performance tuning ( #760 )
...
* streamk example and performance tuning
* one missing file
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-01-10 16:10:02 -05:00
Haicheng Wu
ff6e733fe1
restore the old epilogue for everything except streamk ( #749 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-01-04 11:02:55 -05:00
Haicheng Wu
1e64f153b3
improve streamk load balance ( #743 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-12-25 13:56:33 -05:00
ANIKET SHIVAM
38193d76e3
Updates for stream-k ( #728 )
...
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-12-08 23:48:10 -05:00
Mike Iovine
d6117ca362
Relax stream K gemm alignment constraints ( #717 )
...
* Relax stream K gemm alignment constraints
The current alignment requirements are too strict. Make them identical
to the checks for the regular universal gemm.
* Revert "Relax stream K gemm alignment constraints"
This reverts commit 31e80a250e2b0ac4bda2e4b437b39dc5bcd5e845.
* Relax stream K gemm alignment constraints
The current alignment requirements are too strict. Make them identical
to the checks for the regular universal gemm.
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-12-07 11:17:49 -05:00
Aditya Atluri
c975e2ccbb
release 2.11 ( #703 )
2022-11-19 09:02:15 -05:00
Haicheng Wu
012c62c748
bug fixes and enhancement to gemm reductionK fusion ( #682 )
...
* add two missing files
* fix a bunch of bugs in gemm-reducek fusion and add a device interface
* small changes
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-11-03 11:07:50 -04:00
dan_the_3rd
1b4e24470a
Example 43 - DualGemm ( #670 )
...
* Ex50 wip
* IS_PROFILING mode
* MultiStage2 - but is slower
* Add SwiGLU
* Support SplitKSerial reduction
Support not storing D0/D1
Cleanup code
* Option to disable bias
* Renumber example
* Fix build
* Remove references to pb_size_0 / pb_size_1
* Add support for bf16 inputs with float accum
* small changes
Co-authored-by: danthe3rd <danthe3rd>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-10-26 14:04:42 -04:00
Ying Zhang
dadc881a96
Bug fix for gemm broadcast ( #650 )
...
* gemm_universal_with_broadcast, +2 sources.
* Revert "gemm_universal_with_broadcast, +2 sources."
This reverts commit fb063251f2144a091f12c9abfce7e1713f2d1c9e.
* gemm broadcast bug fix
2022-09-30 10:00:38 -04:00
Ying Zhang
a821280dc7
Gemm broadcast ( #632 )
...
* gemm_universal_with_broadcast, +2 sources.
* Revert "gemm_universal_with_broadcast, +2 sources."
This reverts commit fb063251f2144a091f12c9abfce7e1713f2d1c9e.
* gemm_universal_with_broadcast separated version.
* Update copyright banner.
* update banner
2022-09-20 10:37:12 -04:00
ANIKET SHIVAM
e773429f7e
CUTLASS 2.10 updates ( #622 )
...
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-09-12 21:26:30 -04:00
Jack Kosaian
f29d8f7ca9
Include vector in base_grouped.h ( #618 )
2022-09-06 13:21:23 -04:00
ANIKET SHIVAM
b72cbf957d
CUTLASS 2.10 ( #615 )
...
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-09-03 18:48:46 -04:00
Cliff Burdick
ca23ff7924
Fixed typo in class name ( #608 )
2022-08-29 20:51:52 -04:00
Cliff Burdick
abafbf2afd
Missing comma in trmm header ( #604 )
2022-08-25 16:07:33 -04:00
Haicheng Wu
497b499d9d
Add residual support for shmem staging iterator used in back-to-back GEMM fusion. This allows support of problem_size_0_n that is not multiple of 32. ( #590 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-08-15 11:19:24 -04:00
Jack Kosaian
fa56763c25
Fix occupancy calculation for grouped GEMM ( #532 )
2022-06-18 19:53:59 -04:00
Pei Sun
dceefe4f64
Increment stride correctly in warp iterator. ( #516 )
...
Co-authored-by: peisun1115 <peis@google.com>
2022-06-06 12:33:36 -04:00
Pei Sun
c3881d097e
Fix a comment about LDSM layout. ( #514 )
...
Co-authored-by: peisun1115 <peis@google.com>
2022-06-04 23:04:00 -04:00
Mike Iovine
c4cf0dad82
Fix init-self compiler warnings ( #493 )
...
Fix a few cases where a class member was initialized with itself.
These can turn into errors if you compile with `-Winit-self`.
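A minimal sketch of the mistake described above, assuming a hypothetical `TileParams` struct (not taken from the CUTLASS sources). The self-initializing form is kept only as a comment; the live constructor shows the fix:

```cpp
#include <cassert>

// Hypothetical parameter struct for illustration only.
struct TileParams {
  int stride;
  int offset;

  // Buggy form, left as a comment: no argument is named `stride`, so
  // `stride(stride)` would initialize the member from itself.
  //   TileParams(int stride_, int offset_) : stride(stride), offset(offset_) {}

  // Fixed form: each member is initialized from the matching argument.
  TileParams(int stride_, int offset_) : stride(stride_), offset(offset_) {
    assert(stride_ > 0);
  }
};
```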
2022-05-11 00:35:28 -04:00
Stepan Tezyunichev
86ce09aed1
2.9 fixes for nvrtc ( #480 )
...
* Use platform::is_same instead of std::is_same
* Don't hide cuComplex include from nvrtc
* Typo fixed
* Remove comment rename
2022-04-29 09:06:52 -04:00
Andrew Kerr
12f4108ac2
CUTLASS 2.9 ( #468 )
2022-04-23 15:02:38 -04:00
Feng Shijie
dd571f0edb
[style] fix code indentation ( #449 )
...
* [docs] fix typo in media/docs/layout.md
* [docs] fix comment error
* fix typo in include/cutlass/arch/simd_61.h
* fix stride comment errors in TensorLayout
* fix indentation
2022-04-03 21:13:17 -04:00
HouQiming
96a11a1ef3
Removed trivial copy constructors on parameter classes to enable devi… ( #366 )
...
* Removed trivial copy constructors on parameter classes to enable device-side launch of CUTLASS kernels
* Added SFINAE to the `TensorRef(NonConstTensorRef const&)` constructor to avoid making it a copy-constructor for device code
* std => platform
* fix affine2
* really fix affine2
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-02-28 21:34:02 -05:00
Ivan Komarov
e96f00586c
Make cutlass::gemm::device::GemmArray usable ( #295 )
...
* Fix the build of cutlass/gemm/device/gemm_array.h and add a demo for GemmArray
* Add a reference to GemmArray to the docs
Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru>
2022-02-17 20:01:05 -05:00
Jongsoo Park
1db6971a8d
Remove unused gemm_k_iterations in GemmKernel::Params ( #406 )
...
Otherwise we get warnings that gemm_k_iterations is uninitialized.
2022-02-16 09:52:45 -05:00
Andrew Kerr
288af365db
Added missing synchronization to avoid WAR hazards between tiles. ( #386 )
2021-12-20 08:34:08 -08:00
Manish Gupta
808c25337a
CUTLASS 2.8 ( #363 )
...
CUTLASS 2.8
2021-11-19 13:26:35 -08:00
Manish Gupta
2e07c4cc2f
CUTLASS 2.7 ( #318 )
...
CUTLASS 2.7
* Mainloop fusion for GEMM: summation over A or B
* Strided DGRAD (optimized iterators)
* Half-precision GELU_taylor activation functions
Use these when accumulation and epilogue compute types are all cutlass::half_t
* Tuning and bug fixes to fused GEMM + GEMM example
* Support for smaller than 128b aligned Convolutions: see examples
* Caching of results to accelerate Convolution unit tests
Can be enabled or disabled by running cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF
* Corrections and bug fixes reported by the CUTLASS community
Thank you for filing these issues!
authored-by: Haicheng Wu <haichengw@nvidia.com>, Manish Gupta <manigupta@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2021-09-20 11:02:22 -07:00
Manish Gupta
6c2f8f2fb8
CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning
...
* cutlass 2.6 update
* remove debug prints
* cutlass 2.6.1 (minor update)
* Updated CHANGELOG.
* Minor edit to readme to indicate patch version.
* Minor edit to readme.
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2021-09-03 10:26:15 -07:00
Manish Gupta
1ac4559d12
Cutlass 2.6 Update 1 ( #301 )
...
* cutlass 2.6 update
* remove debug prints
2021-07-27 17:58:30 -07:00
Manish Gupta
e5d51840e8
CUTLASS 2.6 ( #298 )
...
CUTLASS 2.6
2021-07-23 00:40:53 -04:00
Peter Han
6a6b4028bd
Revert wrong fix of params.update in GemmUniversalBase
...
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-23 23:20:40 +08:00
Peter Han
92393b2676
Bugfix: memsetAsync uses wrong default stream
...
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-23 21:11:42 +08:00
Peter Han
169181f30f
Make Shape public from Mma_HFMA2.
...
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-03-04 11:05:16 +08:00
Andrew Kerr
0e13748649
CUTLASS 2.5
2021-02-26 09:58:26 -05:00
Manish Gupta
6615010cd0
CUTLASS 2.4 (Implicit GEMM convolution) ( #147 )
...
CUTLASS 2.4 (Implicit GEMM Convolution)
Co-authored-by: Manish Gupta <manigupta@nvidia.com>, Haicheng Wu <haichengw@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2020-11-19 21:25:25 -08:00
akerr
37a8f9e598
CUTLASS 2.3.0 final.
2020-09-25 10:34:46 -07:00