Jack Kosaian
f29d8f7ca9
Include vector in base_grouped.h ( #618 )
2022-09-06 13:21:23 -04:00
Yujia Zhai
b1d3f9b2fd
upstream internal updates ( #616 )
...
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-09-04 23:05:09 -04:00
ANIKET SHIVAM
b72cbf957d
CUTLASS 2.10 ( #615 )
...
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-09-03 18:48:46 -04:00
Cliff Burdick
ca23ff7924
Fixed typo in class name ( #608 )
2022-08-29 20:51:52 -04:00
Cliff Burdick
1c3d400b14
Added value_type trait to complex to make it an easier drop-in replacement for std::complex. ( #607 )
2022-08-28 01:12:40 -04:00
Cliff Burdick
abafbf2afd
Missing comma in trmm header ( #604 )
2022-08-25 16:07:33 -04:00
Cliff Burdick
536b20763e
Fixed typo in profiler README ( #603 )
2022-08-24 21:55:13 -04:00
Haicheng Wu
497b499d9d
Add residual support for the shmem staging iterator used in back-to-back GEMM fusion. This allows a problem_size_0_n that is not a multiple of 32. ( #590 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-08-15 11:19:24 -04:00
Jack Kosaian
e66bfcb1f8
Fix for #596 (typo in example 03) ( #597 )
...
* [examples] Fix typos in SYRK and TRMM examples
* Fix typo in example 03
2022-08-09 09:58:36 -04:00
Michaël Benesty
1617685a77
fix: fix types in example 06 ( #587 )
2022-07-29 12:46:06 -04:00
dan_the_3rd
25ebf15d02
Ensure all arch::Mma specializations have ElementC set ( #576 )
...
Co-authored-by: danthe3rd <danthe3rd@users.noreply.github.com>
2022-07-22 23:53:03 -04:00
Shang Zhang
5d05808072
fix gather example ( #574 )
2022-07-19 16:18:17 -04:00
Ivan Komarov
0b8cacd6f1
Remove redundant <fstream> includes ( #563 )
...
* Remove redundant <fstream> includes
* Fix fstream in examples/
* Fix <fstream> in test/
* Use consistent order for <fstream> (always after <iostream>)
* Remove an unneeded include in a file where std::ofstream usage is commented out
Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru>
2022-07-19 15:23:54 -04:00
Haicheng Wu
e7a61c761a
fix race condition when h < stride_h or w < stride_w ( #562 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-07-12 16:37:08 -04:00
seventh
fb379eaa5b
epilogue leaky relu: support ScaleType ( #564 )
...
Co-authored-by: xuweiqi <xuweiqi117@gmail.com>
2022-07-11 17:30:55 -04:00
Jacob He
8a766804ad
Fix doc in testbed_gemm_with_broadcast ( #559 )
2022-07-07 09:56:16 -04:00
Bing Xu
1eb6355182
[activation] tanh ( #550 )
...
Co-authored-by: Bing Xu <bingxu@fb.com>
2022-07-02 08:00:45 -04:00
Yujia Zhai
04a9777b87
Softmax ( #546 )
...
* add test layernorm g-mem version
* Delete include/configure directory
* Delete examples/test_layernorm directory
* Update gemm_with_softmax.h
* Update gemm_softmax.cu
* Update linear_combination.h
* Update fast_math.h
* remove redundant vars
Co-authored-by: yujia.zhai <yujia.zhai@bytedance.com>
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-07-02 01:19:18 -04:00
Haicheng Wu
e45e773436
Update linear_combination_generic.h ( #472 )
...
add `skip_elementwise_` to support serial split-K in `linear_combination_generic.h`
2022-06-28 07:29:38 -04:00
Haicheng Wu
dae6b6893b
Update CHANGELOG.md
2022-06-27 23:30:49 -04:00
Haicheng Wu
ba18ea9c32
Update README.md
2022-06-27 23:25:26 -04:00
Haicheng Wu
9ab9110168
add leaky relu ( #542 )
...
Authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-06-26 10:07:50 -04:00
Jinze (Richard) Xue
e5d4669f16
Update CHANGELOG.md ( #543 )
2022-06-25 13:23:49 -04:00
Haicheng Wu
94f01f19d5
Add implicit gemm perf
...
Plot from @manishucsd, presented in the GTC'22 CUTLASS talk.
2022-06-23 22:47:11 -04:00
Jack Kosaian
fa56763c25
Fix occupancy calculation for grouped GEMM ( #532 )
2022-06-18 19:53:59 -04:00
LiuWei
25e26a6e51
fix bugs in linear_combination_generic.h: missing include of cutlass/epilogue/thread/scale_type.h ( #531 )
2022-06-17 23:35:14 -04:00
Haicheng Wu
f248e9bdb4
Create CITATION.cff
...
Add initial CITATION.cff
2022-06-07 21:25:16 -04:00
Pei Sun
dceefe4f64
Increment stride correctly in warp iterator. ( #516 )
...
Co-authored-by: peisun1115 <peis@google.com>
2022-06-06 12:33:36 -04:00
Pei Sun
c3881d097e
Fix a comment about LDSM layout. ( #514 )
...
Co-authored-by: peisun1115 <peis@google.com>
2022-06-04 23:04:00 -04:00
Pei Sun
a29dfb1c63
Fix a bug to increment stride tile correctly ( #503 )
...
* Fix a bug to increment stride tile correctly
* Update regular_tile_access_iterator_tensor_op.h
Co-authored-by: peisun1115 <peis@google.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2022-06-03 22:54:52 -04:00
Jack Kosaian
0abaac84ea
[examples] Fix typos in SYRK and TRMM examples ( #507 )
2022-06-03 22:52:41 -04:00
Haicheng Wu
858c735856
Update gather_scatter_fusion.cu
...
Correct the reference code in the gather/scatter example to put the bias add in the correct place.
2022-05-18 13:15:25 -04:00
Haicheng Wu
d6f58b2d14
Update functionality.md
2022-05-11 09:34:24 -04:00
Mike Iovine
c4cf0dad82
Fix init-self compiler warnings ( #493 )
...
Fix a few warnings caused by trying to initialize a class member
with itself. These warnings turn into errors if you compile
with `-Winit-self` together with `-Werror`.
2022-05-11 00:35:28 -04:00
Haicheng Wu
57551902d0
Update functionality.md
...
add some explanations to the functionality table.
2022-05-11 00:01:19 -04:00
Haicheng Wu
1604ebaf10
Update generator.py
...
stop generating analytical conv kernels to reduce the number of kernels
2022-05-08 21:47:15 -04:00
Haicheng Wu
6023038bae
add verification of the reduction tensor ( #489 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-05-06 10:24:51 -07:00
TonyZhao
ddd8f9cf41
update float < int32_t * 4 ( #488 )
...
Co-authored-by: 赵俊涛 <zhaojuntao@zhaojuntaos-MacBook-Pro.local>
2022-05-04 13:36:05 -04:00
Haicheng Wu
ec2b4fd85d
b2b bias vector support ( #482 )
...
* b2b bias vector support
* add files
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-04-30 04:16:15 -07:00
Stepan Tezyunichev
86ce09aed1
2.9 fixes for nvrtc ( #480 )
...
* Use platform::is_same instead of std::is_same
* Don't hide cuComplex include from nvrtc
* Typo fixed
* Remove comment rename
2022-04-29 09:06:52 -04:00
Haicheng Wu
21c1fa3849
add .github ( #479 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-04-28 12:36:59 -07:00
Janusz Lisiecki
8c339ac039
Fix compilation in clang ( #478 )
...
- adds missing commas
- adjusts mismatched usage of CUTLASS_DEVICE between
template declarations and their specializations
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
2022-04-28 14:22:06 -04:00
Haicheng Wu
e49f690fd7
Update linear_combination_generic.h
2022-04-28 14:04:53 -04:00
Haicheng Wu
96dad61a75
Update CHANGELOG.md
2022-04-28 10:52:10 -04:00
Haicheng Wu
cc2ea4c3fc
Update README.md
2022-04-28 10:50:11 -04:00
Andrew Kerr
a0de301283
Used relative paths for includes ( #477 )
2022-04-27 12:04:23 -07:00
Haicheng Wu
319a389f42
Update CMakeLists.txt ( #473 )
...
* Update CMakeLists.txt
Add 128-bit int support when using nvc++ to resolve #310
@jeffhammond, would you please give it a try?
* Update CMakeLists.txt
Correct copy-paste error
2022-04-27 07:02:26 -07:00
Stepan Tezyunichev
71def2f084
Use platform:: instead of std::abs and std::conditional ( #452 )
...
* Fixed template struct/class mismatch
* Use platform implementation instead of std::abs and std::conditional during nvrtc compilation
* Revert absolute_value() usage
2022-04-25 14:40:22 -04:00
Masahiro Masuda
70f3ba57f5
Fix typo in shared memory layout description ( #471 )
2022-04-24 18:32:13 -04:00
Fujun Han
dd77fadc70
Remove redundant offset def and init in shared_load_iterator.h ( #456 )
...
Signed-off-by: Fujun Han <fujun.han@iluvatar.ai>
2022-04-24 16:31:00 -04:00