Commit Graph

553 Commits

Author SHA1 Message Date
Ivan Komarov
0b8cacd6f1
Remove redundant <fstream> includes (#563)
* Remove redundant <fstream> includes

* Fix fstream in examples/

* Fix <fstream> in test/

* Use consistent order for <fstream> (always after <iostream>)

* Remove an unneeded include in a file where std::ofstream usage is commented out

Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru>
2022-07-19 15:23:54 -04:00
Haicheng Wu
e7a61c761a
fix race condition when h < stride_h or w < stride_w (#562)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-07-12 16:37:08 -04:00
seventh
fb379eaa5b
epilogue leaky relu support ScaleType (#564)
Co-authored-by: xuweiqi <xuweiqi117@gmail.com>
2022-07-11 17:30:55 -04:00
Jacob He
8a766804ad
Fix doc in testbed_gemm_with_broadcast (#559) 2022-07-07 09:56:16 -04:00
Bing Xu
1eb6355182
[activation] tanh (#550)
Co-authored-by: Bing Xu <bingxu@fb.com>
2022-07-02 08:00:45 -04:00
Yujia Zhai
04a9777b87
Softmax (#546)
* add test layernorm g-mem version

* Delete include/configure directory

* Delete examples/test_layernorm directory

* Update gemm_with_softmax.h

* Update gemm_softmax.cu

* Update linear_combination.h

* Update fast_math.h

* remove redundant vars

Co-authored-by: yujia.zhai <yujia.zhai@bytedance.com>
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-07-02 01:19:18 -04:00
Haicheng Wu
e45e773436
Update linear_combination_generic.h (#472)
add `skip_elementwise_` to support serial splitk in linear_combination_generic.h`
2022-06-28 07:29:38 -04:00
Haicheng Wu
dae6b6893b
Update CHANGELOG.md 2022-06-27 23:30:49 -04:00
Haicheng Wu
ba18ea9c32
Update README.md 2022-06-27 23:25:26 -04:00
Haicheng Wu
9ab9110168
add leaky relu (#542)
Authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-06-26 10:07:50 -04:00
Jinze (Richard) Xue
e5d4669f16
Update CHANGELOG.md (#543) 2022-06-25 13:23:49 -04:00
Haicheng Wu
94f01f19d5
Add implicit gemm perf
plot from @manishucsd, presented in gtc'22 cutlass talk
2022-06-23 22:47:11 -04:00
Jack Kosaian
fa56763c25
Fix occupancy calculation for grouped GEMM (#532) 2022-06-18 19:53:59 -04:00
LiuWei
25e26a6e51
fix bugs in linear_combination_generic.h missing include cutlass/epilogue/thread/scale_type.h (#531) 2022-06-17 23:35:14 -04:00
Haicheng Wu
f248e9bdb4
Create CITATION.cff
Add initial CITATION.cff
2022-06-07 21:25:16 -04:00
Pei Sun
dceefe4f64
Increment stride correctly in warp iterator. (#516)
Co-authored-by: peisun1115 <peis@google.com>
2022-06-06 12:33:36 -04:00
Pei Sun
c3881d097e
Fix a comment about LDSM layout. (#514)
Co-authored-by: peisun1115 <peis@google.com>
2022-06-04 23:04:00 -04:00
Pei Sun
a29dfb1c63
Fix a bug to increment stride tile correctly (#503)
* Fix a bug to increment stride tile correctly

* Update regular_tile_access_iterator_tensor_op.h

Co-authored-by: peisun1115 <peis@google.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2022-06-03 22:54:52 -04:00
Jack Kosaian
0abaac84ea
[examples] Fix typos in SYRK and TRMM examples (#507) 2022-06-03 22:52:41 -04:00
Haicheng Wu
858c735856
Update gather_scatter_fusion.cu
Correct the reference code in gather/scatter example to put bias add in the correct place.
2022-05-18 13:15:25 -04:00
Haicheng Wu
d6f58b2d14
Update functionality.md 2022-05-11 09:34:24 -04:00
Mike Iovine
c4cf0dad82
Fix init-self compiler warnings (#493)
Fix a few errors caused by trying to initialize a class member
with itself. These errors can turn into errors if you compile
with `-Winit-self`.
2022-05-11 00:35:28 -04:00
Haicheng Wu
57551902d0
Update functionality.md
add some explanations to the functionality table.
2022-05-11 00:01:19 -04:00
Haicheng Wu
1604ebaf10
Update generator.py
stop generating analytical conv kernels to reduce kernel number
2022-05-08 21:47:15 -04:00
Haicheng Wu
6023038bae
add verification of the reduction tensor (#489)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-05-06 10:24:51 -07:00
TonyZhao
ddd8f9cf41
update float < int32_t * 4 (#488)
Co-authored-by: 赵俊涛 <zhaojuntao@zhaojuntaos-MacBook-Pro.local>
2022-05-04 13:36:05 -04:00
Haicheng Wu
ec2b4fd85d
b2b bias vector support (#482)
* b2b bias vector support

* add files

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-04-30 04:16:15 -07:00
Stepan Tezyunichev
86ce09aed1
2.9 fixes for nvrtc (#480)
* Use platform::is_same instead of std::is_same

* Don't hide cuComplex include from nvrtc

* Typo fixed

* Remove comment rename
2022-04-29 09:06:52 -04:00
Haicheng Wu
21c1fa3849
add .github (#479)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-04-28 12:36:59 -07:00
Janusz Lisiecki
8c339ac039
Fix compilation in clang (#478)
- adds missing commas
- adjusts misaligned usage of CUTLASS_DEVICE between
  template declaration and specializations

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
2022-04-28 14:22:06 -04:00
Haicheng Wu
e49f690fd7
Update linear_combination_generic.h 2022-04-28 14:04:53 -04:00
Haicheng Wu
96dad61a75
Update CHANGELOG.md 2022-04-28 10:52:10 -04:00
Haicheng Wu
cc2ea4c3fc
Update README.md 2022-04-28 10:50:11 -04:00
Andrew Kerr
a0de301283
Used relative paths for includes (#477) 2022-04-27 12:04:23 -07:00
Haicheng Wu
319a389f42
Update CMakeLists.txt (#473)
* Update CMakeLists.txt

Add 128bit int support if using nvc++ to solve #310 

@jeffhammond, would you please give it a try?

* Update CMakeLists.txt

correct copy paste error
2022-04-27 07:02:26 -07:00
Stepan Tezyunichev
71def2f084
Use platform:: instead of std::abs and std::conditional (#452)
* Fixed template struct/class mismatch

* Use platform implementation instead of std::abs and std::conditional during nvrtc compilation

* Use platform implementation instead of std::abs and std::conditional during nvrtc compilation

* Revert absolute_value() usage
2022-04-25 14:40:22 -04:00
Masahiro Masuda
70f3ba57f5
Fix typo in shared memory layout description (#471) 2022-04-24 18:32:13 -04:00
Fujun Han
dd77fadc70
Remove redundant offset def and init in shared_load_iterator.h (#456)
Signed-off-by: Fujun Han <fujun.han@iluvatar.ai>
2022-04-24 16:31:00 -04:00
Stepan Tezyunichev
be4578d517
Fixed template struct/class mismatch (#453) 2022-04-24 16:30:21 -04:00
Andrei Alexandrescu
d7b499deff
Fix CUDA_PERROR_EXIT and print failing expression (#446)
`CUDA_PERROR_EXIT ` can lead to incorrect usage (see e.g. [this description](https://www.cs.technion.ac.il/users/yechiel/c++-faq/macros-with-if.html)) because it contains an incomplete `if` expression. Consider:

```
if (condition)
    CUDA_PERROR_EXIT(cudaFree(x))
else
    free(x);
```

The author of the code forgot to add a semicolon after the macro. In that case, the `else` will bind to the `if` inside the macro definition, leading to code that the author did not intend or expect. It the author does use a semicolon, the code will not compile, which is awkward.

The change adds a `do while` around the `if`, which always requires a semicolon.

This PR also adds the text of the failing expression to the printed error message.
2022-04-24 16:29:43 -04:00
Exusial
310ed81ac3
fix description in example 12. (#444)
Co-authored-by: Exusial <Exusial>
2022-04-24 16:29:06 -04:00
Fujun Han
4c0d6e1eb4
[BUGFIX]: Force unroll a loop that doesn't have compilation constant (#441)
loop times is dangerous.

Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2022-04-24 16:28:32 -04:00
Jack Kosaian
167ac54c65
Fix link to Python example (#469) 2022-04-23 15:37:38 -04:00
Andrew Kerr
12f4108ac2
CUTLASS 2.9 (#468) 2022-04-23 15:02:38 -04:00
Feng Shijie
dd571f0edb
[style] fix code indentation (#449)
* [docs] fix typo in media/docs/layout.md

* [docs] fix comment error

* fix typo in include/cutlass/arch/simd_61.h

* fix stride comment errors in TensorLayout

* fix indentation
2022-04-03 21:13:17 -04:00
Jianyu Huang
6d0d265047
Update PUBLICATIONS.md (#447) 2022-04-03 21:03:28 -04:00
Haicheng Wu
f11fa975a5
Update PUBLICATIONS.md
@tsuki
2022-03-23 21:04:43 -04:00
Masahiro Masuda
0e71d9b450
Transposed conv2d and wgrad split k examples (#413)
* add split k wgrad example

* wgrad done

* begin transposed conv2d example

* update transposed conv2d example and add ref check

* update doc for conv2d transpose example

* add license

* add wgrad doc

* more clarification on GEMM output type

* typo fix

* clean up indent

* address comments

* rename example numbers to 34 and 35

* GEMM -> Implicit GEMM

* Revert "rename example numbers to 34 and 35"

This reverts commit 551a808c227216e9e38d4472ba8ff020557b8500.

* transposed_conv2d is 34

* add compiler and device version check to exit gracefully

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-03-23 14:52:54 -04:00
Minmin Sun (孙敏敏)
eb0d4c9213
[library] pass pointer of arguments to get_host_workspace_size() in gemm_universal() (#412)
Otherwise GemmUniversalOperation::get_host_workspace_size() will fail on SegmentFault.
2022-03-22 12:36:34 -04:00
Haojin Yang
bc45e2c023
fixed datatype error of numeric_limit for uint1b_t (#419)
Co-authored-by: Haojin Yang <haojin.yang@.hpi.uni-potsdam.de>
2022-03-22 12:30:30 -04:00