seventh
168ea8b0e1
ensure singleton::get thread safe construct instance ( #658 )
...
* ensure singleton::get thread safe construct instance
* fix singleton return reference
Co-authored-by: xuweiqi <xuweiqi117@gmail.com>
2022-11-08 21:44:32 -05:00
Haicheng Wu
012c62c748
bug fixes and enhancement to gemm reductionK fusion ( #682 )
...
* add two missing files
* fix bunch of bugs of gemm-reducek fusion and add a device interface
* small changes
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-11-03 11:07:50 -04:00
FZC
cc85b64cf6
fix typo ( #677 )
2022-11-01 14:07:33 -04:00
dan_the_3rd
1b4e24470a
Example 43 - DualGemm ( #670 )
...
* Ex50 wip
* IS_PROFILING mode
* MultiStage2 - but is slower
* Add SwiGLU
* Support SplitKSerial reduction
Support not storing D0/D1
Cleanup code
* Option to disable bias
* Renumber example
* Fix build
* Remove references to pb_size_0 / pb_size_1
* Add support for bf16 inputs with float accum
* small changes
Co-authored-by: danthe3rd <danthe3rd>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-10-26 14:04:42 -04:00
Jack Kosaian
8c1bf9b784
Bump CUTLASS Python container version ( #672 )
...
* Update example 40 README
* Update CUTLASS Python README
2022-10-22 21:09:39 -04:00
Yuriy Chernyshov
7d0dd6706e
Remove excessive includes from examples/41_multi_head_attention ( #669 )
...
The rationale behind this change is explained in #563
2022-10-21 22:23:15 -04:00
hlu1
9b47403b2d
Add missing CUTLASS_HOST_DEVICE ( #671 )
2022-10-21 22:20:38 -04:00
dan_the_3rd
4db6a6140e
ex42: Fused MHA imported from xFormers ( #662 )
...
* ex42: Fused MHA imported from xFormers
* Remove std:: references
* Support K>128 in the example
* Support causal option
* Support different head size for V, and different seqlength for KV
* Update FLOPS counter
* Remove bit_cast
* fix build: Replace M_LOG2E
* Add doc
* Revert "Remove bit_cast"
This reverts commit 9662fa86bb7c57c1a015ac0bf52cb52940fbbf80.
* Explicit casts to int32_t for windows build
Co-authored-by: danthe3rd <danthe3rd>
2022-10-17 10:49:33 -04:00
Matthew Nicely
3bf95e90c2
Update labeler.yml
2022-10-13 08:03:28 -04:00
Matthew Nicely
75fed7493e
Update labeler.yml
2022-10-13 08:01:21 -04:00
Matthew Nicely
98b73fc95d
Update labeler.yml
2022-10-13 07:55:33 -04:00
Matthew Nicely
4990e3686d
Update labeler.yml
2022-10-13 07:52:38 -04:00
Matthew Nicely
4b7365388c
Update labeler.yml
2022-10-13 07:32:55 -04:00
Matthew Nicely
0d8405588d
Update labeler.yml
2022-10-12 15:32:38 -04:00
Alexander Freudenberg
cb539dab78
Correct typos in comments ( #639 )
...
* Correct typos in comments
Correct comments in code on type of generated distribution. Improve Gaussian RNG to take advantage of Box Muller method
* Inline Box Muller
Added inline function for the Box Muller algorithm and updated code comments to be more concise
* Update tensor_fill.h
* Update tensor_fill.h
* small changes to pass tests
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-09-30 22:51:30 -04:00
Ying Zhang
dadc881a96
Bug fix for gemm broadcast ( #650 )
...
* gemm_universal_with_broadcast, +2 sources.
* Revert "gemm_universal_with_broadcast, +2 sources."
This reverts commit fb063251f2144a091f12c9abfce7e1713f2d1c9e.
* gemm broadcast bug fix
2022-09-30 10:00:38 -04:00
Matthew Nicely
f3eea3a4d7
Create labeler.yml
2022-09-29 15:08:44 -04:00
Wenzhuo Liu
cd37e82492
change unused class member to local var ( #646 )
2022-09-28 23:52:35 -04:00
ANIKET SHIVAM
48a9ea223a
Fix release version in the citation ( #638 )
2022-09-22 10:58:45 -04:00
Wenzhuo Liu
7a458f00a6
fix(permute.h): incorrect comment in Tensor5DPermute20314 ( #637 )
...
* fix(permute.h): incorrect comment in `Tensor5DPermute20314`
* typo in usage in example 39
2022-09-22 09:21:13 -04:00
Haicheng Wu
97bff52e8c
add two missing files ( #636 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-09-21 15:42:42 -04:00
Tianqi Zhang (张天启)
9f2e3faa69
fix call of GELU_Taylor in LinearCombinationGeneric ( #634 )
2022-09-20 21:00:55 -04:00
Ying Zhang
a821280dc7
Gemm broadcast ( #632 )
...
* gemm_universal_with_broadcast, +2 sources.
* Revert "gemm_universal_with_broadcast, +2 sources."
This reverts commit fb063251f2144a091f12c9abfce7e1713f2d1c9e.
* gemm_universal_with_broadcast separated version.
* Update copyright banner.
* update banner
2022-09-20 10:37:12 -04:00
Wenzhuo Liu
f73374a1eb
fix:comment typo in example 23 ( #633 )
2022-09-19 09:54:14 -04:00
Yujia Zhai
faab7536fc
add comment ( #628 )
2022-09-17 21:40:30 -04:00
Andrew Kerr
fc9ebc645b
CUTLASS 2.10 bug fixes and minor updates. ( #626 )
2022-09-15 16:20:33 -04:00
alexfreudenberg
2cc2c7ba1f
Add set_k_partition function ( #624 )
...
A member function set_k_partition is required for the instantiation of cutlass::gemm::kernel::Gemm, even though SplitKSerial is false
2022-09-13 22:34:20 -04:00
ANIKET SHIVAM
50ceed7154
Minor README fix ( #623 )
...
* minor fix
* Minor fix
2022-09-12 22:40:25 -04:00
ANIKET SHIVAM
e773429f7e
CUTLASS 2.10 updates ( #622 )
...
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-09-12 21:26:30 -04:00
Yujia Zhai
beae168f90
fix broken link ( #620 )
...
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-09-06 16:32:44 -04:00
Jack Kosaian
f29d8f7ca9
Include vector in base_grouped.h ( #618 )
2022-09-06 13:21:23 -04:00
Yujia Zhai
b1d3f9b2fd
upstream internal updates ( #616 )
...
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-09-04 23:05:09 -04:00
ANIKET SHIVAM
b72cbf957d
CUTLASS 2.10 ( #615 )
...
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-09-03 18:48:46 -04:00
Cliff Burdick
ca23ff7924
Fixed typo in class name ( #608 )
2022-08-29 20:51:52 -04:00
Cliff Burdick
1c3d400b14
Added value_type trait to complex to make it an easier drop-in replacement for std::complex. ( #607 )
2022-08-28 01:12:40 -04:00
Cliff Burdick
abafbf2afd
Missing comma in trmm header ( #604 )
2022-08-25 16:07:33 -04:00
Cliff Burdick
536b20763e
Fixed typo in profiler README ( #603 )
2022-08-24 21:55:13 -04:00
Haicheng Wu
497b499d9d
Add residual support for shmem staging iterator used in back-to-back GEMM fusion. This allows support of problem_size_0_n that is not multiple of 32. ( #590 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-08-15 11:19:24 -04:00
Jack Kosaian
e66bfcb1f8
Fix for #596 (typo in example 03) ( #597 )
...
* [examples] Fix typos in SYRK and TRMM examples
* Fix typo in example 03
2022-08-09 09:58:36 -04:00
Michaël Benesty
1617685a77
fix: fix types in example 06 ( #587 )
2022-07-29 12:46:06 -04:00
dan_the_3rd
25ebf15d02
Ensure all arch::Mma specializations have ElementC set ( #576 )
...
Co-authored-by: danthe3rd <danthe3rd@users.noreply.github.com>
2022-07-22 23:53:03 -04:00
Shang Zhang
5d05808072
fix gather example ( #574 )
2022-07-19 16:18:17 -04:00
Ivan Komarov
0b8cacd6f1
Remove redundant <fstream> includes ( #563 )
...
* Remove redundant <fstream> includes
* Fix fstream in examples/
* Fix <fstream> in test/
* Use consistent order for <fstream> (always after <iostream>)
* Remove an unneeded include in a file where std::ofstream usage is commented out
Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru>
2022-07-19 15:23:54 -04:00
Haicheng Wu
e7a61c761a
fix race condition when h < stride_h or w < stride_w ( #562 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-07-12 16:37:08 -04:00
seventh
fb379eaa5b
epilogue leaky relu support ScaleType ( #564 )
...
Co-authored-by: xuweiqi <xuweiqi117@gmail.com>
2022-07-11 17:30:55 -04:00
Jacob He
8a766804ad
Fix doc in testbed_gemm_with_broadcast ( #559 )
2022-07-07 09:56:16 -04:00
Bing Xu
1eb6355182
[activation] tanh ( #550 )
...
Co-authored-by: Bing Xu <bingxu@fb.com>
2022-07-02 08:00:45 -04:00
Yujia Zhai
04a9777b87
Softmax ( #546 )
...
* add test layernorm g-mem version
* Delete include/configure directory
* Delete examples/test_layernorm directory
* Update gemm_with_softmax.h
* Update gemm_softmax.cu
* Update linear_combination.h
* Update fast_math.h
* remove redundant vars
Co-authored-by: yujia.zhai <yujia.zhai@bytedance.com>
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-07-02 01:19:18 -04:00
Haicheng Wu
e45e773436
Update linear_combination_generic.h ( #472 )
...
add `skip_elementwise_` to support serial splitk in `linear_combination_generic.h`
2022-06-28 07:29:38 -04:00
Haicheng Wu
dae6b6893b
Update CHANGELOG.md
2022-06-27 23:30:49 -04:00