cutlass


Manish Gupta 7d8317a63e Support for Mixed Input TensorOp (#1084)	2023-09-27 11:18:30 -04:00
* Passing warp-level mixed input F16(S8/U8) tests passing device-level mixed input F16(S8/U8) tests add to profiler - I8 (111 TFLOPs), U (123 TFLOPs)
* fast numeric conversions (I8 = 132 TFLOPs, U8 = 148 TFLOPs)
* Speedup reference compilation (REVERT THIS COMMIT)
* wider_add.u32_packed_sub.f16x2 (I8 = 132TFLOP/s, U8 = 170 TFLOP/s)
* Improve s8->f16 cvt and support bf16u8 @158 TFLOPs BF16
* S8 (142 TFLOPs)
* Handle mixed-input upcast on OperandA (Support [S8|U8][F16|BF16] rename OpMultiplyAddMixedInput to OpMultiplyAddMixedInputUpcast
* Add device-level test and profiler support for upcast on operand A
* Move shfl before the cvt and reduce #shfls by 1/2
* fix smem_usage calculation for mixed_input types
* uncomment the stuff (getting ready for merge)
* profiler changes and mixed-input reference
* mixed input reference are in a new file
* use platform instead of std
* comments and typo only
* Use CreateGemmOperator and delete CreateMixedInputGemmOperator
* copyright for new files
* rebase follow-up
array.cu	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00
bfloat16.cu	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00
CMakeLists.txt	Support for Mixed Input TensorOp (#1084)	2023-09-27 11:18:30 -04:00
complex.cu	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00
cpp11.cu	CUTLASS 3.2.1 (#1113)	2023-09-26 17:24:26 -04:00
fast_numeric_conversion.cu	Support for Mixed Input TensorOp (#1084)	2023-09-27 11:18:30 -04:00
float8.cu	CUTLASS 3.2 (#1024)	2023-08-07 20:50:32 -04:00
functional.cu	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00
half.cu	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00
matrix_coord.cu	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00
matrix.cu	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00
numeric_conversion.cu	Updates for 3.2 release (#1065)	2023-08-25 23:05:46 -04:00
predicate_vector.cu	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00
quaternion.cu	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00
tensor_ref.cu	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00
tensor_view.cu	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00
test_unit_core.cpp	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00
tfloat32.cu	New updates for 2.11 (#775)	2023-01-20 16:32:57 -05:00