Emitters
Common
Common utilities for emitting CUTLASS kernels.
PyTorch
Utilities for generating source for building a PyTorch CUDA extension that uses a CUTLASS kernel.
If the jit parameter is set, the extension is JIT compiled via PyTorch’s cpp_extension.load method.
Example usage with JIT compilation:
import torch
import cutlass
plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)
op = plan.construct()
mod = cutlass.emit.pytorch(op, 'cutlass_gemm', 80, jit=True)
# Generate inputs for the GEMM
A, B, C = [torch.ones((512, 512)).to('cuda') for _ in range(3)]
# Run the module
D = mod.run(A, B, C)
Example usage without JIT compilation:
import torch
import cutlass
plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)
op = plan.construct()
cutlass.emit.pytorch(op, 'cutlass_gemm', 80, jit=False, sourcedir='output')
After this call, the directory output contains setup.py,
cutlass_gemm.cpp, and cutlass_gemm_kernel.cu. The module can be built from
within output by running: TORCH_CUDA_ARCH_LIST="8.0" python setup.py develop --user.
The module can later be used in Python via:
import torch
import cutlass_gemm
# Generate inputs for the GEMM
A, B, C = [torch.ones((512, 512)).to('cuda') for _ in range(3)]
# Run the module
D = cutlass_gemm.run(A, B, C)
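The non-JIT example above names three generated files for the module cutlass_gemm. As a minimal sketch, the helpers below list those paths for a given module name and report which ones are not yet on disk. Both helpers are hypothetical and not part of CUTLASS, and the naming pattern (setup.py, <name>.cpp, <name>_kernel.cu) is an assumption generalized from that single example.

```python
from __future__ import annotations

from pathlib import Path


def expected_sources(name: str, sourcedir: str) -> list[Path]:
    """Paths the non-JIT example reports for a module called `name`.

    Assumption: the pattern seen for 'cutlass_gemm' holds for other names.
    Hypothetical helper, not part of the CUTLASS API.
    """
    root = Path(sourcedir)
    return [root / "setup.py", root / f"{name}.cpp", root / f"{name}_kernel.cu"]


def missing_sources(name: str, sourcedir: str) -> list[Path]:
    """Return the expected files that do not exist on disk."""
    return [p for p in expected_sources(name, sourcedir) if not p.is_file()]
```

Calling missing_sources('cutlass_gemm', 'output') after the emit step and before running setup.py is one way to catch a wrong sourcedir early.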
- cutlass.emit.pytorch.pytorch(op, name, cc, jit=False, sourcedir='')
- Generates source for building a PyTorch CUDA module that leverages the CUTLASS kernel specified by op. If the jit parameter is set to true, the module is just-in-time compiled, loaded, and returned. The result of this method is files within sourcedir that can be used for building a PyTorch module.
- Parameters:
- op – operation to emit in the module 
- name (str) – name of the module to generate 
- cc (int) – compute capability of the device the module should target 
- jit (bool) – whether the module should be just-in-time compiled 
- sourcedir (str) – directory to which generated source files should be written 
 
- Returns:
- loaded PyTorch module (if jit=True) or None
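The cc parameter is an integer such as 80, while the build command in the non-JIT example sets TORCH_CUDA_ARCH_LIST="8.0" for the same target. A minimal sketch of that conversion follows. The helper name is hypothetical (it belongs to neither CUTLASS nor PyTorch), and it assumes the last decimal digit of cc is the minor version.

```python
def arch_list_from_cc(cc: int) -> str:
    """Format an integer compute capability as 'major.minor'.

    Hypothetical helper. Assumes the last decimal digit of `cc` is the
    minor version, as in the document's pairing of 80 with "8.0".
    """
    if cc < 10:
        raise ValueError(f"unexpected compute capability: {cc}")
    major, minor = divmod(cc, 10)
    return f"{major}.{minor}"


# Matches the value used in the build command of the non-JIT example.
print(arch_list_from_cc(80))
```

Deriving the string from the same integer that is passed as cc keeps the emitted kernel and the extension build pointed at one architecture.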