cutlass/examples/python/01_epilogue.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5d24a692",
   "metadata": {},
   "source": [
    "# Example of using elementwise activation functions in the CUTLASS Python interface\n",
    "This notebook walks through a basic example of using the CUTLASS Python interface to declare, compile, and run GEMMs with different epilogues.\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVIDIA/cutlass/tree/master/examples/python/01_epilogue.ipynb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28c916da",
   "metadata": {},
   "source": [
    "## Prerequisites for running on Colab\n",
    "This notebook requires an NVIDIA GPU. If `nvidia-smi` fails, go to Runtime -> Change runtime type -> Hardware accelerator and confirm a GPU is selected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0fcea8ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "!#nvidia-smi"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ec60b57",
   "metadata": {},
   "source": [
    "If running on Colab, you will need to install the CUTLASS Python interface. To do so, uncomment the following line and run the cell:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1db9e51c",
   "metadata": {},
   "outputs": [],
   "source": [
    "!#pip install nvidia-cutlass"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "962324fd",
   "metadata": {},
   "source": [
    "## General setup\n",
    "We first import various packages needed for the example and construct the input and output tensors that will be used in our example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "63a70a3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "import cutlass\n",
    "\n",
    "# This controls whether ther C++ GEMM declaration will be printed at each step. Set to `false` to\n",
    "# omit this information.\n",
    "print_module = True\n",
    "\n",
    "m = 256\n",
    "n = m\n",
    "k = m\n",
    "\n",
    "type_A = np.float16\n",
    "type_B = np.float16\n",
    "type_C = np.float16\n",
    "type_D = np.float16\n",
    "\n",
    "np.random.seed(1234)\n",
    "scope_min = -4\n",
    "scope_max = 4\n",
    "tensor_A = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, k)).astype(type_A))\n",
    "tensor_B = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(k, n)).astype(type_B))\n",
    "tensor_C = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, n)).astype(type_C))\n",
    "\n",
    "alpha = np.float16(1.)\n",
    "beta = np.float16(0.)\n",
    "\n",
    "tensor_D = np.zeros(tensor_C.shape).astype(type_D)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1eb0d95b",
   "metadata": {},
   "source": [
    "## Run a GEMM with an identity activation function\n",
    "To begin, we simply run a default GEMM with an identity activation function. This performs the well-known operation `D = alpha * (A @ B) + beta * C`. This is the default activation function used, and does not need to be specified."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d257833",
   "metadata": {},
   "outputs": [],
   "source": [
    "plan = cutlass.op.Gemm(element=np.float16, layout=cutlass.LayoutType.RowMajor)\n",
    "plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "54961694",
   "metadata": {},
   "source": [
    "## Run a GEMM with a ReLU element-wise activation function\n",
    "CUTLASS makes it easy to support other element-wise activation functions. This results in performing an element-wise after the generic linear combination performed in a GEMM. If we call such an activation function `act`, the resulting formulation is:\n",
    "```\n",
    "D = alpha * (A @ B) + beta * C\n",
    "D = act(D)\n",
    "```\n",
    "\n",
    "Here, we will add a ReLU activation function. Given an input `x`, ReLU returns `max(x, 0)`.\n",
    "\n",
    "This is easy to do in CUTLASS. One only needs to set the plan's `activation` field."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5fe49443",
   "metadata": {},
   "outputs": [],
   "source": [
    "tensor_D_relu = np.zeros(tensor_C.shape).astype(type_D)\n",
    "plan.activation = \"relu\"\n",
    "plan.run(tensor_A, tensor_B, tensor_C, tensor_D_relu, print_module=print_module)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "455d0a37",
   "metadata": {},
   "source": [
    "We can now verify that the result of the GEMM that used a ReLU activation function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e32e7798",
   "metadata": {},
   "outputs": [],
   "source": [
    "relu_ref = (tensor_D >= 0).astype(type_D) * tensor_D\n",
    "np.testing.assert_array_equal(relu_ref, tensor_D_relu)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf959171",
   "metadata": {},
   "source": [
    "## Other element-wise activation functions\n",
    "CUTLASS supports a variety of widely-used element-wise activation functions. We can obtain a list of these functions via the `get_activations()` method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e17d730",
   "metadata": {},
   "outputs": [],
   "source": [
    "activations = plan.activations()\n",
    "for activation in activations:\n",
    "    print(activation)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0e4599fa",
   "metadata": {},
   "source": [
    "We can then run each of them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9c3598c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "for activation in activations:\n",
    "    print('=============================================================================================')\n",
    "    print(f'Compiling and running activation {activation}')\n",
    "    print('=============================================================================================')\n",
    "    plan.activation = activation\n",
    "    plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18828622",
   "metadata": {},
   "source": [
    "To add an activation with parameter such as `leaky_relu`, a tuple should be provided containing the activation function name and the (or a list of) parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53108eae",
   "metadata": {},
   "outputs": [],
   "source": [
    "negative_slope = 0.5\n",
    "plan.activation = (\"leaky_relu\", negative_slope)\n",
    "plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
CUTLASS 3.1 (#915) Co-authored-by: Aniket Shivam <ashivam@nvidia.com> 2023-04-15 11:19:34 +08:00			`{`
			`"cells": [`
			`{`
			`"attachments": {},`
			`"cell_type": "markdown",`
			`"id": "5d24a692",`
			`"metadata": {},`
			`"source": [`
			`"# Example of using elementwise activation functions in the CUTLASS Python interface\n",`
Updates to Python interface for PyPI packaging (#1209) * Updates * Updates to notebooks 2023-11-29 02:52:12 +08:00			`"This notebook walks through a basic example of using the CUTLASS Python interface to declare, compile, and run GEMMs with different epilogues.\n",`
			`"\n",`
			`"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVIDIA/cutlass/tree/master/examples/python/01_epilogue.ipynb)\n"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"id": "28c916da",`
			`"metadata": {},`
			`"source": [`
			`"## Prerequisites for running on Colab\n",`
			"This notebook requires an NVIDIA GPU. If `nvidia-smi` fails, go to Runtime -> Change runtime type -> Hardware accelerator and confirm a GPU is selected."
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "0fcea8ea",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"!#nvidia-smi"`
CUTLASS 3.1 (#915) Co-authored-by: Aniket Shivam <ashivam@nvidia.com> 2023-04-15 11:19:34 +08:00			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
Updates to Python interface for PyPI packaging (#1209) * Updates * Updates to notebooks 2023-11-29 02:52:12 +08:00			`"id": "7ec60b57",`
			`"metadata": {},`
			`"source": [`
			`"If running on Colab, you will need to install the CUTLASS Python interface. To do so, uncomment the following line and run the cell:"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "1db9e51c",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"!#pip install nvidia-cutlass"`
			`]`
			`},`
			`{`
			`"attachments": {},`
			`"cell_type": "markdown",`
			`"id": "962324fd",`
CUTLASS 3.1 (#915) Co-authored-by: Aniket Shivam <ashivam@nvidia.com> 2023-04-15 11:19:34 +08:00			`"metadata": {},`
			`"source": [`
Updates to Python interface for PyPI packaging (#1209) * Updates * Updates to notebooks 2023-11-29 02:52:12 +08:00			`"## General setup\n",`
CUTLASS 3.1 (#915) Co-authored-by: Aniket Shivam <ashivam@nvidia.com> 2023-04-15 11:19:34 +08:00			`"We first import various packages needed for the example and construct the input and output tensors that will be used in our example."`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "63a70a3c",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"import numpy as np\n",`
			`"\n",`
			`"import cutlass\n",`
			`"\n",`
			"# This controls whether ther C++ GEMM declaration will be printed at each step. Set to `false` to\n",
			`"# omit this information.\n",`
			`"print_module = True\n",`
			`"\n",`
			`"m = 256\n",`
			`"n = m\n",`
			`"k = m\n",`
			`"\n",`
			`"type_A = np.float16\n",`
			`"type_B = np.float16\n",`
			`"type_C = np.float16\n",`
			`"type_D = np.float16\n",`
			`"\n",`
			`"np.random.seed(1234)\n",`
			`"scope_min = -4\n",`
			`"scope_max = 4\n",`
			`"tensor_A = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, k)).astype(type_A))\n",`
			`"tensor_B = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(k, n)).astype(type_B))\n",`
			`"tensor_C = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, n)).astype(type_C))\n",`
			`"\n",`
			`"alpha = np.float16(1.)\n",`
			`"beta = np.float16(0.)\n",`
			`"\n",`
			`"tensor_D = np.zeros(tensor_C.shape).astype(type_D)"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"id": "1eb0d95b",`
			`"metadata": {},`
			`"source": [`
			`"## Run a GEMM with an identity activation function\n",`
			"To begin, we simply run a default GEMM with an identity activation function. This performs the well-known operation `D = alpha * (A @ B) + beta * C`. This is the default activation function used, and does not need to be specified."
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "8d257833",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"plan = cutlass.op.Gemm(element=np.float16, layout=cutlass.LayoutType.RowMajor)\n",`
			`"plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"id": "54961694",`
			`"metadata": {},`
			`"source": [`
			`"## Run a GEMM with a ReLU element-wise activation function\n",`
			"CUTLASS makes it easy to support other element-wise activation functions. This results in performing an element-wise after the generic linear combination performed in a GEMM. If we call such an activation function `act`, the resulting formulation is:\n",
			"```\n",
			`"D = alpha * (A @ B) + beta * C\n",`
			`"D = act(D)\n",`
			"```\n",
			`"\n",`
			"Here, we will add a ReLU activation function. Given an input `x`, ReLU returns `max(x, 0)`.\n",
			`"\n",`
			"This is easy to do in CUTLASS. One only needs to set the plan's `activation` field."
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "5fe49443",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"tensor_D_relu = np.zeros(tensor_C.shape).astype(type_D)\n",`
CUTLASS 3.2 (#1024) * CUTLASS 3.2 2023-08-08 08:50:32 +08:00			`"plan.activation = \"relu\"\n",`
CUTLASS 3.1 (#915) Co-authored-by: Aniket Shivam <ashivam@nvidia.com> 2023-04-15 11:19:34 +08:00			`"plan.run(tensor_A, tensor_B, tensor_C, tensor_D_relu, print_module=print_module)"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"id": "455d0a37",`
			`"metadata": {},`
			`"source": [`
			`"We can now verify that the result of the GEMM that used a ReLU activation function:"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "e32e7798",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"relu_ref = (tensor_D >= 0).astype(type_D) * tensor_D\n",`
			`"np.testing.assert_array_equal(relu_ref, tensor_D_relu)"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"id": "cf959171",`
			`"metadata": {},`
			`"source": [`
			`"## Other element-wise activation functions\n",`
			"CUTLASS supports a variety of widely-used element-wise activation functions. We can obtain a list of these functions via the `get_activations()` method."
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "9e17d730",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"activations = plan.activations()\n",`
			`"for activation in activations:\n",`
			`" print(activation)"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"id": "0e4599fa",`
			`"metadata": {},`
			`"source": [`
			`"We can then run each of them:"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "9c3598c9",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"for activation in activations:\n",`
			`" print('=============================================================================================')\n",`
			`" print(f'Compiling and running activation {activation}')\n",`
			`" print('=============================================================================================')\n",`
			`" plan.activation = activation\n",`
			`" plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)"`
			`]`
			`},`
CUTLASS 3.2 (#1024) * CUTLASS 3.2 2023-08-08 08:50:32 +08:00			`{`
			`"cell_type": "markdown",`
			`"id": "18828622",`
			`"metadata": {},`
			`"source": [`
			"To add an activation with parameter such as `leaky_relu`, a tuple should be provided containing the activation function name and the (or a list of) parameter."
			`]`
			`},`
CUTLASS 3.1 (#915) Co-authored-by: Aniket Shivam <ashivam@nvidia.com> 2023-04-15 11:19:34 +08:00			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
CUTLASS 3.2 (#1024) * CUTLASS 3.2 2023-08-08 08:50:32 +08:00			`"id": "53108eae",`
CUTLASS 3.1 (#915) Co-authored-by: Aniket Shivam <ashivam@nvidia.com> 2023-04-15 11:19:34 +08:00			`"metadata": {},`
			`"outputs": [],`
CUTLASS 3.2 (#1024) * CUTLASS 3.2 2023-08-08 08:50:32 +08:00			`"source": [`
			`"negative_slope = 0.5\n",`
			`"plan.activation = (\"leaky_relu\", negative_slope)\n",`
			`"plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)"`
			`]`
CUTLASS 3.1 (#915) Co-authored-by: Aniket Shivam <ashivam@nvidia.com> 2023-04-15 11:19:34 +08:00			`}`
			`],`
			`"metadata": {`
			`"kernelspec": {`
			`"display_name": "Python 3 (ipykernel)",`
			`"language": "python",`
			`"name": "python3"`
			`},`
			`"language_info": {`
			`"codemirror_mode": {`
			`"name": "ipython",`
			`"version": 3`
			`},`
			`"file_extension": ".py",`
			`"mimetype": "text/x-python",`
			`"name": "python",`
			`"nbconvert_exporter": "python",`
			`"pygments_lexer": "ipython3",`
			`"version": "3.8.10"`
			`}`
			`},`
			`"nbformat": 4,`
			`"nbformat_minor": 5`
			`}`