Update README.md
parent 0428c89fd5
commit 6f091f5620
@@ -20,7 +20,7 @@ point (FP64) types.  Furthermore, CUTLASS demonstrates CUDA's WMMA API for targe
the programmable, high-throughput _Tensor Cores_ provided by NVIDIA's Volta architecture
and beyond.

CUTLASS is very efficient, with performance comparable to cuBLAS for scalar GEMM
computations. The above figure shows CUTLASS performance relative to cuBLAS
	 Duane Merrill