Updated GEMM performance plot with CUTLASS 2.8 compiled with CUDA 11.5 Toolkit (#375)
GPUs under test:
    NVIDIA A100
    NVIDIA A2
    NVIDIA TITAN V
    NVIDIA GeForce RTX 2080 Ti

Parent: 6b69c79ac3
Commit: 5fe09c2d67

```diff
@@ -51,7 +51,7 @@ CUTLASS 2.8 is an update to CUTLASS adding:
 
 # Performance
 
-<p align="center"><img src=/media/images/cutlass-performance-plot.png></p>
+<p align="center"><img src=/media/images/cutlass-2.8-gemm-performance.png></p>
 
 CUTLASS primitives are very efficient.  When used to construct device-wide GEMM kernels,
 they exhibit performance comparable to cuBLAS for scalar GEMM
```

Binary file media/images/cutlass-2.8-gemm-performance.png not shown (size after change: 124 KiB).

	 Andrew Kerr