From: Robert van de Geijn

Date: Sun, 17 Feb 2013 18:23:33 -0600

Subject: Teaching how to optimize dgemm

 

For an undergraduate class on HPC, I put together a wiki that steps students through the optimization of matrix-matrix multiplication in the C programming language.  Over more than a dozen steps, the student goes from a simple triple-nested loop that attains less than 5% of peak to a highly optimized implementation that attains around 90% of turbo performance on a single core of my MacBook Air.  The final "microkernel", where all optimization is concentrated, is similar to the microkernel that underlies our BLIS framework for implementing BLAS-like functionality.
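
For readers who want a picture of the starting point, here is a minimal sketch of a simple triple-nested loop computing C := A*B + C.  It is an illustration only, not code taken from the wiki: the function name naive_gemm, the column-major storage, and the leading-dimension parameters lda, ldb, ldc are assumptions that follow the usual BLAS convention.

```c
#include <stddef.h>

/* Illustrative sketch: C := A*B + C with a plain triple-nested loop.
 * Assumptions (not from the original post): the name naive_gemm,
 * column-major storage, and leading dimensions lda, ldb, ldc.
 * A is m x k, B is k x n, C is m x n. */
void naive_gemm(size_t m, size_t n, size_t k,
                const double *a, size_t lda,
                const double *b, size_t ldb,
                double *c, size_t ldc)
{
    for (size_t i = 0; i < m; i++) {          /* rows of C */
        for (size_t j = 0; j < n; j++) {      /* columns of C */
            for (size_t p = 0; p < k; p++) {  /* inner (dot-product) index */
                /* Element (i,j) of a column-major matrix sits at i + j*ld. */
                c[i + j * ldc] += a[i + p * lda] * b[p + j * ldb];
            }
        }
    }
}
```

A loop of this shape performs 2*m*n*k floating-point operations but reuses almost nothing held in registers or cache, which is one common reason such code reaches only a small fraction of peak.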

 

I believe the NADigest community may find use for this wiki, so here it is: http://z.cs.utexas.edu/wiki/rvdg.wiki/HowToOptimizeGemm

 

I will be giving a presentation on this wiki at the SIAM CSE meeting, in our minisymposium titled "BLAS: Evolution and Intelligent Design": http://meetings.siam.org/sess/dsp_programsess.cfm?SESSIONCODE=15732

 

Robert van de Geijn

UT-Austin
