From: Robert van de Geijn

Date: Sun, 17 Feb 2013 18:23:33 -0600

Subject: Teaching how to optimize dgemm


For an undergraduate class on HPC I put together a wiki that steps students through the optimization of matrix-matrix multiplication in the C programming language.  Over more than a dozen steps, the student goes from a simple triple-nested loop that attains less than 5% of peak to a highly optimized implementation that, on my MacBook Air, attains around 90% of turbo performance (on a single core).  The final "microkernel", where all optimization is concentrated, is similar to the microkernel that underlies our BLIS framework for implementing BLAS-like functionality.
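
For readers who have not seen it, the starting point described above might look like the following minimal sketch.  The function name naive_dgemm, the ELEM macro, and the column-major layout are assumptions made for illustration; they are not taken from the wiki.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major indexing (an assumption for this sketch):
   element (i, j) of a matrix stored with leading dimension ld. */
#define ELEM(M, ld, i, j) ((M)[(size_t)(j) * (size_t)(ld) + (size_t)(i)])

/* Hypothetical name.  Computes C := A * B + C, where
   A is m x k, B is k x n, and C is m x n. */
void naive_dgemm(int m, int n, int k,
                 const double *a, int lda,
                 const double *b, int ldb,
                 double *c, int ldc)
{
    /* Leading dimensions must be at least the number of rows. */
    assert(lda >= m && ldb >= k && ldc >= m);

    for (int i = 0; i < m; i++)          /* rows of C */
        for (int j = 0; j < n; j++)      /* columns of C */
            for (int p = 0; p < k; p++)  /* inner (dot product) dimension */
                ELEM(c, ldc, i, j) += ELEM(a, lda, i, p) * ELEM(b, ldb, p, j);
}
```

A loop of this shape performs no blocking for cache or registers and uses no vector instructions, which is the kind of work the later optimization steps take on.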


I believe the NADigest community may find use for this wiki, so here it is:


I will be giving a presentation on this wiki at the SIAM CSE meeting, in our minisymposium titled "BLAS: Evolution and Intelligent Design".


Robert van de Geijn