From: Robert van de Geijn <
Date: Sun, 17 Feb 2013 18:23:33 -0600
Subject: Teaching how to optimize dgemm
For an undergraduate class on HPC I put together a wiki that steps students through the optimization of matrix-matrix multiplication in the C programming language. In more than a dozen steps, the student goes from a simple triple-nested loop that attains less than 5% of peak to a highly optimized implementation that attains around 90% of turbo performance on my MacBook Air (on a single core). The final "microkernel", where all optimization is concentrated, is similar to the microkernel that underlies our BLIS framework for implementing BLAS-like functionality.
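For readers who want a concrete picture of the starting point, here is a minimal sketch of a simple triple-nested loop for C := C + A*B. It is an illustration only, not code taken from the wiki: the function name naive_dgemm, the IDX macro, and the choice of column-major storage with explicit leading dimensions are assumptions made for this example.

```c
#include <stddef.h>

/* Assumption for this sketch: matrices are stored column-major, so
 * element (i,j) of a matrix with leading dimension ld lives at j*ld + i. */
#define IDX(i, j, ld) ((size_t)(j) * (size_t)(ld) + (size_t)(i))

/* Hypothetical name. Computes C := C + A*B, where A is m x k,
 * B is k x n, and C is m x n. No blocking, no vectorization:
 * this is the unoptimized baseline, not a tuned kernel. */
void naive_dgemm(int m, int n, int k,
                 const double *a, int lda,
                 const double *b, int ldb,
                 double *c, int ldc)
{
    for (int i = 0; i < m; i++) {           /* rows of C */
        for (int j = 0; j < n; j++) {       /* columns of C */
            for (int p = 0; p < k; p++) {   /* inner (dot-product) dimension */
                c[IDX(i, j, ldc)] += a[IDX(i, p, lda)] * b[IDX(p, j, ldb)];
            }
        }
    }
}
```

The routine performs 2*m*n*k floating-point operations, so timing a call and dividing that count by the elapsed time gives the rate that can be compared against the machine's peak.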
I believe the NADigest community may find use for this wiki, so here it is: http://z.cs.utexas.edu/wiki/rvdg.wiki/HowToOptimizeGemm
I will be giving a presentation on this wiki at the SIAM CSE meeting, in our minisymposium titled "BLAS: Evolution and Intelligent Design". http://meetings.siam.org/sess/dsp_programsess.cfm?SESSIONCODE=15732
Robert van de Geijn
UT-Austin