There are many more matrix multiplications, and some that are slightly more efficient than those presented here. We now consider (last but not least) the implementation of Cannon's algorithm (1969).

Cannon's algorithm uses a mesh of processes that are connected as a torus. Process at location initially begins with submatrices and . As the algorithm progresses, the submatrices are passed left and upwards.

- Initially begins with and .
- Elements are moved from their initial positions to align them so that the correct submatrices are multiplied with one another. Note that submatrices on the diagonal don't actually require alignment. Alignment is done by shifting the -th row of positions left and the -th column of positions up.
- Each process, multiplies its current submatrices and adds to a cumulative sum.
- The submatrices are shifted left and upwards.
- The above two steps are repeated through the remaining submatrices.

aaron@csse.unimelb.edu.au Tue Oct 11 09:33:58 EST 2011