There are many more matrix multiplications, and some that are slightly more efficient than those presented here. We now consider (last but not least) the implementation of Cannon's algorithm (1969).
Cannon's algorithm uses a mesh of processes that are connected as a torus. Process at location initially begins with submatrices
. As the algorithm progresses, the submatrices are passed left and upwards.
- Initially begins with
- Elements are moved from their initial positions to align them so that the correct submatrices are multiplied with one another. Note that submatrices on the diagonal don't actually require alignment. Alignment is done by shifting the -th row of
positions left and the -th column of
- Each process, multiplies its current submatrices and adds to a cumulative sum.
- The submatrices are shifted left and upwards.
- The above two steps are repeated through the remaining submatrices.
firstname.lastname@example.org Tue Oct 11 09:33:58 EST 2011