## Saturday, September 24, 2011

### Blocking techique

I've tried to implement blocking algorithm for matrices multiplication in la4j and found interesting things.

There is code of both algorithm:

(Simple one)

```1: public Matrix multiply(Matrix matrix) throws MatrixException {
2:
3:   int thisRows = this.rows;
4:   int thisColumns = this.columns;
5:   int thatRows = matrix.rows();
6:   int thatColumns = matrix.columns();
7:
8:   if (thisColumns != thatRows) {
9:    throw new MatrixException(
10:      "can not multiply this matrix: wrong dimmentions");
11:   }
12:
13:   DenseMatrix result = new DenseMatrix(thisRows, thatColumns);
14:   double resultArr[][] = result.toArray();
15:
16:   double thatColumn[] = new double[thatRows];
17:
18:   for (int j = 0; j < thatColumns; j++) {
19:    for (int k = 0; k < thisColumns; k++) {
20:     thatColumn[k] = matrix.get(k, j);
21:    }
22:
23:    for (int i = 0; i < thisRows; i++) {
24:     double thisRow[] = self[i];
25:     double summand = 0;
26:     for (int k = 0; k < thisColumns; k++) {
27:      summand += thisRow[k] * thatColumn[k];
28:     }
29:     resultArr[i][j] = summand;
30:    }
31:   }
32:
33:   return result;
34:  }```
```
```
(Blocking one)

```1: public Matrix multiplyBlocks(Matrix matrix) throws MatrixException {
2:   int thisRows = this.rows;
3:   int thisColumns = this.columns;
4:   int thatRows = matrix.rows();
5:   int thatColumns = matrix.columns();
6:
7:   if (thisColumns != thatRows) {
8:    throw new MatrixException(
9:      "can not multiply this matrix: wrong dimmentions");
10:   }
11:
12:   DenseMatrix result = new DenseMatrix(thisRows, thatColumns);
13:
14:   double resultArr[][] = result.toArray();
15:
16:   int block_size = 64;
17:
18:   for (int i = 0; i<thisRows; i += block_size) {
19:    for (int k = 0; k<thisColumns; k += block_size) {
20:     for (int j = 0; j<thatColumns; j += block_size) {
21:      for(int u = 0; u<block_size; ++u) {
22:       for(int w = 0; w<block_size; ++w) {
23:        for(int v = 0; v<block_size; ++v) {
24:         resultArr[i+u][j+v] += self[i+u][k+w] * matrix.get(k+w, j+v);
25:        }
26:       }
27:      }
28:     }
29:    }
30:   }
31:
32:   return result;
33:  }```

There is results for (1024x1024 matrices):

• Server compiler:
• Simple: 2,75 s
• Blocking: 2,4 s
• Client compiler:
• Simple: 3,91 s
• Blocking:  18,43 s

It's very strange! And now I have just one question: What i should do? May be I need to implement something like profile for la4j, which will chose the better algorithm for any VM. Or I need to forgot about 10-15% speed up and continue use simple algorithm.