Monday, September 19, 2011

Loop unrolling

The unrolled version of the multiply code gives about a 1.1x speedup on C1 (the HotSpot client compiler).
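For reference, here is a minimal sketch of what unrolling the inner loop of a matrix multiply can look like in Java. It is not the exact code measured above. The class and method names (`UnrolledMultiply`, `multiply`, `multiplyUnrolled`), the unroll factor of 4, and the square `double[][]` layout are all assumptions made for illustration.

```java
public class UnrolledMultiply {

    // Baseline: plain triple loop computing c = a * b for square n x n matrices.
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int k = 0; k < n; k++) {
                    sum += a[i][k] * b[k][j];
                }
                c[i][j] = sum;
            }
        }
        return c;
    }

    // Same computation with the inner k-loop unrolled by 4 (assumed factor).
    // The additions stay in the original order, so the result matches the baseline.
    static double[][] multiplyUnrolled(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        int limit = n - (n % 4); // largest multiple of 4 that is <= n
        for (int i = 0; i < n; i++) {
            double[] ai = a[i];
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                int k = 0;
                for (; k < limit; k += 4) {
                    sum += ai[k] * b[k][j];
                    sum += ai[k + 1] * b[k + 1][j];
                    sum += ai[k + 2] * b[k + 2][j];
                    sum += ai[k + 3] * b[k + 3][j];
                }
                // Remainder loop for sizes that are not a multiple of 4.
                for (; k < n; k++) {
                    sum += ai[k] * b[k][j];
                }
                c[i][j] = sum;
            }
        }
        return c;
    }
}
```

The unrolled loop does fewer loop-condition checks and branches per multiply-add, which is the kind of saving a simpler JIT like C1 may not produce on its own.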

On Atom this code runs very slowly (~10 sec) for 1000 x 1000 matrices. On my laptop (Nehalem) it runs much faster (~3 sec) for the same size.

Next, I need to implement a blocking technique for the multiplication.
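A rough sketch of what a blocked (tiled) multiply could look like is below. This is only one possible shape for it, not a finished implementation. The class name `BlockedMultiply`, the method `multiplyBlocked`, and the `blockSize` parameter are hypothetical, and a good block size would have to be found by measuring on each machine.

```java
public class BlockedMultiply {

    // Computes c = a * b for square n x n matrices, one tile at a time.
    // blockSize is a tuning parameter (assumed here, not a measured value).
    static double[][] multiplyBlocked(double[][] a, double[][] b, int blockSize) {
        int n = a.length;
        double[][] c = new double[n][n]; // Java zero-initializes the array
        for (int ii = 0; ii < n; ii += blockSize) {
            int iMax = Math.min(ii + blockSize, n);
            for (int kk = 0; kk < n; kk += blockSize) {
                int kMax = Math.min(kk + blockSize, n);
                for (int jj = 0; jj < n; jj += blockSize) {
                    int jMax = Math.min(jj + blockSize, n);
                    // Multiply the (ii, kk) tile of a by the (kk, jj) tile of b
                    // and accumulate into the (ii, jj) tile of c.
                    for (int i = ii; i < iMax; i++) {
                        double[] ci = c[i];
                        for (int k = kk; k < kMax; k++) {
                            double aik = a[i][k];
                            double[] bk = b[k];
                            for (int j = jj; j < jMax; j++) {
                                ci[j] += aik * bk[j];
                            }
                        }
                    }
                }
            }
        }
        return c;
    }
}
```

The idea is that each tile is small enough to stay in cache while it is reused, instead of streaming whole rows and columns of a 1000 x 1000 matrix through the cache on every pass.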
