[update benchmark comments
peter@cordes.ca**20080315060309] {
hunk ./rshift.asm 75
-C AMD K8, movhlps version
-C size 496 3.048 cycles/limb
+C AMD K8 (Opteron 2218)
+C size 1 15.938 cycles
+C size 496 3.044 cycles/limb
hunk ./rshift.asm 79
+C Core 2 Conroe (2.4GHz)
+C size 1 12.224 cycles (11.968 with pxor after the loop)
+C size 4 6.420 cycles (4.0 with movdqa ; shufpd version uncommented instead.)
+C size 496 2.036-2.068 cycles/limb
+
+C Core 2 Harpertown (2.8GHz)
+C size 1 12.049 cycles
+C size 496 2.062 cycles/limb
}