[update benchmark comments
peter@cordes.ca**20080318181416] {
hunk ./rshift.asm 31
-C +++++++++++++++++++++ shufpd version with slow (%cl) intro ++++++++++++++++++++
hunk ./rshift.asm 47
-C this SSE2 version: requires 16byte aligned input and output
hunk ./rshift.asm 48
+C this SSE2 version: requires 16byte aligned input and output
+C +++++++++++++++++++++ shufpd version with slow (%cl) intro ++++++++++++++++++++
hunk ./rshift.asm 82
+C size 2 6.000 c/l
+C size 3 5.333 c/l
hunk ./rshift.asm 86
+C size 10000000 10.820 c/l (dual-channel DDR800, g965 chipset)
}