[TAG triple jcc version
peter@cordes.ca**20080320081143] 
<
[working unrolled version.  1.55c/l
peter@cordes.ca**20080320044409
 slow when n is small and not = 4*m+1.  Use a computed jump
] 
[do the correctness-checking first
peter@cordes.ca**20080320032632] 
[add a non-SSE version copied from mpn/x86/rshift.asm. uses shrd for ~1.89 c/l on Core 2
peter@cordes.ca**20080319184937] 
[fix wrong reg state comment
peter@cordes.ca**20080319184910] 
[shift.c: print SIZE, and take ntests as second arg
peter@cordes.ca**20080319184803] 
[use movq; movhpd instead of movdqa to allow unaligned stores
peter@cordes.ca**20080319054934] 
[TAG fastest working version with unaligned loads but aligned stores
peter@cordes.ca**20080319052545] 
> {
}