[TAG triple jcc version
peter@cordes.ca**20080320081143] 
<
[working unrolled version. 1.55c/l
peter@cordes.ca**20080320044409
 slow when n is small and not = 4*m+1. Use a computed jump
] 
[do the correctness-checking first
peter@cordes.ca**20080320032632] 
[add a non-SSE version copied from mpn/x86/rshift.asm. uses shrd for ~1.89 c/l on Core 2
peter@cordes.ca**20080319184937] 
[fix wrong reg state comment
peter@cordes.ca**20080319184910] 
[shift.c: print SIZE, and take ntests as second arg
peter@cordes.ca**20080319184803] 
[use movq; movhpd instead of movdqa to allow unaligned stores
peter@cordes.ca**20080319054934] 
[TAG fastest working version with unaligned loads but aligned stores
peter@cordes.ca**20080319052545] 
> {
}