
Given packed bytes in xmm0, what is an efficient way to extract the sign (i.e. highest-order) bit of each byte into xmm1? In other words, I want to compute the bitwise AND of each packed byte with 0x80.

For example:

xmm0: 0xff 0xef 0x80 0x7f 0x01 ...
xmm1: 0x80 0x80 0x80 0x00 0x00 ...
jacobsa
  • Have you tried `_mm_and_si128()`? – Mysticial Apr 12 '16 at 04:28
  • I can imagine using `_mm_and_si128` (i.e. `pand`), but that requires me to load an appropriate mask first. (What's the most efficient way to do this?) I'm wondering if there's something specialized that can do better. – jacobsa Apr 12 '16 at 04:31

1 Answer


There's no byte-element shift (psrlb or whatever), so you can't just knock off the bits you don't want with a right shift followed by a left shift: the word-sized shifts that do exist move bits across byte boundaries. Even if you only have to do this once, it's probably still best to use a mask.

You can generate the mask on the fly in fewer instruction bytes than the 16 bytes the mask constant itself would occupy in memory, and with no possibility of a cache miss.

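pcmpeqw xmm1, xmm1    ; every byte = 0xff (-1)
pabsb   xmm1, xmm1    ; every byte = 0x01 (pabsb requires SSSE3)
psllw   xmm1, 7       ; every byte = 0x80, i.e. set1_epi8(0x80)
pand    xmm1, xmm0    ; xmm1 = sign bit of each byte of xmm0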
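
For anyone working in C rather than asm, the four-instruction sequence above maps onto intrinsics roughly as follows. Treat this as a sketch under a few assumptions: `sign_bits_16` is a made-up helper name, the `target("ssse3")` attribute is GCC/Clang-specific (with other compilers, drop it and enable SSSE3 however your toolchain expects), and an optimizing compiler is free to replace the mask generation with a constant load anyway.

```c
#include <emmintrin.h>   /* SSE2:  _mm_cmpeq_epi16, _mm_slli_epi16, _mm_and_si128 */
#include <tmmintrin.h>   /* SSSE3: _mm_abs_epi8 */
#include <stdint.h>

/* Hypothetical helper: out[i] = in[i] & 0x80 for 16 bytes. */
__attribute__((target("ssse3")))   /* assumption: GCC or Clang */
void sign_bits_16(const uint8_t in[16], uint8_t out[16])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)in);
    __m128i ones = _mm_cmpeq_epi16(v, v);     /* pcmpeqw: every byte 0xff */
    __m128i one  = _mm_abs_epi8(ones);        /* pabsb:   every byte 0x01 */
    __m128i mask = _mm_slli_epi16(one, 7);    /* psllw 7: every byte 0x80 */
    _mm_storeu_si128((__m128i *)out, _mm_and_si128(mask, v));  /* pand */
}
```

Fed the bytes from the question (0xff 0xef 0x80 0x7f 0x01), the first five output bytes are 0x80 0x80 0x80 0x00 0x00.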

If you instead want the sign bits packed together in an integer register, use pmovmskb:

pmovmskb  reg, xmm0

But unpacking that bitmap back into a vector is slower than generating the sign-bit mask (until AVX512).
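
In C, the packed-bitmap variant corresponds to `_mm_movemask_epi8`, which only needs SSE2. A minimal sketch, with `sign_bitmap_16` being a hypothetical helper name:

```c
#include <emmintrin.h>   /* SSE2: _mm_loadu_si128, _mm_movemask_epi8 */
#include <stdint.h>

/* Hypothetical helper: bit i of the result is the sign bit of in[i]. */
int sign_bitmap_16(const uint8_t in[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    return _mm_movemask_epi8(v);   /* pmovmskb */
}
```

For the question's example bytes (0xff 0xef 0x80 0x7f 0x01, rest zero), bytes 0 through 2 have their sign bit set, so the result is 0x0007.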


If you're only doing this once, you might be able to come up with something shorter than 4 instructions, especially if you can use AVX non-destructive (3-operand) operations. Here's an idea that didn't end up any shorter:

vpcmpeqw    xmm1, xmm1, xmm1     ;  xmm1 = -1 in every byte
vpsignb     xmm2, xmm1, xmm0     ;  xmm2 = +1 if xmm0 < 0, -1 if xmm0 > 0, 0 if xmm0 == 0
vpsubb      xmm3, xmm2, xmm1     ;  xmm3 = +2, 0, or +1 respectively  (subtract -1 => add 1)
vpsllw      xmm4, xmm3, 6        ;  xmm4 = 0x80, 0, or 0x40 respectively

Nope, it wasn't any shorter, and bytes that are exactly zero come out as 0x40 rather than 0. Depending on what you need, part of this idea might still help.
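
To make the behaviour of the psignb idea easier to check, here is a rough intrinsics rendering of it. The helper name `sign_idea_16` is invented for illustration and the `target("ssse3")` attribute again assumes GCC or Clang; whether the compiler emits the VEX-encoded forms shown above depends on whether AVX is enabled.

```c
#include <emmintrin.h>   /* SSE2:  _mm_cmpeq_epi16, _mm_sub_epi8, _mm_slli_epi16 */
#include <tmmintrin.h>   /* SSSE3: _mm_sign_epi8 */
#include <stdint.h>

/* Hypothetical helper mirroring the psignb/psubb/psllw sequence. */
__attribute__((target("ssse3")))   /* assumption: GCC or Clang */
void sign_idea_16(const uint8_t in[16], uint8_t out[16])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)in);
    __m128i ones = _mm_cmpeq_epi16(v, v);    /* every byte -1 */
    __m128i s    = _mm_sign_epi8(ones, v);   /* +1 if v < 0, -1 if v > 0, 0 if v == 0 */
    __m128i t    = _mm_sub_epi8(s, ones);    /* 2, 0, or 1 */
    /* Values are at most 2, so a word shift by 6 never spills across bytes. */
    _mm_storeu_si128((__m128i *)out, _mm_slli_epi16(t, 6));  /* 0x80, 0x00, or 0x40 */
}
```

Input bytes 0xff, 0x7f and 0x00 produce 0x80, 0x00 and 0x40, which shows the zero-byte case that keeps this from being a drop-in replacement for the mask version.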

Peter Cordes