I was wondering: is there a way to increase a value in an xmm register, or can you only move a value into one?

What I mean is, you can do this:

inc eax

or like this:

inc [ebp+7F00F000]

Is there a way to do the same with an xmm register?

I have tried something that resembles it, but it doesn't work:

  inc [rbx+08]
  movss xmm1,[rbx+08]

I have even tried something really stupid, but it also didn't work:

push edx
pextrw edx,xmm2,0
add edx,1
mov [rbx+08],edx
movss xmm1,[rbx+08]
pop edx
David Hoelzer
Gecko64
  • Do you want to increase all the integer values in an xmm register, or only a single one? – galinette Jul 10 '16 at 17:27
  • Possible duplicate of [Add a constant value to a xmm register in x86](http://stackoverflow.com/questions/14088228/add-a-constant-value-to-a-xmm-register-in-x86) – Hans Passant Jul 10 '16 at 17:27
  • @HansPassant: That's asking about floating point. This one doesn't seem to be, since it's using integer `inc` and `pextrw`. Or else the OP is really confused. If it is supposed to be about floating point, then obviously you just add a vector that has zeros in all elements except for one. (Or use `addss` if it's the low element, since those insns merge with the old value). – Peter Cordes Jul 10 '16 at 20:51

2 Answers

There's no `inc` equivalent for xmm regs, and there's no immediate-operand form of `paddw` (so there's no equivalent to `add eax, 1` either).

`paddw` (and the versions for other element sizes) is only available with an xmm/m128 source operand. So if you want to increment one element of a vector, you need to load a constant from memory, or generate it on the fly.

For example, the cheapest way to increment every 16-bit element of xmm0 is:

; outside the loop
pcmpeqw    xmm1, xmm1    ; xmm1 = all-ones = -1 in each element

; inside the loop
psubw      xmm0, xmm1    ; xmm0 -= -1   (in each element).  i.e. xmm0++

Or

paddw      xmm0, [ones]  ; where ones is a static constant.

Loading the constant from memory is probably only a good idea if constructing it takes more than about two instructions, or if register pressure is a problem.
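
For the memory-operand version, `ones` has to be defined somewhere. A minimal sketch, assuming NASM syntax and 64-bit code (the `[rel ...]` addressing and the label `increment_words` are illustrative choices, not something from the question):

```nasm
section .rodata align=16        ; legacy-SSE memory operands must be 16-byte aligned
ones:   times 8 dw 1            ; eight 16-bit elements, each holding 1

section .text
increment_words:                ; hypothetical helper: xmm0 += 1 in every 16-bit element
    paddw   xmm0, [rel ones]    ; RIP-relative load of the constant (x86-64 only)
    ret
```

If the constant is not 16-byte aligned, the non-VEX `paddw xmm0, [mem]` form faults, so the alignment on the section matters.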


If you want to construct a constant that increments only the low 32-bit element, for example, you might use a byte-shift to zero the other elements:

; hoisted out of the loop
pcmpeqw    xmm1, xmm1    ; xmm1 = all-ones = -1
psrldq     xmm1, 12      ; xmm1 = [ 0 0 0 -1 ]


; in the loop
psubd      xmm0, xmm1
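
The same byte-shift idea can target a different element by shifting the surviving -1 back up. A sketch, assuming you want to increment only 32-bit element 2 (counting from 0 at the low end) and that xmm1 is free to clobber:

```nasm
; hoisted out of the loop
pcmpeqd    xmm1, xmm1    ; xmm1 = all-ones
psrldq     xmm1, 12      ; xmm1 = [ 0  0 0 -1 ]   keep only the low dword
pslldq     xmm1, 8       ; xmm1 = [ 0 -1 0  0 ]   move it up 8 bytes, into element 2

; in the loop
psubd      xmm0, xmm1    ; element 2 -= -1, the other elements -= 0
```

That is three instructions to build the mask, so at this point loading a constant from memory starts to look competitive.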

If your attempt was supposed to increment just the low 16-bit element in xmm2, then yes, it was a stupid attempt. I don't know what you're doing by storing to [rbx+8] and then loading into xmm1 with movss (which zeroes the high 96 bits).

Here's how to write the xmm -> gp -> xmm round trip in a less dumb way. (Still terrible compared to paddw with a vector constant).

; don't push/pop.  Instead, pick a register you can clobber without saving/restoring
movd    edx, xmm2       ; this is the cheapest way to get the low 16.  It doesn't matter that we also get element 1 as garbage in the high half of edx
inc     edx             ; we only care about dx, but this is still the most efficient instruction
pinsrw  xmm2, edx, 0    ; normally you'd just use movd again, but we actually want to merge with the old contents.

If you wanted to work with elements other than 16-bit, you'd either use SSE4.1 pinsrb/d/q, or you'd use movd and shuffles.
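
As a sketch of the SSE4.1 route for a 32-bit element (assuming SSE4.1 is available, that eax can be clobbered, and picking element 1 purely for illustration):

```nasm
pextrd  eax, xmm2, 1    ; copy 32-bit element 1 of xmm2 into eax
inc     eax             ; scalar increment
pinsrd  xmm2, eax, 1    ; merge it back into element 1, leaving the other elements alone
```

This is still a round trip through an integer register, so it has the same drawback as the 16-bit version compared to `paddd` with a vector constant.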


See Agner Fog's Optimizing Assembly guide for more good tips on how to use SSE vectors, and the other links in the tag wiki.

Peter Cordes
In short, no, not in the way that you are thinking.

Under SSE, all of the original XMM registers were floating point registers. There is no increment operation for floating point.

SSE2 added a number of integer type registers, but there is still no increment. These registers and added operations were really intended for high speed arithmetic operations, including such things as dot products, accurate products with rounding, etc.

An increment operation is something that you would expect to find applied to a general register or an accumulator.

You might find this set of slides somewhat informative in terms of general overview and function.

David Hoelzer
  • SSE2 uses the same XMM registers, it just added instructions that operate on integer data types, including integer addition/subtraction for b/w/d/q element sizes. It's totally normal to do vector integer adds in XMM regs. You can even use them to [implement a Fibonacci sequence generator](http://stackoverflow.com/questions/32659715/assembly-language-x86-how-to-create-a-loop-to-calculate-fibonacci-sequence/32661389#32661389) if you want to. – Peter Cordes Jul 10 '16 at 20:48
  • Peter, does SSE2 on modern AMD/Intel use the same XMM registers for int and for fp, given that the int and fp blocks are separated and, on AMD, have separate PRFs? http://hothardware.com/articleimages/Item1552/BobcatDetail1.jpg – osgx Jul 10 '16 at 23:07
  • @osgx: Didn't see your reply earlier, since you didn't @-notify me. I meant the same architectural registers. Intel and AMD CPUs have separate forwarding networks for vector-int and vector-fp. However, Intel SnB-family definitely [uses a single PRF for all vector registers](http://www.realworldtech.com/haswell-cpu/3/). Since most code doesn't use both at once, this gives more out-of-order execution capacity for the same silicon area. I think Bobcat is actually the same: the "int" block is scalar integer (general purpose) registers. Note the IntMul unit: FP just means XMM/MMX/x87 here. – Peter Cordes Jul 19 '16 at 01:29