
I've stored a 64-bit integer in the EDX:EAX register pair. How can I correctly negate the number?

For example, 123456789123 should become -123456789123.

Cody Gray - on strike
Róbert Nagy
  • You forgot to show what you tried or considered. There are many ways. First, `-x = 0 - x`, so you can subtract from 0. Then you could also do `-x = -1*x`. Doing the 2's complement formula of flipping all the bits and adding one is also an option. – Jester Dec 10 '16 at 20:56
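
The flip-the-bits-and-add-one formula from the comment above can be carried out on two 32-bit halves, which makes the low-to-high carry visible: the +1 only propagates into the high half when the low half was 0. The following is a minimal C sketch under the assumption that the `hi`/`lo` fields stand in for EDX/EAX; the type and function names (`pair64`, `neg_flip_add1`, `check_flip_add1`) are hypothetical, chosen just for illustration. It uses unsigned arithmetic so that even the most negative value negates without undefined behavior.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical model of a 64-bit value held as two 32-bit halves (EDX:EAX). */
typedef struct {
    uint32_t hi; /* plays the role of EDX */
    uint32_t lo; /* plays the role of EAX */
} pair64;

static pair64 split64(uint64_t x) {
    pair64 p = { (uint32_t)(x >> 32), (uint32_t)x };
    return p;
}

static uint64_t join64(pair64 p) {
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Two's complement negation: flip all bits, then add 1 across both halves. */
static pair64 neg_flip_add1(pair64 x) {
    pair64 r;
    r.lo = ~x.lo + 1u;               /* wraps to 0 only if x.lo was 0 */
    uint32_t carry = (r.lo == 0);    /* carry out of the low-half addition */
    r.hi = ~x.hi + carry;            /* the carry is what ADC/SBB provide in asm */
    return r;
}

/* Compare against 64-bit unsigned negation (well-defined for every input). */
static void check_flip_add1(uint64_t x) {
    assert(join64(neg_flip_add1(split64(x))) == 0u - x);
}
```

For example, `join64(neg_flip_add1(split64(123456789123u)))` yields the bit pattern of -123456789123.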

1 Answer


Ask a compiler for ideas: compile `int64_t neg(int64_t a) { return -a; }` in 32-bit mode. Of course, different ways of asking the compiler will have the starting value in memory, in the compiler's choice of registers, or already in EDX:EAX. See all three ways on the Godbolt compiler explorer, with asm output from gcc, clang, and MSVC (aka CL).

There are of course lots of ways to accomplish this, but any possible sequence will need some kind of carry from low to high at some point, so there's no efficient way to avoid SBB or ADC.


If the value starts in memory, or you want to keep the original value in registers, xor-zero the destination and use SUB/SBB. The SysV x86-32 ABI passes args on the stack and returns 64-bit integers in EDX:EAX. This is what clang 3.9.1 -m32 -O3 does for neg_value_from_mem:

    ; optimal for data coming from memory: just subtract from zero
    xor     eax, eax
    xor     edx, edx
    sub     eax, dword ptr [esp + 4]
    sbb     edx, dword ptr [esp + 8]
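
To make the borrow explicit, here is a hedged C model of the subtract-from-zero sequence above, assuming the two 32-bit input halves have already been read from the stack slots. The names `neg_sub_sbb` and `check_sub_sbb` are hypothetical; the point is only that SUB sets CF when the low half borrows (any non-zero input), and SBB subtracts that borrow from the high half.

```c
#include <stdint.h>
#include <assert.h>

/* Models:  xor eax,eax / xor edx,edx / sub eax,in_lo / sbb edx,in_hi */
static uint64_t neg_sub_sbb(uint32_t in_lo, uint32_t in_hi) {
    uint32_t eax = 0, edx = 0;        /* xor-zeroing both halves */
    uint32_t cf = (in_lo > eax);      /* SUB sets CF on unsigned borrow, i.e. in_lo != 0 */
    eax = eax - in_lo;                /* sub eax, in_lo */
    edx = edx - in_hi - cf;           /* sbb edx, in_hi: also subtracts the borrow */
    return ((uint64_t)edx << 32) | eax;
}

/* Check against plain 64-bit unsigned negation. */
static void check_sub_sbb(uint64_t x) {
    assert(neg_sub_sbb((uint32_t)x, (uint32_t)(x >> 32)) == 0u - x);
}
```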

If you have the value in registers and don't need the result in place, you can use NEG to set a register to 0 minus itself, which sets CF iff the input is non-zero, i.e. the same way SUB would. Note that xor-zeroing is cheap and not part of the latency critical path, so this is definitely better than gcc's 3-instruction sequence (below).

    ;; partially in-place: input in ecx:eax
    xor     edx, edx
    neg     eax         ; eax = 0-eax, setting flags appropriately
    sbb     edx, ecx    ;; result in edx:eax
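
NEG's flag behavior is the part that is easy to get backwards: CF is cleared when the operand is 0 and set otherwise, matching what SUB from zero would produce. The C sketch below models that for the partially in-place sequence, assuming input in ECX:EAX; `neg32`, `neg_xor_neg_sbb`, and `check_neg_sbb` are hypothetical names used only to mirror the instructions.

```c
#include <stdint.h>
#include <assert.h>

/* Model of NEG r32: result is 0 - r; CF = 0 if r was 0, else CF = 1. */
static uint32_t neg32(uint32_t r, uint32_t *cf) {
    *cf = (r != 0);
    return 0u - r;
}

/* Models, with input in ecx:eax and result in edx:eax:
       xor edx, edx    ; must come first, since XOR clears CF
       neg eax
       sbb edx, ecx                                              */
static uint64_t neg_xor_neg_sbb(uint32_t eax, uint32_t ecx) {
    uint32_t cf;
    uint32_t edx = 0;              /* xor edx, edx */
    eax = neg32(eax, &cf);         /* neg eax: CF = (eax != 0) */
    edx = edx - ecx - cf;          /* sbb edx, ecx */
    return ((uint64_t)edx << 32) | eax;
}

static void check_neg_sbb(uint64_t x) {
    assert(neg_xor_neg_sbb((uint32_t)x, (uint32_t)(x >> 32)) == 0u - x);
}
```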

Clang does this even for the in-place case, even though that costs an extra mov ecx,edx. That's optimal for latency on modern CPUs that have zero-latency mov reg,reg (Intel IvB+ and AMD Zen), but not for the number of fused-domain uops (front-end throughput) or for code size.


gcc's sequence is interesting and not totally obvious. It saves an instruction vs. clang for the in-place case, but it's worse otherwise.

    ; gcc's in-place sequence, only good for in-place use
    neg     eax
    adc     edx, 0
    neg     edx
       ; disadvantage: higher latency for the upper half than subtract-from-zero
       ; advantage: result in edx:eax with no extra registers used
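
Why gcc's sequence works: after NEG sets CF = (eax != 0), it computes edx = -(edx + CF), which equals 0 - edx - CF modulo 2^32. Here is a hedged C model of the three instructions, checked against plain 64-bit negation; `neg_gcc_in_place` and `check_gcc_in_place` are hypothetical names for illustration only.

```c
#include <stdint.h>
#include <assert.h>

/* Models gcc's in-place sequence on edx:eax:
       neg eax        ; CF = (eax != 0)
       adc edx, 0     ; edx = edx + CF
       neg edx        ; edx = -(edx + CF)                           */
static uint64_t neg_gcc_in_place(uint32_t eax, uint32_t edx) {
    uint32_t cf = (eax != 0);   /* NEG sets CF unless its operand was 0 */
    eax = 0u - eax;             /* neg eax */
    edx = edx + cf;             /* adc edx, 0 (may wrap to 0, which is still correct) */
    edx = 0u - edx;             /* neg edx */
    return ((uint64_t)edx << 32) | eax;
}

static void check_gcc_in_place(uint64_t x) {
    assert(neg_gcc_in_place((uint32_t)x, (uint32_t)(x >> 32)) == 0u - x);
}
```

The serial chain NEG, ADC, NEG on the high half is the extra latency the answer mentions; the subtract-from-zero versions only have one dependent SBB after the low half.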

Unfortunately, gcc and MSVC both always use this, even when xor-zero + sub/sbb would be better.


For a more complete picture of what compilers do, have a look at their output for these functions (on Godbolt):

    #include <stdint.h>

    int64_t neg_value_from_mem(int64_t a) {
        return -a;
    }

    int64_t neg_value_in_regs(int64_t a) {
        // The OR makes the compiler load+OR first,
        // but it can choose regs to set up for the negate
        int64_t reg = a | 0x1111111111LL;
        // clang chooses mov reg,mem   / or reg,imm8 when possible,
        // otherwise     mov reg,imm32 / or reg,mem.  Nice :)
        return -reg;
    }

    int64_t foo();
    int64_t neg_value_in_place(int64_t a) {
        // foo's return value will be in edx:eax
        return -foo();
    }
Community
Peter Cordes
  • Interestingly only gcc does that; clang and icc use subtract-from-zero. – Jester Dec 10 '16 at 21:00
  • Why do we need `adc edx, 0` ? The `neg` operation only sets the carry flag if the *operator* is *0* – Róbert Nagy Dec 10 '16 at 21:21
  • @NagyRobi: You have NEG's flag-setting backwards. It's doing `edx= -(edx+CF)` instead of `edx = 0 - edx - CF`, which is why it uses ADC instead of SBB. – Peter Cordes Dec 10 '16 at 21:23
  • For what it's worth, MSVC uses the same `NEG`+`ADC`+`NEG` sequence that GCC does with optimizations enabled (whether speed or size). *Interestingly*, it uses something akin to `XOR`+`NEG`+`SBB` with optimizations *disabled*. It is not actually smart about using `NEG`, though, choosing instead to subtract from 0. The optimized code almost certainly comes from a hard-coded sequence, and someone has yet to notice that there's a faster way of doing it. – Cody Gray - on strike Dec 11 '16 at 16:36
  • Peter's first sequence has 3 instructions with serial dependencies, so it should take 3 clocks. His second sequence should be accomplished in two clocks, so it is faster. I'll note it is smaller, too. What's not to like? – Ira Baxter Dec 11 '16 at 18:30
  • The isolated sequence is faster, but it uses one more register. I would say that saving registers is a good idea in general. In longer code there could be other data dependencies related to the additional register. – Juan Dec 12 '16 at 05:57
  • @Juan: [xor-zeroing](http://stackoverflow.com/a/33668295/224132) breaks any dependency on the old value of ECX. Most write-only instructions (like you'd use on a previously-dead register) don't have false dependencies, POPCNT/LZCNT/TZCNT/BSF/BSR on Intel being the exception. The only cost is maybe spilling something. If the 64-bit value to be negated starts in memory (like for that function with `-m32`), clang just zeroes EAX and EDX, and uses SUB / SBB with memory operands. (Try it on the Godbolt link) – Peter Cordes Dec 12 '16 at 19:14
  • Yes, agreed. What I mean is just that for every negation you're using three 32-bit registers instead of two, and there are never enough registers. Maybe it's worth it, but I think it's good to take it into account. – Juan Dec 12 '16 at 19:19
  • @Juan: but it doesn't use up 3 registers for the rest of the function. Once you're done, you have your result in 2 registers and a free scratch register. Anyway sure, it's one minor difference that might make the slower version better overall in a situation with no scratch regs free, but I expect that's pretty rare. – Peter Cordes Dec 12 '16 at 19:26
  • Just for the record, the already-in-registers partially in-place code-gen was from `gcc -O3 -m32 -mregparm=3` which changes the calling convention to pass args in regs. I don't seem to have mentioned that in the answer. – Peter Cordes Apr 06 '23 at 02:00