I've stored a 64-bit integer in the EDX:EAX register pair.
How can I correctly negate the number?
For example: 123456789123 → -123456789123.
Ask a compiler for ideas: compile int64_t neg(int64_t a) { return -a; } in 32-bit mode. Of course, different ways of asking the compiler will have the starting value in memory, in the compiler's choice of registers, or already in EDX:EAX. See all three ways on the Godbolt compiler explorer, with asm output from gcc, clang, and MSVC (aka CL).
There are of course lots of ways to accomplish this, but any possible sequence will need some kind of carry from low to high at some point, so there's no efficient way to avoid SBB or ADC.
If the value starts in memory, or you want to keep the original value in registers, xor-zero the destination and use SUB/SBB. The SysV x86-32 ABI passes args on the stack and returns 64-bit integers in EDX:EAX. This is what clang 3.9.1 -m32 -O3 does for neg_value_from_mem:
; optimal for data coming from memory: just subtract from zero
xor eax, eax
xor edx, edx
sub eax, dword ptr [esp + 4]
sbb edx, dword ptr [esp + 8]
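To see why the borrow matters, here is a rough C model of the subtract-from-zero sequence working on two 32-bit halves. It is only a sketch: the `pair32` type and the `neg64_sub_sbb` name are made up for illustration, and the `borrow` variable stands in for CF.

```c
#include <stdint.h>

/* Hypothetical type for illustration: the two halves of EDX:EAX. */
typedef struct { uint32_t lo, hi; } pair32;

/* Models: xor eax,eax / xor edx,edx / sub eax,lo / sbb edx,hi */
static pair32 neg64_sub_sbb(uint32_t lo, uint32_t hi) {
    pair32 r;
    uint32_t borrow = (lo != 0); /* CF after 0 - lo: set iff lo is non-zero */
    r.lo = 0u - lo;              /* sub eax, lo  (wraps modulo 2^32) */
    r.hi = 0u - hi - borrow;     /* sbb edx, hi  (subtracts CF as well) */
    return r;
}
```

For example, with lo = 0 and hi = 1 (the value 0x100000000) there is no borrow, so the result is lo = 0, hi = 0xFFFFFFFF. Any non-zero low half produces a borrow that has to reach the high half, which is the job SBB does.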
If you have the values in registers and don't need the result in-place, you can use NEG to set a register to 0 - itself. NEG sets CF iff the input is non-zero, the same way SUB from zero would. Note that xor-zeroing is cheap and not part of the latency critical path, so this is definitely better than gcc's 3-instruction sequence (below).
;; partially in-place: input in ecx:eax
xor edx, edx
neg eax ; eax = 0-eax, setting flags appropriately
sbb edx, ecx ;; result in edx:eax
Clang does this even for the in-place case, even though that costs an extra mov ecx,edx. That's optimal for latency on modern CPUs that have zero-latency mov reg,reg (Intel IvB+ and AMD Zen), but not for number of fused-domain uops (frontend throughput) or code-size.
gcc's sequence is interesting and not totally obvious. It saves an instruction vs. clang for the in-place case, but it's worse otherwise.
; gcc's in-place sequence, only good for in-place use
neg eax
adc edx, 0
neg edx
; disadvantage: higher latency for the upper half than subtract-from-zero
; advantage: result in edx:eax with no extra registers used
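Since it's not obvious that neg / adc / neg gives the right answer, here is a small C model of it on 32-bit halves. This is a sketch under my own naming: `halves32` and `neg64_neg_adc_neg` are hypothetical, and `carry` stands in for CF.

```c
#include <stdint.h>

/* Hypothetical type for illustration: the two halves of EDX:EAX. */
typedef struct { uint32_t lo, hi; } halves32;

/* Models: neg eax / adc edx,0 / neg edx */
static halves32 neg64_neg_adc_neg(uint32_t lo, uint32_t hi) {
    halves32 r;
    uint32_t carry = (lo != 0); /* CF after neg eax: set iff lo is non-zero */
    r.lo = 0u - lo;             /* neg eax */
    hi = hi + carry;            /* adc edx, 0 */
    r.hi = 0u - hi;             /* neg edx */
    return r;
}
```

The high half ends up as -(hi + CF), which equals 0 - hi - CF, the same value subtract-from-zero with SBB computes. The difference is that the high half goes through two dependent instructions after the NEG of the low half.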
Unfortunately, gcc and MSVC both always use this, even when xor-zero + sub/sbb would be better.
For a more complete picture of what compilers do, have a look at their output for these functions (on Godbolt):
#include <stdint.h>

int64_t neg_value_from_mem(int64_t a) {
    return -a;
}

int64_t neg_value_in_regs(int64_t a) {
    // The OR makes the compiler load+OR first
    // but it can choose regs to set up for the negate
    int64_t reg = a | 0x1111111111LL;
    // clang chooses mov reg,mem / or reg,imm8 when possible,
    // otherwise mov reg,imm32 / or reg,mem. Nice :)
    return -reg;
}

int64_t foo();

int64_t neg_value_in_place(int64_t a) {
    // foo's return value will be in edx:eax
    return -foo();
}