
I am trying to mimic the functionality of an __int128_t B using a std::array<uint64_t, 2> A, where B = A[0] + 2^64 * A[1].
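To pin down the layout, here is a minimal sketch that converts between the array and a 128-bit integer. It assumes GCC or Clang on a 64-bit target, where the non-standard `unsigned __int128` type exists. The names `Limbs`, `to_u128`, `from_u128` and `neg_via_u128` are illustrative and do not come from the question. Negating in unsigned arithmetic gives the same bit pattern as negating an `__int128_t`, but it wraps instead of overflowing for the most negative value.

```cpp
#include <array>
#include <cstdint>

// Assumes GCC/Clang on a 64-bit target (unsigned __int128 is a compiler extension).
using u128 = unsigned __int128;
using Limbs = std::array<std::uint64_t, 2>;  // A[0] = low 64 bits, A[1] = high 64 bits

// B = A[0] + 2^64 * A[1]
inline u128 to_u128(const Limbs& a) {
    return (static_cast<u128>(a[1]) << 64) | a[0];
}

inline Limbs from_u128(u128 b) {
    return Limbs{{static_cast<std::uint64_t>(b), static_cast<std::uint64_t>(b >> 64)}};
}

// Reference negation: two's complement of the whole 128-bit value, computed
// modulo 2^128 so that negating the most negative value is well defined.
inline Limbs neg_via_u128(const Limbs& a) {
    return from_u128(-to_u128(a));
}
```

A reference like `neg_via_u128` is convenient for checking any hand-written limb version against the compiler's own 128-bit negation.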

How can I efficiently negate A (using two's complement)?

========= Attempts =========

Negating an __int128_t produces the following assembly:

negq    %r12
adcq    $0, %r13
negq    %r13
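Read instruction by instruction: `negq` on the low half sets CF exactly when the low half was non-zero, `adcq $0` adds that CF to the high half, and the second `negq` negates the sum. The high limb therefore becomes `-(A[1] + (A[0] != 0))`, which is `~A[1]` when the low limb is non-zero and `-A[1]` when it is zero. A portable C++ sketch of the same arithmetic follows (the function name `negate` is illustrative); it makes no claim about which instructions a particular compiler will emit for it.

```cpp
#include <array>
#include <cstdint>

using Limbs = std::array<std::uint64_t, 2>;  // A[0] = low limb, A[1] = high limb

// Mirrors the compiler's sequence:
//   negq lo       ; lo = 0 - lo, CF = (lo != 0)
//   adcq $0, hi   ; hi = hi + CF
//   negq hi       ; hi = 0 - hi
inline void negate(Limbs& a) {
    const std::uint64_t borrow = (a[0] != 0);  // the CF produced by "negq lo"
    a[0] = 0 - a[0];                            // unsigned wrap-around negation of the low limb
    a[1] = 0 - (a[1] + borrow);                 // adcq $0 followed by negq on the high limb
}
```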

I do not know how to reproduce negq, including its effect on the carry flag, in C++. Trying this as inline assembly

asm (
    "negq %[low];"
    "adcq $0, %[high];"
    "negq %[high];"
    : [high] "+r"(A[1]), [low] "+r"(A[0])
);
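For testing, the same three instructions can be wrapped in a function that operates on local copies of the limbs. This sketch assumes GCC or Clang on x86-64, with a plain C++ fallback on other targets; `negate_asm` is an illustrative name. The `"cc"` clobber only documents that the flags are modified, since GCC already treats the flags as clobbered on x86. Because an asm statement is opaque to the optimizer, no claim is made that this wrapper removes the extra `mov`s.

```cpp
#include <array>
#include <cstdint>

using Limbs = std::array<std::uint64_t, 2>;  // A[0] = low limb, A[1] = high limb

inline void negate_asm(Limbs& a) {
    std::uint64_t lo = a[0], hi = a[1];
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    // Same neg/adc/neg sequence as in the question; both operands are read-write registers.
    __asm__("negq %[lo]\n\t"
            "adcq $0, %[hi]\n\t"
            "negq %[hi]"
            : [hi] "+r"(hi), [lo] "+r"(lo)
            :
            : "cc");
#else
    // Plain C++ fallback for other targets: same arithmetic without inline asm.
    const std::uint64_t borrow = (lo != 0);
    lo = 0 - lo;
    hi = 0 - (hi + borrow);
#endif
    a[0] = lo;
    a[1] = hi;
}
```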

produces many redundant movs around my code (loading A[0] and A[1] into registers and storing them back). Also, trying

unsigned char carry = 0;
carry = _subborrow_u64(carry, 0, A[0], &(A[0]));
carry = _subborrow_u64(carry, 0, A[1], &(A[1]));

I get

movq    16(%rsp), %rcx
movq    %rax, %rsi
subq    %rcx, %rsi
movq    %rax, %rcx
movq    %rsi, (%r8)
sbbq    24(%rsp), %rcx

which is considerably slower.
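The intrinsic version extends naturally to more than two limbs: subtract each limb from zero and feed the borrow into the next one. The sketch below assumes an x86-64 compiler that provides `_subborrow_u64` through `<immintrin.h>` (GCC, Clang and MSVC do); `Wide` and `negate_sbb` are illustrative names. It stores limbs as `unsigned long long` because the intrinsic's output parameter is `unsigned long long*` (`unsigned __int64*` on MSVC), whereas `std::uint64_t` is `unsigned long` on Linux, so `&A[i]` taken from a `std::array<uint64_t, N>` may not convert.

```cpp
#include <array>
#include <cstddef>
#include <immintrin.h>  // _subborrow_u64 (x86-64 only)

// a[0] is the least-significant limb.
template <std::size_t N>
using Wide = std::array<unsigned long long, N>;

// Computes a = 0 - a, propagating the borrow from each limb into the next.
template <std::size_t N>
inline void negate_sbb(Wide<N>& a) {
    unsigned char borrow = 0;
    for (std::size_t i = 0; i < N; ++i) {
        // a[i] = 0 - a[i] - borrow; returns the borrow out of this limb.
        borrow = _subborrow_u64(borrow, 0, a[i], &a[i]);
    }
}
```

With N = 2 this computes the same result as the two-intrinsic version in the question.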

What is the correct way to implement this negation on amd64 (x86-64)?

edited by Peter Cordes
asked by ohad
  • The result of operator~ is the bitwise NOT (one's complement) of the argument. Could you then compose the two's complement yourself by adding 1 to the low limb and, if that overflows, adding 1 to the high limb as well? Not sure this would be more efficient. – Gonen I Sep 07 '22 at 12:34
  • Why not use `std::tuple`? – bitmask Sep 07 '22 at 12:41
  • @bitmask This does not matter performance-wise, and it does not generalize nicely to wider integers. – ohad Sep 07 '22 at 12:55
  • @GonenI It is slightly faster than the attempts in my post, but still about 2x slower than __int128_t. – ohad Sep 07 '22 at 12:55
  • @ohad: It was my understanding that you want arithmetic negation (i.e. `-B` rather than `~B`). That means: `-A[0]` and `~A[1]`. – bitmask Sep 07 '22 at 12:59
  • @bitmask - there may be a carry from `A[0]` to `A[1]` if `A[0]=0` – ohad Sep 07 '22 at 13:22
  • @ohad Oh, you're right. Sorry, didn't think of that. – bitmask Sep 07 '22 at 13:55
  • You can simplify your asm by removing the input operands and making the constraints for the outputs be `+r`. You can also let the compiler do the heavy lifting by passing the value through an `__int128_t` temporary, if that's acceptable for you. – Hasturkun Sep 07 '22 at 13:59
  • Sorry, part of what I meant is that it will likely reduce unneeded `mov`s around the code (and also make it more correct, since you were ignoring the extra inputs). I played around with it a bit [here](https://godbolt.org/z/dK8EP8zMd). – Hasturkun Sep 07 '22 at 14:24
  • You could search for the least-significant non-zero limb, negate it, and apply one's complement to all limbs above it. For larger inputs that will reduce data dependencies, but you could suffer from bad branch prediction in some cases. – chtz Sep 07 '22 at 15:23
  • [How do I negate a 64-bit integer stored in a 32-bit register pair?](https://stackoverflow.com/q/41080161/995714) – phuclv Sep 10 '22 at 03:07
  • It seems the `clang++` compiler does a much better job than `g++` at compiling arithmetic expressions such as these. For large negations (say, int1024), the `_subborrow_u64` approach wins. And as said, at present one should only use `clang++` for this kind of code. – ohad Sep 12 '22 at 05:25
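The two limb-wise ideas from the comments can be sketched for an N-limb array: Gonen I's one's-complement-plus-one with an explicit carry, and chtz's search for the least-significant non-zero limb. Both are illustrative sketches (`Limbs`, `negate_not_plus_one` and `negate_lowest_nonzero` are hypothetical names) and carry no performance claim. The first has a serial carry dependency through every limb, while the second replaces it with a data-dependent branch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

template <std::size_t N>
using Limbs = std::array<std::uint64_t, N>;  // a[0] is the least-significant limb

// Two's complement as one's complement plus one; the +1 keeps propagating
// only while a limb wraps around to zero.
template <std::size_t N>
void negate_not_plus_one(Limbs<N>& a) {
    std::uint64_t carry = 1;
    for (std::size_t i = 0; i < N; ++i) {
        a[i] = ~a[i] + carry;
        carry = carry & (a[i] == 0);  // ~x + 1 wraps to 0 only when x was 0
    }
}

// Limbs below the lowest non-zero limb stay zero, that limb is negated,
// and every limb above it is bit-inverted (one's complement).
template <std::size_t N>
void negate_lowest_nonzero(Limbs<N>& a) {
    std::size_t i = 0;
    while (i < N && a[i] == 0) ++i;  // zero limbs are their own negation
    if (i == N) return;              // the whole value is zero
    a[i] = 0 - a[i];
    for (++i; i < N; ++i) a[i] = ~a[i];
}
```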

0 Answers