6

On a 64-bit x86 CPU, we normally load the number -1 into a register like this:

mov     rdx, -1  //  48BAFFFFFFFFFFFFFFFF

... this instruction takes 10 bytes the way old versions of NASM assemble it.


Another way is:

xor     rdx, rdx ;  4831D2
dec     rdx      ;  48FFCA

... this sequence takes only 6 bytes.

EDIT:

As Jens Björnhager says (and I have tested it), xor edx, edx clears the whole rdx register:

xor     edx, edx ;  31D2
dec     rdx      ;  48FFCA

... this sequence takes only 5 bytes.
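The 32-bit xor is enough because, on x86-64, writing a 32-bit register zero-extends into the full 64-bit register. Here is a minimal sketch in C to check that, assuming GCC or Clang inline assembly (AT&T syntax, so operand order is reversed relative to the NASM above). The function name is made up for illustration, the compiler picks the register (not necessarily rdx), and the non-x86 branch is only a model of the same behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Applies a 32-bit xor-zero followed by a 64-bit dec to a register whose
   upper half starts out non-zero, and returns the final 64-bit value. */
uint64_t xor32_then_dec(void) {
    uint64_t r = 0x123456789ABCDEF0ULL;  /* upper 32 bits deliberately set */
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ ("xorl %k0, %k0\n\t"  /* 32-bit xor: the upper 32 bits are zeroed too */
             "decq %0"            /* 64-bit dec: 0 - 1 wraps to all ones */
             : "+r"(r) : : "cc");
#else
    /* Portable model (assumption): a 32-bit result is zero-extended to 64 bits. */
    r = (uint32_t)r ^ (uint32_t)r;
    r -= 1;
#endif
    return r;
}
```

If the upper half survived the 32-bit xor, the result would not be all ones.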

EDIT:

Alexey Frunze found another solution:

mov     rdx, -1  ; 48C7C2FFFFFFFF

... this instruction takes only 7 bytes. But how do I tell the assembler to use the shorter encoding (without using DB)? You can hint NASM to use this encoding, which matters if you're using an old version that doesn't enable (code-size) optimization by default and you don't pass -Ox to nasm manually:

mov     rdx, dword -1
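To keep the sizes straight, here is a small C sketch that tallies the encodings quoted so far. The hex strings are copied from above; the struct and function names are invented for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One candidate way of setting rdx to -1. */
struct variant {
    const char *source;  /* assembly as written in the question */
    const char *hex;     /* machine code, two hex digits per byte */
};

const struct variant variants[] = {
    { "mov rdx, -1 (imm64)",       "48BAFFFFFFFFFFFFFFFF" },
    { "xor rdx, rdx / dec rdx",    "4831D2" "48FFCA" },
    { "xor edx, edx / dec rdx",    "31D2" "48FFCA" },
    { "mov rdx, dword -1 (imm32)", "48C7C2FFFFFFFF" },
};

/* Encoded size in bytes: every byte is written as two hex digits. */
size_t encoded_size(const struct variant *v) {
    size_t digits = strlen(v->hex);
    assert(digits % 2 == 0);  /* a dangling half-byte would be a typo */
    return digits / 2;
}
```

Calling encoded_size on each entry gives 10, 6, 5 and 7 bytes, matching the counts stated above.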

Which is faster, and which is more economical?

Peter Cordes
GJ.

3 Answers

7

There's a shorter one than all of the ones mentioned: 4883CAFF OR rdx,-1 (4 bytes).
It has the nasty property of having a false dependency on all architectures I know of, but it shouldn't go unmentioned IMO. There are legitimate reasons to use it: for example, if the result is not needed until quite a lot later and it's in a loop which would otherwise not fit in four 16-byte blocks. Also, if speed is of no big concern for a particular piece of code, one might as well not waste precious cache space. It could also be used for alignment reasons, but it would almost certainly be faster to pad to the next higher alignment instead.
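"False dependency" here means the result is all ones whatever the register held before, yet the instruction still reads the old value, so the CPU has to wait for it. A rough sketch in C showing that the output does not depend on the input, assuming GCC or Clang inline assembly in AT&T syntax; the function name is hypothetical and the fallback branch is plain C for non-x86-64 targets.

```c
#include <assert.h>
#include <stdint.h>

/* ORs -1 into a register that already holds `old`. The "+r" constraint marks
   the register as both read and written, which is exactly the dependency on
   the previous value that the hardware also sees. */
uint64_t or_minus_one(uint64_t old) {
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ ("orq $-1, %0" : "+r"(old) : : "cc");
#else
    old |= UINT64_MAX;  /* same arithmetic without inline assembly */
#endif
    return old;
}
```

Any input gives the same answer, which is why the wait for the old value buys nothing.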

As for telling the compiler this, I haven't got a clue.

harold
  • [some compilers can emit `or rdx, -1`, mainly in `-Os`](https://reverseengineering.stackexchange.com/a/4622/2563) – phuclv Mar 07 '18 at 13:39
  • Near duplicate, [Set all bits in CPU register to 1 efficiently](https://stackoverflow.com/q/45105164) concurs with `mov rdx, -1` (7-byte encoding) for performance, or `or rdx, -1` for code-size. Or `lea rdx, [rsi-1]` if you happen to have a zeroed register (here `rsi`) that you need for something else anyway. But not an exact duplicate because this question is about ancient NASM which doesn't use the `mov r/m64, sign_extended_imm32` encoding on its own. – Peter Cordes Dec 24 '22 at 02:32
5

The first is much better. The first has no dependencies at all. The second has one of the worst kinds of dependencies -- an instruction requires the final result of the instruction immediately prior to it before it can begin. However, if you had some other instructions that you could slip between the xor and the dec, that would hide the latency of the dependency, and then the second option might win out.

The second one also has a false dependency on the value of rdx, which the first one does not. Some CPUs might be smart enough to recognize this false dependency and not stall the first instruction until the value of rdx is known (since the output is zero regardless). Some x86 CPUs do have logic to ignore some false dependencies.

Comparing the number of code bytes is not very useful. It's very unlikely under most realistic conditions that the number of bytes the code occupies will be very significant.

David Schwartz
  • 2
    [`xor edx,edx` is dep-breaking on every out-of-order x86 CPU newer than PII / PIII](https://stackoverflow.com/questions/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and), so it can run as soon as it issues. On Intel SnB-family, zeroing idioms don't even need an execution port (handled at rename time), so the zeroing `xor` and the `dec` could issue into the out-of-order core in the same cycle, and the `dec` would still be able to execute in the next cycle, as if `rdx` hadn't been modified. Anyway, **the downside is 2 uops for the front-end instead of 1**. – Peter Cordes Aug 11 '17 at 07:34
3

There's an alternative, 7-byte, encoding of mov rdx, -1: 48C7C2FFFFFFFF.

You can try writing the instruction as mov rdx, dword -1 in the code to aid the compiler/assembler in using this shorter encoding.
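The 7-byte form stores only a 32-bit immediate, which the CPU sign-extends to 64 bits, so it works for -1 but not for every 64-bit constant. A rough C sketch of that check follows. The function names are invented here, and only the two encodings discussed in this thread are modelled (REX.W C7 /0 with imm32, and REX.W B8+r with imm64).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v can be represented as a 32-bit immediate that is sign-extended
   to 64 bits, i.e. v lies in the int32_t range. */
bool fits_sign_extended_imm32(int64_t v) {
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Size in bytes of `mov rdx, imm`, choosing only between the 7-byte
   sign-extended-imm32 form and the 10-byte imm64 form. */
int mov_rdx_imm_size(int64_t v) {
    return fits_sign_extended_imm32(v) ? 7 : 10;
}
```

For example, -1 fits (all four immediate bytes are FF and sign extension fills the rest with ones), while 0x80000000 does not fit the 7-byte form, because sign-extending its low 32 bits would give 0xFFFFFFFF80000000.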

Alexey Frunze
  • 1
    Modern versions of NASM default to `-Ox`, full optimization to find the smallest encoding of an instruction that's architecturally equivalent. That makes NASM behave like any decent assembler should, and choose the `mov r/m64, sign_extended_imm32` encoding for `mov reg, -1`. – Peter Cordes Jan 18 '22 at 04:18