
I have this code:

global main
[BITS 64]

section .text
main:
     mov r13, 0x1234

     mov rax, 60
     mov rdi, 0
     syscall

When I encode the instruction mov r13, 0x1234 by hand, I get the hexadecimal bytes 0x48_BD_34_12_00_00.

The opcode of the instruction is REX.W + B8+ rd io (I think).

When I assemble my file on Linux, the resulting encoding is 0x41_BD_34_12_00_00.

41h is 0100_0001b. But REX.W means W = 1, so the prefix should be 0100_1001b.

So I don't understand why the REX prefix is 41h and not 49h.

vitsoft
Heyy
    Assemblers exploit the fact that writing the lower half of a 64-bit GPR zeroes its upper half. This allows `MOV r64, imm32` to be encoded as `MOV r32, imm32` when `imm32` is nonnegative, saving the REX.W prefix byte. You are right that `MOV R13, 0x1234` could be encoded with prefix REX.WB instead of prefix REX.B, because omitting REX.W does not reduce the encoding size here (a REX prefix is necessary anyway for target registers R8..R15). Your assembler seems to omit the REX.W flag by inertia, but don't worry, it works just as well. – vitsoft May 22 '22 at 15:45

1 Answer


There are two reasons for this.

First, the instruction NASM encodes is actually mov r13d, 0x1234 instead of mov r13, 0x1234. The former is shorter but does the same thing, because writing a 32-bit register zeroes the upper 32 bits of the corresponding 64-bit register.

Now why do we see this encoding? Here's an explanation:

41 bd 34 12 00 00
|| ||  ||||||||||
|| ||  ``````````-- immediate value
|| ``-------------- opcode b8 + reg (5)
``----------------- REX.B prefix

The register we want to encode has number 13 (binary 1101). The low 3 bits of this register number (101 = 5) are encoded in the opcode byte. The high bit is encoded in the REX.B bit. Hence, a REX.B prefix is needed.
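To make the bit layout concrete, here is a small Python sketch (not part of the original answer) that splits a REX prefix into its flags and recombines REX.B with the low opcode bits to recover the register number. The function names `decode_rex` and `mov_imm_register` are made up for illustration.

```python
def decode_rex(byte):
    """Split a REX prefix byte (layout 0100WRXB) into its four flag bits."""
    if byte >> 4 != 0b0100:
        raise ValueError("not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModRM reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends ModRM rm or the opcode register field
    }


def mov_imm_register(rex, opcode):
    """Register number for the B8+r form: REX.B is bit 3, opcode low bits are bits 2..0."""
    return (decode_rex(rex)["B"] << 3) | (opcode & 0b111)


print(decode_rex(0x41))              # only B is set, W is clear
print(mov_imm_register(0x41, 0xBD))  # 13, i.e. r13 / r13d
```

With prefix 41h only the B bit is set, so the operand size stays at the 32-bit default and the destination is r13d.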

If we wanted to encode mov r13, 0x1234 literally, as nasm -O0 would, or as you can request explicitly with mov r13, strict qword 0x1234, it would look like this:

49 bd 34 12 00 00 00 00 00 00

Here we have a REX.WB prefix 49 to encode both the additional register bit and the 64-bit operand width. This is the mov r64, imm64 encoding: the same opcode as mov r32, imm32 but with REX.W set, which widens the immediate to 8 bytes.
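As a rough cross-check of the two B8+r encodings discussed so far, the following Python sketch builds both byte sequences. The helper names are hypothetical, and the sketch assumes a nonnegative immediate that fits the chosen width; it is not how NASM is implemented.

```python
import struct


def encode_mov_r32_imm32(reg, imm):
    """mov r32, imm32: optional REX.B prefix, opcode B8+low bits, 4-byte immediate."""
    prefix = bytes([0x40 | (reg >> 3)]) if reg >= 8 else b""
    return prefix + bytes([0xB8 | (reg & 7)]) + struct.pack("<I", imm)


def encode_mov_r64_imm64(reg, imm):
    """mov r64, imm64: REX.W (plus REX.B for r8..r15), opcode B8+low bits, 8-byte immediate."""
    rex = 0x48 | (reg >> 3)
    return bytes([rex, 0xB8 | (reg & 7)]) + struct.pack("<Q", imm)


print(encode_mov_r32_imm32(13, 0x1234).hex(" "))  # 41 bd 34 12 00 00
print(encode_mov_r64_imm64(13, 0x1234).hex(" "))  # 49 bd 34 12 00 00 00 00 00 00
```

The 32-bit form is 6 bytes and the 64-bit form is 10, which is why an optimizing assembler prefers the former when the value allows it.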

Assemblers that don't optimize to a 32-bit register but do pick the shortest encoding for what you wrote (e.g. YASM or GAS) would use the mov r/m64, sign_extended_imm32 encoding, which you can get from NASM with mov r13, strict dword 0x1234. After the REX.WB prefix 49, C7 is the opcode and C5 is the ModRM byte, followed by a 4-byte immediate.

49 c7 c5 34 12 00 00
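
The ModRM byte in this form can be derived the same way. This is a minimal Python sketch with a hypothetical helper name, assuming a register destination (mod = 11) and an immediate that fits in a signed 32-bit value:

```python
import struct


def encode_mov_rm64_imm32(reg, imm):
    """mov r/m64, imm32 (REX.W + C7 /0) with a register destination."""
    rex = 0x48 | (reg >> 3)   # REX.W, plus REX.B for r8..r15
    modrm = 0xC0 | (reg & 7)  # mod=11 (register), reg field=000 (/0), rm=low 3 bits
    return bytes([rex, 0xC7, modrm]) + struct.pack("<i", imm)


print(encode_mov_rm64_imm32(13, 0x1234).hex(" "))  # 49 c7 c5 34 12 00 00
```

Here C5 is 11 000 101 in binary: register-direct mode, the /0 opcode extension, and the low three bits of register 13.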
Peter Cordes
fuz