When moving an immediate value into a register, how does the assembler first zero out the register? For example, if I have a register value of:

%rax 0x00007fffffffe450

And I do:

mov $1, %rax

It effectively does either:

movzbq $1, %rax

Or:

mov $0x0000000000000001, %rax

Either way, the upper bits are cleared so that the value fills the whole register. Is this done by the assembler at assembly time or by the CPU at run time? Where and how does it happen?


As a sort of meta question (not directly related to the question above), I do have a bit of a hard time searching for duplicate asm questions as the titles often seem to be very specific. What would be the best way to search for these?

Peter Cordes
David542
  • If we see the former, then zeroing the register's upper bits is handled by the CPU at runtime, as part of the definition of `movzbq`. If we see the latter, then the assembler generates a full 64-bit constant as the immediate, which fills the whole register with no need for zeroing at runtime. – Erik Eidt Aug 28 '20 at 03:35
  • `mov $1, %rax` will be encoded as `mov $sign_extended_imm32, r/m64`. That's an explicit `mov` that overwrites all 64 bits. It's definitely not `movzbq`; that's an existing opcode that doesn't have an immediate form. Try it and use `objdump` to see the disassembly + machine code, and check which `mov` opcode it used: https://www.felixcloutier.com/x86/mov. If you'd written it efficiently as 5-byte `mov $1, %eax`, the upper 32 bits of RAX would have been overwritten by [implicit zero extension](https://stackoverflow.com/q/11177137) – Peter Cordes Aug 28 '20 at 04:26
  • [Difference between movq and movabsq in x86-64](https://stackoverflow.com/q/40315803) shows various mov encodings and talks about them in detail. Maybe not an exact duplicate, but I think that covers the available encodings of `mov` and how the machine works. Intel's manuals cover the machine-code formats of instructions if you need more than those descriptions. [How many ways to set a register to zero?](https://stackoverflow.com/a/32673696) also shows multiple forms using Intel syntax. – Peter Cordes Aug 28 '20 at 04:30
  • @PeterCordes thanks, how do you find the duplicate questions so fast? I try searching on something like "immediate" and that isn't too helpful in a search. It would be nice on StackOverflow if you could store multiple alternate titles for a question (it wouldn't need to be displayed) to help a user find questions more easily. – David542 Aug 28 '20 at 04:34
  • Questions like [Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?](https://stackoverflow.com/q/11177137) come up frequently as duplicate targets or something I want to link in a comment; my browser autocomplete happens to pop it up as one of the completions when I type "32 upper" in the address bar; sometimes I have to type more or actually hit search if I need to find it again and it's not coming up. – Peter Cordes Aug 28 '20 at 04:38
  • For the other answers I linked, I had at least a vague recollection of *writing* those answers myself, so I knew basically what to search for. I think I remembered [this](https://stackoverflow.com/a/32673696) first as one where I listed multiple mov encodings, then search results while finding that reminded me of the movq vs. movabsq answer I'd written. Also, **knowing the right answer makes it *much* easier to know exactly what to search for, even in cases when I don't already remember a specific answer I want to look for.** Then I can search for some key phrases an answer should use. – Peter Cordes Aug 28 '20 at 04:42
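
For anyone who wants to check the behaviour described in the comments above without reading encodings by hand, here is a minimal sketch in C with GNU inline assembly. It is an illustration under stated assumptions, not an authoritative reference: the function names and the `USE_X86_64_ASM` macro are made up for this example, the byte sequences in the comments are what GAS normally emits for these forms, and on anything other than x86-64 with a GNU-compatible compiler the code falls back to plain C that only models the architectural rules. Each function preloads RAX with a stale value, runs one `mov`, and returns what RAX holds afterwards.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: real instructions are only used on x86-64 with a GNU-compatible
   compiler; elsewhere plain C models the same architectural rules. */
#if defined(__x86_64__) && defined(__GNUC__)
#define USE_X86_64_ASM 1
#else
#define USE_X86_64_ASM 0
#endif

/* mov $1, %rax: 64-bit operand size, imm32 sign-extended to 64 bits.
   GAS normally emits 48 c7 c0 01 00 00 00 (7 bytes). */
uint64_t mov_one_to_rax(uint64_t stale)
{
    uint64_t rax = stale;                 /* "+a" pins this variable to RAX */
#if USE_X86_64_ASM
    __asm__ volatile("mov $1, %%rax" : "+a"(rax));
#else
    rax = (uint64_t)(int64_t)(int32_t)1;  /* model: sign-extend imm32 */
#endif
    return rax;
}

/* mov $1, %eax: 32-bit operand size; writing EAX zeroes bits 63:32 of RAX.
   GAS normally emits b8 01 00 00 00 (5 bytes). */
uint64_t mov_one_to_eax(uint64_t stale)
{
    uint64_t rax = stale;
#if USE_X86_64_ASM
    __asm__ volatile("mov $1, %%eax" : "+a"(rax));
#else
    rax = (uint64_t)(uint32_t)1;          /* model: implicit zero extension */
#endif
    return rax;
}

/* mov $-1, %rax: the sign-extended imm32 sets all 64 bits. */
uint64_t mov_minus_one_to_rax(uint64_t stale)
{
    uint64_t rax = stale;
#if USE_X86_64_ASM
    __asm__ volatile("mov $-1, %%rax" : "+a"(rax));
#else
    rax = (uint64_t)(int64_t)(int32_t)-1;
#endif
    return rax;
}

/* mov $-1, %eax: the low 32 bits are set, the upper 32 bits are zeroed. */
uint64_t mov_minus_one_to_eax(uint64_t stale)
{
    uint64_t rax = stale;
#if USE_X86_64_ASM
    __asm__ volatile("mov $-1, %%eax" : "+a"(rax));
#else
    rax = (uint64_t)(uint32_t)-1;
#endif
    return rax;
}

/* mov $1, %al: an 8-bit write merges into RAX and keeps the stale upper bits. */
uint64_t mov_one_to_al(uint64_t stale)
{
    uint64_t rax = stale;
#if USE_X86_64_ASM
    __asm__ volatile("mov $1, %%al" : "+a"(rax));
#else
    rax = (stale & ~(uint64_t)0xff) | 0x01;
#endif
    return rax;
}

/* Prints RAX after each instruction, starting from the value in the question. */
void print_demo(void)
{
    const uint64_t stale = 0x00007fffffffe450ULL;
    printf("mov $1,  %%rax -> 0x%016" PRIx64 "\n", mov_one_to_rax(stale));
    printf("mov $1,  %%eax -> 0x%016" PRIx64 "\n", mov_one_to_eax(stale));
    printf("mov $-1, %%rax -> 0x%016" PRIx64 "\n", mov_minus_one_to_rax(stale));
    printf("mov $-1, %%eax -> 0x%016" PRIx64 "\n", mov_minus_one_to_eax(stale));
    printf("mov $1,  %%al  -> 0x%016" PRIx64 "\n", mov_one_to_al(stale));
}
```

Calling `print_demo()` from a `main` shows the three cases side by side. In every case the clearing (or lack of it) is the CPU executing the instruction that was encoded; the assembler's only job is choosing the encoding. The `$-1` pair separates sign extension (`%rax` destination) from zero extension (`%eax` destination), and the `%al` case shows that narrow writes leave the old upper bits in place.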
