ARM64: LDRB fills (zeroes) the whole 64-bit register?

Question

I'm new to ARM64 assembly (coming from the x86 world). Consider the following instructions:

adr     x1,_data
ldrb    w0, [x1]

I was expecting the 32-bit register w0 to be zeroed, with only its lowest byte filled with the byte read from memory. But I can see that the high 32 bits of x0 are also zeroed! I was not expecting that! :)

So, what's the point of ldrb w0, [x1]? Isn't it the same as ldrb x0, [x1]?

Thanks!

Yes, it's the same for zero-extension, but not for sign-extension. Writing 32-bit `w0` with `ldrsb w0, [x1]` still zero-extends into `x0`. Redundancy in possible machine-code encodings is often a matter of keeping the hardware simpler, not trying to use the encoding that would mean `ldrb x0, [mem]` for something different. — Peter Cordes, Mar 14 '23 at 20:54
Thanks Peter! Yes, you are right, it's different for sign-extension, where it only affects the 32-bit w0. Great! — raff, Mar 14 '23 at 20:58
@NateEldredge: Oh, apparently I guessed wrong based on the question's faulty premise, and there isn't a redundant opcode for an explicitly 64-bit zero-extending load. That also makes sense if one instruction bit controlling size isn't too baked-in to typical designs. I guess opcodes often come in 32 vs. 64-bit pairs but don't have to. — Peter Cordes, Mar 14 '23 at 21:00
In general, *every* ARM64 instruction that writes to a 32-bit register `wN` also zeros the high half of the corresponding 64-bit register `xN`. This avoids having input dependencies on the destination register, or needing to do partial register renaming. x86 for legacy reasons doesn't do that in all cases, so that `mov ax, [data]` leaves the upper 48 bits of `rax` alone, and the result is a lot of extra design complexity and inefficiency. ARM didn't make that mistake. — Nate Eldredge, Mar 14 '23 at 21:01
@PeterCordes: Yes, that's right. From a glance at the encoding, there's a two-bit field in the opcode: 00 = store, 01 = load zero-extend, 10 = load sign-extend to 64 bits, 11 = load sign-extend to 32 bits. — Nate Eldredge, Mar 14 '23 at 21:05
@PeterCordes: For ALU instructions, bit 31 generally selects 32 vs 64 bit operation, but for load-store, bits 30 and 31 together specify the size of the load/store operation (1 / 2 / 4 / 8 bytes). — Nate Eldredge, Mar 14 '23 at 21:08
@NateEldredge: x86-64 also got it right for the new behaviour it introduced, implicit zero-extension to 64-bit when writing a 32-bit register. (Hence [MOVZX missing 32 bit register to 64 bit register](https://stackoverflow.com/q/51387571)). Same strategy AArch64 adopted. ARM never allowed writing parts of a register narrower than 32-bit, but x86 did. (And x86-64 chose not to change partial-register behaviour of existing instructions and operand-sizes, only how 32-bit ops interact with 64-bit regs.) — Peter Cordes, Mar 14 '23 at 21:09
So I guess I'm saying x86-64 didn't make anything worse and avoided any new problem with the same strategy that AArch64 later used, and AArch64 didn't have any "problems" to fix because ARM didn't have partial registers. But yes, x86-64 missed several opportunities to remove some x86 warts, probably because AMD weren't sure it would even catch on and wanted to keep the transistor differences minimal in CPUs that had to be optimized for running legacy 32-bit code. — Peter Cordes, Mar 14 '23 at 21:09
@NateEldredge, thanks a lot for commenting and sharing your knowledge. Yes, you are right, I wrongly assumed that "ldrb x0, [x1]" existed. Your explanations are quite clear! Thanks! — raff, Mar 14 '23 at 21:13
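
The encoding fields described in the comments above can be sketched in C. This is a rough illustration, not part of the original thread. It assumes the A64 "load/store register (unsigned immediate)" layout, with the size field in bits 31:30 and the two-bit field in bits 23:22. The function names are hypothetical, and the sketch ignores SIMD/FP loads and special cases such as prefetch.

```c
#include <stdint.h>

/* Bits 31:30 hold log2 of the access size: 0 -> 1 byte, 1 -> 2, 2 -> 4, 3 -> 8. */
unsigned ldst_size_bytes(uint32_t insn) {
    return 1u << (insn >> 30);
}

/* Bits 23:22 hold the two-bit field from the comment:
   0 = store, 1 = load zero-extend,
   2 = load sign-extend to 64 bits, 3 = load sign-extend to 32 bits. */
unsigned ldst_opc(uint32_t insn) {
    return (insn >> 22) & 3u;
}
```

Under those assumptions, 0x39400020 (ldrb w0, [x1]) decodes to size 1 and field value 1, while 0x39800020 (ldrsb x0, [x1]) and 0x39C00020 (ldrsb w0, [x1]) decode to field values 2 and 3. No byte-sized encoding corresponds to a zero-extending load with a 64-bit destination, which is why ldrb x0, [x1] does not exist.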

score 2 · Accepted Answer · answered Mar 14 '23 at 21:18

In general, every ARM64 instruction that writes to a 32-bit register wN also zeros the high half of the corresponding 64-bit register xN. This avoids having input dependencies on the destination register, or needing to do partial register renaming.

In the case of ldrb w0, [x1] this means you get zero extension into x0 for free. If the byte loaded from memory is to be treated as an unsigned 8-bit integer, then the low 16 bits, the low 32 bits, and all 64 bits of x0 already hold the correct value when read as an unsigned 16-, 32- or 64-bit integer. As such, there is no need for a separate ldrb x0, [x1] mnemonic, and in fact the assembler will reject it.

If you want sign extension instead, then there is a difference. If the byte in memory is 0xa5, then ldrsb w0, [x1] will populate w0 with 0xffffffa5 and zero the high half, so that x0 contains 0x00000000ffffffa5. On the other hand, ldrsb x0, [x1] would extend into the entire 64-bit register, leaving x0 containing 0xffffffffffffffa5.
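
The three cases above can be modelled in C as a minimal sketch. The function names are hypothetical, and each one returns the value that x0 would hold after the corresponding load. The narrowing cast to int8_t assumes the usual two's-complement wraparound, which is what compilers targeting ARM64 provide.

```c
#include <assert.h>
#include <stdint.h>

/* Models `ldrb w0, [x1]`: zero-extend the byte into w0;
   writing w0 clears the upper 32 bits of x0. */
uint64_t model_ldrb_w(const uint8_t *p) {
    uint32_t w0 = *p;
    return (uint64_t)w0;
}

/* Models `ldrsb w0, [x1]`: sign-extend the byte to 32 bits in w0;
   the upper 32 bits of x0 are still zeroed. */
uint64_t model_ldrsb_w(const uint8_t *p) {
    uint32_t w0 = (uint32_t)(int32_t)(int8_t)*p;
    return (uint64_t)w0;
}

/* Models `ldrsb x0, [x1]`: sign-extend the byte across all 64 bits. */
uint64_t model_ldrsb_x(const uint8_t *p) {
    return (uint64_t)(int64_t)(int8_t)*p;
}

/* Reproduces the 0xa5 example from the answer. */
void check_a5_example(void) {
    uint8_t byte = 0xa5;
    assert(model_ldrb_w(&byte)  == 0x00000000000000a5ULL);
    assert(model_ldrsb_w(&byte) == 0x00000000ffffffa5ULL);
    assert(model_ldrsb_x(&byte) == 0xffffffffffffffa5ULL);
}
```

For a byte below 0x80, such as 0x25, all three functions return the same value, since sign extension and zero extension agree when the top bit is clear.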
