Why does some Windows bootloader code zero registers with `sub` instead of `xor`?

Question

Given considerations such as those detailed in https://stackoverflow.com/a/33668295, it seems `xor reg, reg` is the best way to zero a register. But when I examine real-world assembly code (such as Windows bootloader code, IIRC), I see both `xor reg, reg` and `sub reg, reg` used.

Why is sub used at all for this purpose? Are there any reasons to prefer sub in some special cases? For example, does it set flags differently from xor?

Programmer preference. Look at, what was it, A86? The instruction set offers more than one machine-code encoding for some instructions, and that assembler claimed to leave a footprint in its output so you could tell whether it had been used. That assembler, or one like it, could also turn your xor into a sub or vice versa at the assembly level if its authors chose to. — old_timer, Jan 20 '21 at 22:29
I think `sub reg, reg` would clear the AF flag while `xor reg, reg` would leave it undefined. I'll be shocked if anybody in real life has ever needed that effect. Other than that, they should both set ZF, PF and clear SF, OF, CF. The variation may just be programmers who didn't know or didn't care about the tiny performance differences. — Nate Eldredge, Jan 20 '21 at 22:29
Could be a flag thing too. Bitwise operations tend not to affect as many flags as subtraction, which should touch all of them (speaking generally, not about any one instruction set), so this sub would not only zero something but also put the flags in a known state. The sub may be there purely for the flags and not for the register. — old_timer, Jan 20 '21 at 22:30
"Why" questions like this need to be directed to the author of the code, as we cannot read their minds, especially long after the decisions were made and the code written. — old_timer, Jan 20 '21 at 22:32
@old_timer I'm more interested in any technical differences, however subtle. But I agree that it may just have been multiple programmers. — MichaelK, Jan 21 '21 at 06:38
@MichaelK As a few have pointed out already, one is a bitwise operation and the other is not, so their performance may differ, and they set the flags differently. I think both are the same number of bytes, so there is no size advantage (slim or otherwise). Because of the very long list of x86 implementations, there cannot be a definitive answer. That is one of the many problems with "why" questions, unless the code after it makes the reason clear. — old_timer, Jan 21 '21 at 12:04

score 8 · Accepted Answer · answered Jan 20 '21 at 22:39

Differences:

- `sub reg,reg` is documented to set AF=0 (the BCD half-carry flag, i.e. carry/borrow from bit 3 to bit 4). `xor` leaves AF undefined. The architectural effect is otherwise exactly identical, leaving only possible performance differences. AF almost never matters: usually only if the next instruction is `aaa` or similar.
- sub-zeroing is slower than xor-zeroing on a few CPUs (e.g. Silvermont, as pointed out in my answer you linked), but has the same performance on most. And of course both have the same 2-byte size.
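To make the flag difference concrete, here is a minimal Python model (not real hardware) of what the documented rules say `sub r, r` and `xor r, r` leave in EFLAGS. The function names are hypothetical, 16-bit operands are assumed (as in real-mode boot code), and `None` is just this sketch's way of marking xor's undefined AF.

```python
def even_parity(value):
    """x86 PF: set when the low 8 bits of the result contain an even number of 1 bits."""
    return bin(value & 0xFF).count("1") % 2 == 0


def sub_flags(a, b, width=16):
    """Model `sub a, b` on width-bit operands; returns (result, flags)."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a, b = a & mask, b & mask
    result = (a - b) & mask
    return result, {
        "CF": int(a < b),                                # borrow out of the top bit
        "OF": int(bool((a ^ b) & (a ^ result) & sign)),  # signed overflow
        "SF": int(bool(result & sign)),
        "ZF": int(result == 0),
        "AF": int((a & 0xF) < (b & 0xF)),                # borrow from bit 3 into bit 4
        "PF": int(even_parity(result)),
    }


def xor_flags(a, b, width=16):
    """Model `xor a, b`: CF and OF cleared, SF/ZF/PF from the result, AF undefined."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    result = (a ^ b) & mask
    return result, {
        "CF": 0,
        "OF": 0,
        "SF": int(bool(result & sign)),
        "ZF": int(result == 0),
        "AF": None,  # undefined: hardware leaves 0 or 1, but the manual makes no promise
        "PF": int(even_parity(result)),
    }


if __name__ == "__main__":
    reg = 0x7C00  # arbitrary register contents
    print("sub:", sub_flags(reg, reg))
    print("xor:", xor_flags(reg, reg))
```

For any register value, both calls return a result of 0 with ZF=1, PF=1 and CF=SF=OF=0. The only difference is AF, which the model reports as 0 for sub and unknown for xor.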

I'd guess it's just different authors of hand-written asm, some of whom preferred sub, probably without realizing that some CPUs only special-case xor. The exception is when they want to guarantee that AF is cleared, in which case sub might be intentional: for example, when initializing things and wanting a fully known EFLAGS state before something that might use pushf.

XOR leaving AF undefined still means it will be either 0 or 1; you just don't know which (unlike C undefined behaviour). The actual value could depend on the CPU model, the input values, or possibly even stray bits somewhere.

In modern CPUs that recognize sub as a zeroing idiom, AF after xor-zeroing will be 0 as well, so the CPU can handle xor-zeroing and sub-zeroing exactly identically, including the FLAGS result.

Is there any difference in instruction alignment or something like that? — MichaelK, Jan 21 '21 at 06:35
@MichaelK No. Both instructions use the same encoding format (an opcode byte plus a ModRM byte), so they have the same length; only the opcode differs. — fuz, Jan 21 '21 at 12:06
*without realizing that some CPUs only special-case xor.* - or perhaps written before Silvermont existed, at a time when there might not have been any CPUs where SUB-zeroing could be slower. — Peter Cordes, Apr 19 '21 at 21:14
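Following fuz's comment that the two instructions share one encoding format, a small Python sketch can build the register-to-register form by hand: one opcode byte, then a ModRM byte with mod=11. It assumes 16-bit real-mode code, as in a boot sector, where these bytes act on AX..DI (the same bytes mean `xor eax, eax` etc. in 32-bit code). The function and table names are hypothetical. The `reverse` flag selects the alternate "op reg, r/m" opcode, which is one of the redundant encodings an assembler can choose between.

```python
# ModRM register numbers (16-bit names; EAX..EDI use the same numbers in 32-bit code).
REGS = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

# For each mnemonic: ("op r/m, reg" opcode, "op reg, r/m" opcode).
OPCODES = {
    "xor": (0x31, 0x33),
    "sub": (0x29, 0x2B),
}


def encode_reg_reg(mnemonic, dst, src, reverse=False):
    """Encode `mnemonic dst, src` for two 16-bit registers in 16-bit code."""
    op_rm_reg, op_reg_rm = OPCODES[mnemonic]
    if reverse:
        # "op reg, r/m" form: ModRM.reg names the destination, ModRM.rm the source
        opcode, reg_field, rm_field = op_reg_rm, REGS[dst], REGS[src]
    else:
        # "op r/m, reg" form: ModRM.rm names the destination, ModRM.reg the source
        opcode, reg_field, rm_field = op_rm_reg, REGS[src], REGS[dst]
    modrm = (0b11 << 6) | (reg_field << 3) | rm_field  # mod=11: both operands are registers
    return bytes([opcode, modrm])


if __name__ == "__main__":
    for m in ("xor", "sub"):
        normal = encode_reg_reg(m, "ax", "ax").hex(" ")
        alternate = encode_reg_reg(m, "ax", "ax", reverse=True).hex(" ")
        print(m, "ax, ax:", normal, "or", alternate)
```

This prints `31 c0 or 33 c0` for xor and `29 c0 or 2b c0` for sub. Either way, each zeroing instruction is two bytes, and only the opcode byte tells them apart.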

fuz · Answer 2 · 2021-01-20T22:40:19.570

1

Both xor reg, reg and sub reg, reg are recognised as zeroing idioms on many modern x86 processors. The effect is the same for both and there is no advantage in using one over the other.

edited Jan 20 '21 at 22:40

answered Jan 20 '21 at 22:17

fuz


Why isn't it always one or the other, then? Why a mix of both in the same code? – MichaelK Jan 20 '21 at 22:21

https://stackoverflow.com/a/33668295/634919 suggests that some machines optimize `xor` better than `sub`. On the other hand, bootloader code is not performance critical so probably nobody cared. – Nate Eldredge Jan 20 '21 at 22:25
Unlike *exclusive-or*, the operation *subtraction* is taught in elementary school, so using `XOR` for clearing a register might seem more cryptic for beginners. I prefer **SUB** when I want to fill a register with **numeric zero** and **XOR** when I want **boolean zero**. – vitsoft Jan 21 '21 at 08:11
