42

I'm curious how many ways there are to set a register to zero in x86 assembly using a single instruction. Someone told me that he managed to find at least 10 ways to do it.

The ones I can think of are:

xor ax,ax
mov ax, 0
and ax, 0
  • 3
    would really like to know why some of you are voting to close this question. thanks. –  Jan 28 '11 at 15:38
  • 3
    sub ax, ax :) shr ax, 16; mul ax,0 – bestsss Jan 28 '11 at 15:38
  • 1
    There are no doubt many ways to do it but, unfortunately, I have to vote to close as too localised since the usefulness of such a question seems way too narrow: `This question would only be relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet`. – paxdiablo Jan 28 '11 at 15:39
  • 1
    @bestsss shr ax,16 won't work: you are only allowed to shift by one without using the cl register, so that would be mov cl, 16; shr ax, cl. I forgot about sub, nice :P –  Jan 28 '11 at 15:40
  • @nvm I was thinking exactly about that – bestsss Jan 28 '11 at 15:42
  • 2
    @paxdiablo can't find anything like that in the faq. In that case everyone should be a python/java programmer. –  Jan 28 '11 at 15:43
  • @paxdiablo if there is 8086 tag available the question should be ok, imo. "applicable to the worldwide audience of the internet" is just pretentious. I'd consider asking if 'xxx can be declared static or accessed like' just as bad and stupid, but someone (who didnt read a single book or spec) was keen of, yet definitely not the mass amount of people interested. – bestsss Jan 28 '11 at 15:47
  • @nvm, it's one of the close reasons, `too localised`, that text I gave was the explanatory text for it. One of the types of questions I try to "weed" (hope you don't take that comment too personally) are those that are of dubious usefulness. In my opinion, this is one of those. Of course, I'm only one cell in the SO swarm so can easily be outvoted by others. Even if this question were to close (no guarantee of that), it'll probably get re-opened eventually. – paxdiablo Jan 28 '11 at 15:47
  • @bestsss, I tend to concentrate more on the `or an extraordinarily narrow situation` when judging questions. As I said, I myself can't see the usefulness. In addition, the phrase "I'm curious ..." seems to belie the FAQ desire for "You should only ask practical, answerable questions based on actual problems that you face". If the question had been more along the lines of what's the _fastest_ way, I would have had no hesitation in leaving it alone. – paxdiablo Jan 28 '11 at 15:49
  • 1
    @paxdiablo fastest is well known 'xor' but the interesting thing behind is: why the rest are actually slower (or affect flags), to me it's quite practical since it reflects the CPU design and gives insights about other cases – bestsss Jan 28 '11 at 15:52
  • But I don't normally explain myself in such detail :-) However, since nvm asked, I thought it polite to explain why. – paxdiablo Jan 28 '11 at 15:53
  • there are at least 4 giga answers, perhaps that is why folks are voting to close: mov ax,1; dec ax ... mov ax,2; dec ax; dec ax – old_timer Jan 28 '11 at 16:35
  • 1
    @dwelch one instruction ONLY. –  Jan 28 '11 at 17:07
  • I think you need to re-ask the question, why is xor eax,eax faster than mov ax,0. and the answer is look at the fetches required, the xor can be a single byte instruction the others are something like 5 bytes. – old_timer Jan 28 '11 at 18:10
  • By the way, if you remove the "single instruction" constraint, zeroing a register actually requires to solve the halting problem. Let's say we have a loop that, upon some condition, zeroes a single bit of `ax`, and loops until the 16 bits of `ax` are actually 0. In this case, to know whether `ax` will ever take the value of 0 implies knowing whether the loop eventually stops. – ljleb May 17 '20 at 05:10
  • I was procrastinating recently and thought I would look at the corresponding question for ARM64. I got up to 57530 unique instructions equivalent to `mov x0, #0`. The zero register is fun! The bulk are of the form `and x0, xzr, #imm` with any immediate, or `and x0, xzr, xN, shl #count` with any source register and shift count/mode, but they span 33 different opcodes. One interesting one is `sdiv x0, xN, xzr` since division by zero doesn't trap, but silently yields zero as the result. – Nate Eldredge Apr 24 '23 at 04:59

10 Answers

19

There are a lot of ways to move 0 into ax under IA32...

    lea eax, [0]
    mov eax, 0FFFF0000h         //All constants from 0..0FFFFh << 16
    shr  ax, 16                 //All constants from 16..31
    shl eax, 16                 //All constants from 16..31

And perhaps the most strange... :)

@movzx:
    movzx eax, byte ptr[@movzx + 6]   //Because the last byte of this instruction is 0

and also in 32-bit mode (longer instruction puts the final (most-significant) address byte later)...

  @movzx:
    movzx ax, byte ptr[@movzx + 7]
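
The self-reading trick above can be checked by building the machine code by hand. This is a minimal Python sketch, assuming the standard 32-bit-mode encoding `0F B6 05 disp32` for `movzx r32, byte ptr [disp32]` (plus an optional `66` operand-size prefix for the `ax` form). The helper names `encode_movzx_abs` and `self_reading_movzx` and the example addresses are hypothetical, chosen only for illustration.

```python
import struct

def encode_movzx_abs(disp32):
    """Bytes of `movzx reg, byte ptr [disp32]`: opcode 0F B6, ModRM 05 (disp32 only, reg field 0)."""
    return bytes([0x0F, 0xB6, 0x05]) + struct.pack("<I", disp32)

def self_reading_movzx(label_addr, operand_size_prefix=False):
    """Return (machine_code, byte_loaded) when the load targets the instruction's own last byte."""
    prefix = b"\x66" if operand_size_prefix else b""
    length = len(prefix) + 7              # 7 bytes without prefix, 8 with it
    target = label_addr + length - 1      # label + 6, or label + 7 with the prefix
    code = prefix + encode_movzx_abs(target)
    # The last byte of the instruction is the most significant byte of disp32.
    return code, code[target - label_addr]

code, loaded = self_reading_movzx(0x00401000)
print(code.hex(" "), "loads", hex(loaded))
```

The loaded byte is zero only while the instruction sits below address 2^24; with a hypothetical label at 0x80001000 the same instruction would load 0x80 instead.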

Edit:

And for 16 bit x86 cpu mode, not tested...:

    lea  ax, [0]

and...

  @movzx:
    movzx ax, byte ptr cs:[@movzx + 7]   //Check if 7 is right offset

The cs: prefix is only needed when the ds segment register is not equal to the cs segment register; otherwise it is optional.

Peter Cordes
GJ.
  • @GJ. The `shr eax, 16` instruction is not guaranteed to clear the `AX` register! `shr ax, 16` would be fine. Your `movzx ax, byte ptr cs:[@movzx + 7]` instruction for use in the 16-bit real address mode uses the correct offset **7**, but I think you should make clear that for this to work an Address Size Prefix (67h) is required, so that the high word of the 32-bit displacement is zero. – Sep Roland Aug 24 '21 at 18:51
  • `eax >>= 16` leaves the low 16-bits (AX) non-zero when the original high bits were non-zero. >.< It leaves the *high* 16 bits of EAX=0, but that's not AX. – Peter Cordes Aug 24 '21 at 22:57
  • 1
    Your edit to movzx into eax instead of ax with `movzx eax, byte ptr[@movzx + 7]` broke it (unless you mean in 16-bit mode with a 67h prefix, which that doesn't imply and you didn't mention in comments). That instruction is only 7 bytes long, that's why the same instruction in the previous code block uses `$ + 6` instead of `$ + 7` to access the last byte of the disp32. – Peter Cordes Aug 24 '21 at 23:00
  • Should note that `movzx` is a 386+ instruction, so it won't work on all x86-16 machines. – ecm Aug 27 '23 at 17:21
13

See this answer for the best way to zero registers: xor eax,eax (performance advantages, and smaller encoding).


I'll consider just the ways that a single instruction can zero a register. There are far too many ways if you allow loading a zero from memory, so we'll mostly exclude instructions that load from memory.

I've found 10 different single instructions that zero a 32bit register (and thus the full 64bit register in long mode), with no pre-conditions or loads from any other memory. This is not counting different encodings of the same insn, or the different forms of mov. If you count loading from memory that's known to hold a zero, or from segment registers or whatever, there are a boatload of ways. There are also a zillion ways to zero vector registers.

For most of these, the eax and rax versions are separate encodings for the same functionality, both zeroing the full 64-bit registers, either zeroing the upper half implicitly or explicitly writing the full register with a REX.W prefix.
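
Why the eax form already clears all of rax can be modeled in a few lines. This is a rough Python sketch of the x86-64 register-write rules (32-bit writes zero-extend, 8/16-bit writes merge); the function name `write_reg` is hypothetical and the model ignores the high-byte registers.

```python
MASK64 = (1 << 64) - 1

def write_reg(old_rax, value, width):
    """Return the new 64-bit register after writing `value` to its low `width` bits."""
    mask = (1 << width) - 1
    if width in (32, 64):
        # Writing a 32-bit register implicitly zeroes bits 63:32.
        return value & mask
    # 8-bit and 16-bit writes leave the remaining upper bits untouched.
    return (old_rax & ~mask & MASK64) | (value & mask)

old = 0x1122334455667788
print(hex(write_reg(old, 0, 32)))   # model of xor eax, eax
print(hex(write_reg(old, 0, 16)))   # model of xor ax, ax
```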

Integer registers (NASM syntax):

# Works on any reg unless noted, usually of any size.  eax/ax/al as placeholders
and    eax, 0         ; three encodings: imm8, imm32, and eax-only imm32
andn   eax, eax,eax   ; BMI1 instruction set: dest = ~s1 & s2
imul   eax, any,0     ; eax = something * 0.  two encodings: imm8, imm32
lea    eax, [0]       ; absolute encoding (disp32 with no base or index).  Use [abs 0] in NASM if you used DEFAULT REL
lea    eax, [rel 0]   ; YASM supports this, but NASM doesn't: use a RIP-relative encoding to address a specific absolute address, making position-dependent code

mov    eax, 0         ; 5 bytes to encode (B8 imm32)
mov    rax, strict dword 0   ; 7 bytes: REX mov r/m64, sign-extended-imm32.    NASM optimizes mov rax,0 to the 5B version, but dword or strict dword stops it for some reason
mov    rax, strict qword 0   ; 10 bytes to encode (REX B8 imm64).  movabs mnemonic for AT&T.  normally assemblers choose smaller encodings if the operand fits, but strict qword forces the imm64.

sub    eax, eax       ; recognized as a zeroing idiom on some but maybe not all CPUs
xor    eax, eax       ; Preferred idiom: recognized on all CPUs
                      ; 2 same-size encodings each: r/m, r  vs.  r, r/m

@movzx:
  movzx eax, byte ptr[@movzx + 6]   //Assuming the high byte of the absolute address is 0.  Not position-independent, and x86-64 RIP+rel32 would load 0xFF

.l: loop .l             ; clears e/rcx... eventually.  from I. J. Kennedy's answer.  To operate on only ECX, use an address-size prefix.
; rep lodsb             ; not counted because it's not safe (potential segfaults), but also zeros ecx

Instructions like xor reg,reg can be encoded two different ways. In GAS AT&T syntax, we can request which opcode the assembler chooses. This only applies to reg,reg integer instructions that allow both forms, i.e. that date back to 8086. So not SSE/AVX.

  {load}  xor %eax, %eax           # 31 c0
  {store} xor %eax, %eax           # 33 c0
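
To see why the two byte sequences mean the same thing, it may help to decode the ModRM byte by hand. This is a small Python sketch, assuming only the register-direct case (mod = 3) and the two opcodes 31 (`xor r/m32, r32`) and 33 (`xor r32, r/m32`); the function name `decode_xor_reg_reg` is hypothetical.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_xor_reg_reg(code):
    """Decode a 2-byte register-to-register XOR into Intel-syntax text."""
    opcode, modrm = code
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 3:
        raise ValueError("only register-direct ModRM is modeled")
    if opcode == 0x31:      # XOR r/m32, r32: the r/m field is the destination
        dst, src = rm, reg
    elif opcode == 0x33:    # XOR r32, r/m32: the reg field is the destination
        dst, src = reg, rm
    else:
        raise ValueError("not one of the two XOR opcodes modeled here")
    return f"xor {REG32[dst]}, {REG32[src]}"

for encoding in ("31c0", "33c0", "31c8", "33c8"):
    print(encoding, "->", decode_xor_reg_reg(bytes.fromhex(encoding)))
```

With both operands the same register, swapping the roles of the reg and r/m fields changes nothing, which is why both encodings zero eax.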

"Shift all the bits out one end" isn't possible for regular-size GP registers, only partial registers. shl and shr shift counts are masked (on 286 and later): count & 31; i.e. mod 32.

(Immediate-count shifts were new in 186 (previously only CL and implicit-1), so there are CPUs with unmasked immediate shifts (also including NEC V30). Also, 286 and earlier are 16bit-only, so ax is a "full" register. There were CPUs where a shift can zero a full integer register.)

Also note that shift counts for vectors saturate instead of wrapping.

# Zeroing methods that only work on 16bit or 8bit regs:
shl    ax, 16           ; shift count is still masked to 0x1F for any operand size less than 64b.  i.e. count %= 32
shr    al, 16           ; so 8b and 16b shifts can zero registers.

# zeroing ah/bh/ch/dh:  Low byte of the reg = whatever garbage was in the high8 reg
movzx  eax, ah          ; From Jerry Coffin's answer
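
The count masking described above is easy to model. This is a minimal Python sketch of scalar `shl`/`shr` on CPUs that mask the count (count & 31 for 8/16/32-bit operands, count & 63 for 64-bit); the function names are hypothetical and flags are not modeled.

```python
def _masked_count(count, width):
    # Scalar shifts mask the count instead of saturating it.
    return count & (63 if width == 64 else 31)

def shl(value, count, width):
    """Model of `shl reg, imm8` for a register of `width` bits."""
    return (value << _masked_count(count, width)) & ((1 << width) - 1)

def shr(value, count, width):
    """Model of `shr reg, imm8` for a register of `width` bits."""
    return (value & ((1 << width) - 1)) >> _masked_count(count, width)

print(hex(shl(0xFFFF, 16, 16)))        # 16-bit register: all bits shifted out
print(hex(shl(0xFFFFFFFF, 32, 32)))    # 32-bit register: count wraps to 0
print(hex(shr(0x12345678, 16, 32)))    # eax >> 16 leaves a non-zero AX
```

The last line also illustrates why `shr eax, 16` does not clear ax: the old high half simply moves into the low half.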

Depending on other existing conditions (other than having a zero in another reg):

bextr  eax,  any, eax  ; if al >= 32, or ah = 0.  BMI1
BLSR   eax,  src       ; if src only has one set bit
CDQ                    ; edx = sign-extend(eax)
sbb    eax, eax        ; if CF=0.  (Only recognized on AMD CPUs as dependent only on flags (not eax))
setcc  al              ; with a condition that will produce a zero based on known state of flags

PSHUFB   xmm0, all-ones  ; xmm0 bytes are cleared when the mask bytes have their high bit set
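
The PSHUFB case depends on the control mask rather than on the data. This is a short Python sketch of the 128-bit byte shuffle, assuming the usual rule that a control byte with its high bit set produces zero and otherwise its low 4 bits select a source byte; `pshufb` here is a hypothetical helper operating on 16-byte `bytes` objects.

```python
def pshufb(dst, mask):
    """Model of PSHUFB xmm, xmm/m128 on 16-byte values."""
    out = bytearray(16)
    for i, control in enumerate(mask):
        # High bit set: the output byte is zeroed; otherwise pick dst[control & 15].
        out[i] = 0 if control & 0x80 else dst[control & 0x0F]
    return bytes(out)

data = bytes(range(1, 17))
print(pshufb(data, b"\xff" * 16).hex())              # all-ones mask clears every byte
print(pshufb(data, bytes(range(15, -1, -1))).hex())  # ordinary use: byte reversal
```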

Vector regs

Some of these SSE2 integer instructions can also be used on MMX registers (mm0 - mm7). I'm not going to show that separately.

Again, best choice is some form of xor. Either PXOR / VPXOR, or XORPS / VXORPS. See What is the best way to set a register to zero in x86 assembly: xor, mov or and? for details.

AVX vxorps xmm0,xmm0,xmm0 zeros the full ymm0/zmm0, and is better than vxorps ymm0,ymm0,ymm0 on AMD CPUs.

These zeroing instructions have three encodings each: legacy SSE, AVX (VEX prefix), and AVX512 (EVEX prefix), although the SSE version only zeros the bottom 128, which isn't the full register on CPUs that support AVX or AVX512. Anyway, depending on how you count, each entry can be three different instructions (same opcode, though, just different prefixes). Except vzeroall, which AVX512 didn't change (and doesn't zero zmm16-31).

PXOR       xmm0, xmm0     ;; recommended
XORPS      xmm0, xmm0     ;; or this
XORPD      xmm0, xmm0     ;; longer encoding for zero benefit
PXOR       mm0, mm0     ;; MMX, not shown for the rest of the integer insns

ANDNPD    xmm0, xmm0
ANDNPS    xmm0, xmm0
PANDN     xmm0, xmm0     ; dest = ~dest & src

PCMPGTB   xmm0, xmm0     ; n > n is always false.
PCMPGTW   xmm0, xmm0     ; similarly, pcmpeqd is a good way to do _mm_set1_epi32(-1)
PCMPGTD   xmm0, xmm0
PCMPGTQ   xmm0, xmm0     ; SSE4.2, and slower than byte/word/dword

PSADBW    xmm0, xmm0     ; sum of absolute differences
MPSADBW   xmm0, xmm0, 0  ; SSE4.1.  sum of absolute differences, register against itself with no offset.  (imm8=0: same as PSADBW)

  ; shift-counts saturate and zero the reg, unlike for GP-register shifts
PSLLDQ    xmm0, 16       ;  left-shift the bytes in xmm0
PSRLDQ    xmm0, 16       ; right-shift the bytes in xmm0
PSLLW     xmm0, 16       ; left-shift the bits in each word
PSLLD     xmm0, 32       ;           double-word
PSLLQ     xmm0, 64       ;             quad-word
PSRLW/PSRLD/PSRLQ  ; same but right shift

PSUBB/W/D/Q   xmm0, xmm0     ; subtract packed elements, byte/word/dword/qword
PSUBSB/W   xmm0, xmm0     ; sub with signed saturation
PSUBUSB/W  xmm0, xmm0     ; sub with unsigned saturation

;; SSE4.1
INSERTPS   xmm0, xmm1, 0x0F   ; imm[3:0] = zmask = all elements zeroed.
DPPS       xmm0, xmm1, 0x00   ; imm[7:4] => inputs = treat as zero -> no FP exceptions.  imm[3:0] => outputs = 0 as well, for good measure
DPPD       xmm0, xmm1, 0x00   ; inputs = all zeroed -> no FP exceptions.  outputs = 0

VZEROALL                      ; AVX1  x/y/zmm0..15 not zmm16..31
VPERM2I/F128  ymm0, ymm1, ymm2, 0x88   ; imm[3] and [7] zero that output lane

# Can raise an exception on SNaN, so only usable if you know exceptions are masked
CMPLTPD    xmm0, xmm0         # exception on QNaN or SNaN, or denormal
VCMPLT_OQPD xmm0, xmm0,xmm0   # exception only on SNaN or denormal
CMPLT_OQPS ditto

VCMPFALSE_OQPD xmm0, xmm0, xmm0   # This is really just another imm8 predicate value for the same VCMPPD xmm,xmm,xmm, imm8 instruction.  Same exception behaviour as LT_OQ.

SUBPS xmm0, xmm0 and similar won't work because NaN-NaN = NaN, not zero.
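
The NaN problem can be demonstrated with ordinary IEEE 754 doubles. This is a minimal Python sketch; it models only the arithmetic result of one element and says nothing about exception flags in MXCSR. The helper names are hypothetical.

```python
import math

def sub_self(x):
    """Element-wise model of SUBPS/SUBPD reg, reg."""
    return x - x

def lt_self(x):
    """Element-wise model of a less-than compare of a register against itself."""
    return x < x

for value in (1.5, math.inf, math.nan):
    print(value, "->", sub_self(value), lt_self(value))
```

Subtraction fails to produce zero for NaN and infinity inputs, while `x < x` is false for every input, which is why the compare forms appear in the list and the subtract forms do not.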

Also, FP instructions can raise exceptions on NaN arguments, so even CMPPS/PD is only safe if you know exceptions are masked, and you don't care about possibly setting the exception bits in MXCSR. Even the AVX version, with its expanded choice of predicates, will raise #IA on SNaN. The "quiet" predicates only suppress #IA for QNaN. CMPPS/PD can also raise the Denormal exception. (The AVX512 EVEX encodings can suppress FP exceptions for 512-bit vectors, along with overriding the rounding mode)

(See the table in the insn set ref entry for CMPPD, or preferably in Intel's original PDF since the HTML extract mangles that table.)

AVX1/2 and AVX512 EVEX forms of the above, just for PXOR: these all zero the full ZMM destination. PXOR has two EVEX versions: VPXORD or VPXORQ, allowing masking with dword or qword elements. (XORPS/PD already distinguishes element-size in the mnemonic so AVX512 didn't change that. In the legacy SSE encoding, XORPD is always a pointless waste of code-size (larger opcode) vs. XORPS on all CPUs.)

VPXOR      xmm15, xmm0, xmm0      ; AVX1 VEX
VPXOR      ymm15, ymm0, ymm0      ; AVX2 VEX, less efficient on some CPUs
VPXORD     xmm31, xmm0, xmm0      ; AVX512VL EVEX
VPXORD     ymm31, ymm0, ymm0      ; AVX512VL EVEX 256-bit
VPXORD     zmm31, zmm0, zmm0      ; AVX512F EVEX 512-bit

VPXORQ     xmm31, xmm0, xmm0      ; AVX512VL EVEX
VPXORQ     ymm31, ymm0, ymm0      ; AVX512VL EVEX 256-bit
VPXORQ     zmm31, zmm0, zmm0      ; AVX512F EVEX 512-bit

Different vector widths are listed with separate entries in Intel's PXOR manual entry.

You can use zero masking (but not merge masking) with any mask register you want; it doesn't matter whether you get a zero from masking or a zero from the vector instruction's normal output. But that's not a different instruction. e.g.: VPXORD xmm16{k1}{z}, xmm0, xmm0
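
The difference between zero masking and merge masking can be sketched per element. This is a Python model under the assumption that vectors are plain lists of integers and `k` is an integer bit mask; `vpxord_masked` is a hypothetical name, not an intrinsic.

```python
def vpxord_masked(dest, a, b, k, zeroing):
    """Model of VPXORD dest{k}{z}, a, b (zeroing=True) or dest{k}, a, b (merge masking)."""
    out = []
    for i, (x, y) in enumerate(zip(a, b)):
        if (k >> i) & 1:
            out.append(x ^ y)                         # element selected by the mask
        else:
            out.append(0 if zeroing else dest[i])     # zeroed, or old value kept
    return out

old = [1, 2, 3, 4]
src = [9, 8, 7, 6]
print(vpxord_masked(old, src, src, 0b0101, zeroing=True))
print(vpxord_masked(old, src, src, 0b0101, zeroing=False))
```

With zero masking every element ends up zero regardless of the mask; with merge masking the unselected elements keep the old destination contents, so the register is not reliably cleared.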

AVX512:

There are probably several options here, but I'm not curious enough right now to go digging through the instruction set list looking for all of them.

There is one interesting one worth mentioning, though: VPTERNLOGD/Q can set a register to all-ones instead, with imm8 = 0xFF. (But has a false dependency on the old value, on current implementations). Since the compare instructions all compare into a mask, VPTERNLOGD seems to be the best way to set a vector to all-ones on Skylake-AVX512 in my testing, although it doesn't special-case the imm8=0xFF case to avoid a false dependency.

VPTERNLOGD zmm0, zmm0,zmm0, 0     ; inputs can be any registers you like.
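
How the immediate selects the result can be shown with a bitwise model. This is a Python sketch for a single 32-bit element, assuming the documented lookup rule that each result bit is the imm8 bit indexed by the three corresponding input bits (first operand as the most significant index bit); `vpternlog` is a hypothetical helper name.

```python
def vpternlog(a, b, c, imm8, width=32):
    """Model of one element of VPTERNLOGD: imm8 is an 8-entry truth table."""
    result = 0
    for bit in range(width):
        index = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm8 >> index) & 1) << bit
    return result

a, b, c = 0xDEADBEEF, 0x12345678, 0x0F0F0F0F
print(hex(vpternlog(a, b, c, 0x00)))   # truth table of all zeros
print(hex(vpternlog(a, b, c, 0xFF)))   # truth table of all ones
```

Because the table is constant for imm8 = 0 or 0xFF, the inputs do not affect the value, although the hardware may still wait for them as noted above.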

Mask register (k0..k7) zeroing: Mask instructions, and vector compare-into-mask

kxorB/W/D/Q     k0, k0, k0     ; narrow versions zero extend to max_kl
kshiftlB/W/D/Q  k0, k0, 100    ; kshifts don't mask/wrap the 8-bit count
kshiftrB/W/D/Q  k0, k0, 100
kandnB/W/D/Q    k0, k0, k0     ; x & ~x

; compare into mask
vpcmpB/W/D/Q    k0, x/y/zmm0, x/y/zmm0, 3    ; predicate #3 = always false; other predicates are false on equal as well
vpcmpuB/W/D/Q   k0, x/y/zmm0, x/y/zmm0, 3    ; unsigned version

vptestnmB/W/D/Q k0, x/y/zmm0, x/y/zmm0       ; x & ~x test into mask      

x87 FP:

Only one choice (because sub doesn't work if the old value was infinity or NaN).

FLDZ    ; push +0.0
Peter Cordes
  • "The `shr r/m16, imm8` variable-count form of the instruction was added 286," -- Actually, it was added with the 186. The 186 also introduced masking the shift/rotate counts with 31. (Except NEC V30, which supports 186 instructions but does not mask the shift count.) – ecm Feb 12 '20 at 11:16
  • 1
    @ecm: thanks, some of my old answers still have misinformation like that from an old version of the NASM appendix. Rewrote that paragraph here. – Peter Cordes Feb 12 '20 at 11:24
  • I'm just wondering, but wouldn't `0`ing a register in some cases actually require to solve the halting problem? Let's say we have a loop that, upon some condition, `0`s a single bit of `ax`, and loops until the 16 bits of `ax` are actually `0`. In this case, to know whether `ax` will ever take the value of `0` implies knowing whether the loop eventually stops. – ljleb May 16 '20 at 09:31
  • @Louis-JacobLebel: Yes, but that's a completely different question from this one. You can always zero the whole AX in a single instruction if that's what you want to do. Analyzing a program/loop to see if it eventually zeros a register is unrelated. – Peter Cordes May 16 '20 at 12:32
  • @PeterCordes I wasn't trying to discredit you nor anything alike, I was just wondering if it was actually possible to do this while removing the single instruction constraint. Maybe my comment would have seemed more appropriate if it had been located under the question rather than under your answer. I'm moving it, so I suggest to clean up this part of the thread (I don't have the rights to do it btw). – ljleb May 17 '20 at 05:13
  • @Louis-JacobLebel: I wasn't offended at all, and I see how your question follows from the *title* of this question, but the one-instruction constraint is part of the question body. Without that constraint, there are infinite ways (if you include I/O to load new code, otherwise a very large finite number on the order of 2^48 = x86-64 virtual address space size ~= amount of state a program can have). A few factors less to rule out programs that can't or might not produce a zero... (programs involving RDTSC or RDRAND are non-deterministic...) – Peter Cordes May 17 '20 at 05:14
  • @Louis-JacobLebel: We could delete our comments here, but they're not really any more on topic under the question. IMO the TL:DR is: yes what you describe is a subset of the halting problem, and is unrelated to what this question or answer are about. – Peter Cordes May 17 '20 at 05:19
  • I get your point. Even though "How many ways are there to reset a variable _under one instruction_?" and "How many ways are there to reset a variable?" are pretty close in edit distance, the answer to each of these questions might differ by an order of magnitude (infinity, anyone?). But I'm not sure whether the difference between two problems is always proportional to the difference between their solution. – ljleb May 17 '20 at 05:25
  • I mean, I could have asked about something not related to this question at all, and you would be right in this situation. But it seems to me that I was really close to be on topic, even if I wasn't perfectly. – ljleb May 17 '20 at 05:28
  • `shr al, 16 ; so 8b and 16b shifts can zero registers.` did you mean to use 8 instead of 16 on this one line? – Alexis Wilke Mar 04 '23 at 16:16
  • @AlexisWilke: Not particularly, although any number as low as 8 would still zero AL. Any count from 31 to 16 will zero any register narrower than 32-bit. The 8b is referring to the operand-size, not the shift count. A lower shift count would be faster on 286 and 186 (8086 doesn't support immediate shifts), but if you cared about performance you'd use `mov al, 0` or `xor ax,ax`. – Peter Cordes Mar 04 '23 at 18:12
4

A couple more possibilities:

sub ax, ax

movzx eax, ah

Edit: I should note that the movzx doesn't zero all of eax -- it just zeros ah (plus the top 16 bits that aren't accessible as a register in themselves).

As for being the fastest, if memory serves the sub and xor are equivalent. They're faster than (most) others because they're common enough that the CPU designers added special optimization for them. Specifically, with a normal sub or xor the result depends on the previous value in the register. The CPU recognizes the xor-with-self and subtract-from-self specially so it knows the dependency chain is broken there. Any instructions after that won't depend on any previous value so it can execute previous and subsequent instructions in parallel using rename registers.

Especially on older processors, we expect the 'mov reg, 0' to be slower simply because it has an extra 16 bits of data, and most early processors (especially the 8088) were limited primarily by their ability to load the instruction stream from memory -- in fact, on an 8088 you can estimate run time pretty accurately without any reference sheets at all, just by paying attention to the number of bytes involved. That does break down for the div and idiv instructions, but that's about it. OTOH, I should probably shut up, since the 8088 really has been of little interest to much of anybody (for at least a decade now).

Jerry Coffin
  • 2
    Move with Zero Extend (386+) :) –  Jan 28 '11 at 15:46
  • I recall quite well 8088 and really liked it, 16bits - wow (compared to 6502). Past 6502, I had bad habits to use only ah/al and so. Counting the clocks was quite an adventure. As for sub/xor, both should be 3clocks but there was some catch I can't remember now. – bestsss Jan 28 '11 at 17:20
3

This thread is old, but here are a few other examples. Simple ones:

xor eax,eax

sub eax,eax

and eax,0

lea eax,[0] ; it doesn't look "natural" in the binary

More complex combinations:

; flip all those 1111... bits to 0000
or  eax,-1  ;  eax = 0FFFFFFFFh
not eax     ; ~eax = 0

; XOR EAX,-1 works the same as NOT EAX instruction in this case, flipping 1 bits to 0
or  eax,-1  ;  eax = 0FFFFFFFFh
xor eax,-1  ; ~eax = 0

; -1 + 1 = 0
or  eax,-1 ;  eax = 0FFFFFFFFh or signed int = -1
inc eax    ;++eax = 0
ecm
Bartosz Wójcik
3

Of course, specific cases have additional ways to set a register to 0: e.g. if you have eax set to a non-negative integer (sign bit clear), you can set edx to 0 with a cdq/cltd (this trick is used in a famous 24-byte shellcode, which appears in "Insecure programming by example").
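
The precondition for the cdq trick can be made explicit with a tiny model. This is a Python sketch of what cdq writes to edx for a given eax; the function name `cdq` is just a label for the model.

```python
def cdq(eax):
    """Return the value CDQ places in EDX: every bit is a copy of EAX's sign bit."""
    return 0xFFFFFFFF if eax & 0x80000000 else 0

for value in (0, 0x0000000B, 0x7FFFFFFF, 0x80000000):
    print(hex(value), "->", hex(cdq(value)))
```

So edx is zeroed exactly when bit 31 of eax is clear, which includes eax = 0.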

ninjalj
3

You can set register CX to 0 with LOOP $.
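
A quick model shows why this always ends with CX = 0, and how long it can take. This is a Python sketch of `loop $` (decrement the counter, branch back to itself while it is non-zero), assuming a 16-bit counter by default; `loop_self` is a hypothetical name.

```python
def loop_self(cx, width=16):
    """Simulate `loop $`: returns (final counter, number of times the instruction ran)."""
    mask = (1 << width) - 1
    iterations = 0
    while True:
        cx = (cx - 1) & mask      # LOOP decrements first, without touching flags
        iterations += 1
        if cx == 0:               # falls through only when the counter reaches zero
            return cx, iterations

print(loop_self(5))
print(loop_self(0))   # starting from 0 wraps around through the whole range
```

Starting from CX = 0 is the worst case for a 16-bit counter: the instruction executes 65536 times before falling through.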

I. J. Kennedy
  • Or REP CMPSB if you're allowed to clobber other registers. – I. J. Kennedy Jan 31 '11 at 17:33
  • 3
    `repe cmpsb` and `repne cmpsb` may both result in rcx/ecx/cx being nonzero; likewise `scas`. A `rep` prefix may be used with `lods`, `movs`, `stos`, `ins`, or `outs` to set the counter register to zero. As you mentioned correctly, these instructions have effects beyond changing the counter register. And as Peter Cordes mentioned, `rep lods` instructions may fault due to accessing memory. – ecm Feb 12 '20 at 11:23
2

Per DEF CON 25 - XlogicX - Assembly Language is Too High Level:

AAD with an immediate base of 0 will always zero AH, and leave AL unmodified. From Intel's pseudocode for it:
AL ← (oldAL + (oldAH ∗ imm8)) AND FFH;

In asm source:

AAD 0         ; assemblers like NASM accept this

db 0xd5,0x00  ; others may need you to encode it manually
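
The pseudocode quoted above can be transcribed directly. This is a minimal Python sketch of AAD operating on a 16-bit AX value, following the quoted formula plus the AH ← 0 step from the same pseudocode; flags are not modeled and `aad` is a hypothetical helper name.

```python
def aad(ax, base=10):
    """Return the new AX after AAD with the given immediate base."""
    al, ah = ax & 0xFF, (ax >> 8) & 0xFF
    al = (al + ah * base) & 0xFF   # AL <- (oldAL + oldAH * imm8) AND FFh
    return al                      # AH <- 0, so the new AX is just AL

print(hex(aad(0x1234, base=0)))    # AH cleared, AL unchanged
print(hex(aad(0x0205)))            # default base 10: unpacked BCD 2,5 becomes 25
```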

Apparently (on at least some CPUs), a 66 operand-size prefix in front of bswap eax (i.e. 66 0F C8 as an attempt to encode bswap ax) zeros AX.

Peter Cordes
Eugene
  • 3
    Specifying other bases to `AAD` or `AAM` isn't supported by some machines, such as the NEC V20/V30. – ecm Feb 12 '20 at 11:30
2

In a comment, the OP writes that shifts cannot use an immediate count (introduced with the 80186/80286). Therefore the targeted x86 CPU has to be an 8086/8088. (10 years ago this question would surely have been better tagged with [8086] instead of the recently (5 years?) introduced [x86-16].)

The 8086 architecture provides 14 basic program execution registers for use in general system and application programming. These registers can be grouped as follows:

• The AX, BX, CX, DX, SI, DI, BP, and SP general-purpose registers. These eight registers are available for storing operands and pointers.
• The CS, DS, ES, and SS segment registers. These registers allow addressing more than 64KB of memory.
• The FLAGS register. This register reports on the status of the program being executed and allows application program level control of the processor.
• The IP register. This Instruction Pointer register contains a 16-bit pointer to the next instruction to be executed.

An answer to the question about clearing a register on x86 can thus deal with zeroing any of the above registers, except of course the FLAGS register, which is architecturally defined to always hold a 1 in its second bit position.

Next is the list of single instructions that can clear a register on 8086 and without relying on any pre-existing condition(s). The list is in alphabetical order:

encoding         instruction                register cleared           displacement
--------------   ---------------            -----------------------    ------------
25 00 00         and     ax, 0              AX
83 E0 00         and     ax, 0              AX BX CX DX SI DI BP SP
81 E0 00 00      and     ax, 0              AX BX CX DX SI DI BP SP
E8 -- --         call    0000h              IP                         -($+3)
9A 00 00 xx yy   call    yyxxh:0000h        IP
9A xx yy 00 00   call    0000h:yyxxh        CS
9A 00 00 00 00   call    0000h:0000h  (*)   IP and CS
E9 -- --         jmp     0000h              IP                         -($+3)
EA 00 00 xx yy   jmp     yyxxh:0000h        IP
EA xx yy 00 00   jmp     0000h:yyxxh        CS
EA 00 00 00 00   jmp     0000h:0000h  (*)   IP and CS
8D 06 00 00      lea     ax, [0000h]        AX BX CX DX SI DI BP SP
F3 AC            rep lodsb                  CX
F3 AD            rep lodsw                  CX
E2 FE            loop    $                  CX
B8 00 00         mov     ax, 0              AX BX CX DX SI DI BP SP
C7 C0 00 00      mov     ax, 0              AX BX CX DX SI DI BP SP
F3 A4            rep movsb            (*)   CX
F3 A5            rep movsw            (*)   CX
F3 AA            rep stosb            (*)   CX
F3 AB            rep stosw            (*)   CX
29 C0            sub     ax, ax             AX BX CX DX SI DI BP SP
2B C0            sub     ax, ax             AX BX CX DX SI DI BP SP
31 C0            xor     ax, ax             AX BX CX DX SI DI BP SP
33 C0            xor     ax, ax             AX BX CX DX SI DI BP SP

This list shows what is technically possible, and certainly not what you should use. The instructions that were marked with (*) are very dangerous or can only be used with caution.
It goes without saying that for call and jmp to work, you need executable code at the target location.

The best way to clear a general purpose register is to use xor reg, reg and if you don't want to change any of the flags, then use mov reg, 0.

Sep Roland
  • 1
    Heh, `jmp/call 0000h:0000h` is only slightly less usable than near `jmp/call 0` to the start of the segment. You need to have machine code at linear address 0 where the IVT normally starts, but near-jumping to IP=0 isn't something you can do in the middle of a sequence of code either. (Unless there's a `ret` there to return from a call). Actually, the first IVT entry is #DE divide exception, so as long as you don't trigger that, you can have a `retf` at linear address 0. (If you don't jump back, you'd want to disable interrupts before overwriting the rest of the IVT with code.) – Peter Cordes Aug 24 '21 at 21:12
0

If you are working with 8-bit values on the 8086 then the fastest way to clear al is with “mov al, ah”, which is 2 cycles. “xor al, al” and “xor ax, ax” are both 3 cycles. Of course, you have to be sure ah is already 0.

  • 1
    Remember that code-fetch takes 4 cycles per chunk (byte for 8088, word for 8086), so xor-zeroing for byte regs is still faster than the fetch buffer fills, even on 8086. `mov al, 0` is also only 2 bytes of machine code, but https://www2.math.uni-wuppertal.de/~fpf/Uebungen/GdR-SS02/opcode_i.html lists it as 4 cycles on 8088 (for either 8-bit or 16-bit mov reg, imm, although the table lists the wrong machine code size; it should be `1 + i(1,2)` for the no-ModRM short-form encoding). Anyway, that table doesn't account for code-fetch, which is the primary bottleneck on 8088, and somewhat on 8086. – Peter Cordes Aug 28 '23 at 01:23
  • See also [Increasing Efficiency of binary -> gray code for 8086](https://stackoverflow.com/q/67400133) for more about performance on 8086 / 8088. – Peter Cordes Aug 28 '23 at 01:24
-2
mov eax,0  
shl eax,32  
shr eax,32  
imul eax,0 
sub eax,eax 
xor eax,eax   
and eax,0  
andn eax,eax,eax 

loop $ ;ecx only  
pause  ;ecx only (pause="rep nop" or better="rep xchg eax,eax")

;together:  
push dword 0    
pop eax

or eax,0xFFFFFFFF  
not eax

xor al,al ;("mov al,0","sub al,al",...)  
movzx eax,al
...
ARISTOS
  • 1
    That's not how `rep` works; it's ignored for instructions other than "string" instructions (which it doesn't apply to), or is effectively part of the opcode for instructions like [`pause`](https://www.felixcloutier.com/x86/pause) or `tzcnt` on CPUs that know about them. Also, x86 masks the shift count for scalar shifts with `& 0x1f` (modulo 32), or for 64-bit operand-size with modulo 64. You can only shift out all the bits for 16 or 8-bit registers. See my answer on this question which was posted 2 years before this. – Peter Cordes Feb 12 '20 at 06:34