7

What is the fastest way of clearing a register (=0) in MIPS assembly?

Some examples:

xor    $t0, $t0, $t0
and    $t0, $t0, $0
move   $t0, $0
li     $t0, 0
add    $t0, $0, $0

Which is the most efficient?

Peter Cordes
lois

6 Answers

5

With many MIPS assemblers, these two pseudo-instructions each assemble to a single, equivalent real instruction, because typically move $a, $b is an idiom for or $a, $b, $0 and li $r, x is shorthand for ori $r, $0, x:

move $t0, $0
li $t0, 0

and these will both take place on the same pipeline, being architecturally equivalent:

xor $t0, $t0, $t0
and $t0, $t0, $0

and in every RISC implementation I've ever worked with, add is on the same pipe as xor/and/nor/etc.

Basically, this all depends on the implementation of a particular chip, but they all ought to be single-cycle. If the chip is out-of-order, li, or and x, $0, $0, might be fastest because they minimize false dependencies on other registers.
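As a rough illustration of the points above, here is a sketch of the candidates grouped by what they read. The expansions shown in the comments are typical but assembler-dependent (an assumption; check your assembler's listing output), and whether any of this matters on a given chip is implementation-specific.

```mips
# Sketch only: pseudo-instructions and their usual single-instruction expansions.
# The exact expansion is an assumption and varies by assembler.
        move  $t0, $0          # often: or $t0, $0, $0   (or addu $t0, $0, $0)
        li    $t0, 0           # often: ori $t0, $0, 0   (or addiu $t0, $0, 0)

# Zeroing forms whose only source register is $0:
# the result does not depend on the old value of $t0.
        or    $t0, $0, $0
        addu  $t0, $0, $0

# Zeroing forms that name $t0 as an input:
# an out-of-order core may have to wait for the old $t0 first.
        xor   $t0, $t0, $t0
        and   $t0, $t0, $0
```

Every line leaves $t0 equal to zero; the grouping only highlights which forms carry an input dependency on the destination register.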

Crashworks
  • If MIPS is like ARM or PPC, [instructions are architecturally required to propagate a dependency on their input registers (for reasons related to `memory_order_consume`)](http://stackoverflow.com/questions/37222999/convert-c-to-assembly-with-predicated-instruction/37224546#comment61983585_37224546). So you definitely want to use `$0` as your only input source register, regardless of what you do with it. IDK if any out-of-order MIPS implementations recognize any specific zeroing idioms and don't even use an execution unit ([like x86 CPUs do](http://stackoverflow.com/a/33668295/224132)). – Peter Cordes Oct 24 '16 at 13:24
  • That is very unlikely for MIPS; it already has a zero register, so probably less zeroing is needed than in some other ISAs. x86 already has good reason to detect zeroing idioms because they'd be false dependencies otherwise, and for partial-register reasons in some uarches; once you're looking for them anyway, it's less extra work to do more special-case stuff. – Peter Cordes Mar 12 '22 at 19:14
2

I seem to remember that $0 was created specifically for this case, so I would expect that move $t0, $0 should be the recommended way to clear a register. But I have not done MIPS for almost 10 years ...

Guillaume
1

Given that all of those instructions take a single pipeline cycle, there shouldn't be much difference between them.

If anything, I'd expect xor $t0, $t0, $t0 to be best for speed because it doesn't use any other registers, thus keeping them free for other values and potentially reducing register file contention.

The xor method is also treated as a specific idiom on some processors, which allows it to use even fewer resources (e.g. not needing to do the XOR ALU operation).

andrewmu
  • 1
    CPU designers optimize x86 CPUs for the xor-zeroing idiom because it has the smallest code-size in x86's variable-length encoding. This in turn has made [xor-zeroing more efficient than `mov eax, 0` even apart from code-size](http://stackoverflow.com/questions/33666617/what-is-the-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and/33668295#33668295). Since that's not a factor for MIPS, I wouldn't expect MIPS CPUs to spend transistors detecting that both operands are the same for xor or sub. I'd also expect that reading `$0` is at least as cheap as reading any other reg. – Peter Cordes Oct 24 '16 at 13:05
1

On most implementations of the MIPS architecture, all of these should offer the same performance. However, one can envision a superscalar system which could execute several instructions simultaneously, as long as they use distinct internal units. I have no actual example of a MIPS system which works like that, but that is how it happens on PowerPC systems. An xor $t0, $t0, $t0 opcode would be executed on the "integer computations" unit (because it is an xor) while move $t0, $0 would not use that unit; conceptually, the latter could be executed in parallel with another opcode which performs integer computations.

In brief, if you find a system where all the ways you list are not equally efficient, then I would expect the move $t0, $0 method to be the most efficient.

Thomas Pornin
  • 2
    I think in most implementations mov is also on the integer unit -- `mov x,y` is usually a synonym for `or x,y,0`. That was the case on the EE anyway. – Crashworks Oct 27 '10 at 10:33
  • Not familiar with MIPS, but is the move instruction any longer? On x86, longer instructions can often end up running longer than the "official" tick count due to memory/pipelining issues. Short instructions are preferred... – Brian Knoblauch Oct 27 '10 at 18:16
  • 1
    @Brian Knoblauch Nope -- the whole point of MIPS (and RISC generally) is that every instruction is exactly the same length. – Crashworks Oct 28 '10 at 00:20
  • I wouldn't say it's the "whole point" but it's indeed one of the advantages of RISC architectures (though it's getting a bit less pronounced with the addition of 16-bit subsets like mips16e and Thumb). – Igor Skochinsky Oct 28 '10 at 16:18
  • [MIPS R10000](https://en.wikipedia.org/wiki/R10000) was a 4-way superscalar out-of-order exec machine, from 1996. `xor`-zeroing on MIPS is architecturally required to have a dependency on the source regs (to carry a dependency for memory dependency-order, the thing that C++ `mo_consume` was designed to expose). So yes, for zeroing the good choices are ones whose sources only include the zero register (and optionally an immediate) to avoid false dependencies. Whatever actual instruction you use, it probably needs an integer ALU. – Peter Cordes Mar 30 '22 at 07:58
0

It probably depends on what other instructions will be in the pipeline at the same time: when the register was last used, when it will next be used and which internal units are currently in use.

I'm not familiar with the pipeline structure of any particular MIPS processor, but your compiler should be and I would expect it to choose whichever would be the fastest in a given code sequence.

Andrew Aylett
  • There are options which don't depend on the old value of the register so no, the best choice does *not* depend on surrounding code. `or $t1, $zero, $zero` is probably always as good as any other choice on any MIPS. It's probably safe to assume that a superscalar MIPS can run `addu` or `or` on any execution unit so back-end port pressure from surrounding code probably also doesn't matter. – Peter Cordes Nov 15 '19 at 20:01
0

You can simply use the $zero register as a source and write its value, which is always 0 (0b00000000), into the register you want to clear.

If you're working with floats or doubles, you can declare a float and/or double variable in .data as 0.0 and load it into the register you want to clear whenever you need to.

Example:

.data
     PI:       .float   3.14
     clear:    .float   0.0
.text
     main:
          lwc1 $f0, PI        # $f0 = 3.14
          lwc1 $f0, clear     # overwrite $f0 with 0.0, clearing it

          li $v0, 10          # syscall 10: exit (SPIM/MARS convention)
          syscall
  • Wouldn't it be equally or more efficient to transfer or convert `$zero` to the FPU with an ALU instruction, instead of doing a load from memory? Or are GP->FP transfer instructions slow? – Peter Cordes Nov 15 '19 at 20:03
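The alternative raised in that comment can be sketched as follows. This is a minimal sketch, assuming a single-precision value in a 32-bit FPU register and a SPIM/MARS-style environment for the exit syscall; it relies on the all-zero bit pattern being +0.0 in IEEE 754. Whether the transfer is actually faster than the load depends on the implementation.

```mips
.text
main:
        mtc1  $zero, $f0       # copy the 32 zero bits of $zero into $f0;
                               # bit pattern 0x00000000 is +0.0 as a float

        li    $v0, 10          # syscall 10: exit (SPIM/MARS convention)
        syscall
```

This avoids the .data constant and the memory access entirely, at the cost of a transfer between the integer and floating-point register files.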