ARMv6 Best Practices for Register Use in Function

Question

Total n00b at Assembly, but I feel like I'm getting the hang of it. However I have a question about best practices for using registers in a function.

As I understand it: of the 13 available general-purpose registers on the ARM11, by convention registers 0-3 are meant to be used for passing in arguments (with 0 & 1 also being used for return values) while 4-12 are meant to be used for storing working values for the duration of the function.

However I've also seen code examples where people use registers 0-3 for working values as well, so long as any of them are available, since they do not require a push & pop of the previous value onto the stack.

While I can understand why someone might want to avoid the extra push & pop steps, it seems that using r0-r3 for anything aside from passing values in and out of a function could lead to problems down the road (since you have no guarantee that any function you call will preserve their values).

So what, then, is the best practice here? When (if ever) should I use registers 0-3 for working values, and when should I dip into registers 4-12?

possible duplicate of [ARM to C calling convention, registers to save](http://stackoverflow.com/questions/261419/arm-to-c-calling-convention-registers-to-save) — auselen, Sep 07 '14 at 08:21
In pure assembly there's not really anything in between "follow the established calling convention" and "do whatever you want", but clearly the notion of "best practice" only applies to one of those. — Notlikethat, Sep 07 '14 at 11:15
Fair enough. Do you have any advice on how I might improve it, or avoid making that mistake? Or do you just think it's something that should be gleaned from the documentation? — CRThaze, Sep 08 '14 at 14:39
About "gleaning over the doc", these things are hard unless it's part of your daily work. You should always keep docs handy, knowing what they contain and checking them when needed for clarification helps. — auselen, Sep 09 '14 at 13:43

tangrs · Accepted Answer · 2014-09-07T11:32:16.873


it seems that using r0-r3 for anything aside from passing values in and out of a function could lead to problems down the road (since you have no guarantee that any function you call will preserve their values).

That's exactly when you could use r4-r11, since the ABI specifies that the callee must preserve these values :)

Registers r0-r3 are caller-saved, so the caller must ensure that any important values stored in those registers are saved before a function call. As the callee, you can do whatever you want with these registers.
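To make the caller-saved/callee-saved split concrete, here is a small C sketch that encodes the 32-bit AAPCS register roles as a lookup. The names `reg_role`, `role_of` and `free_to_clobber` are hypothetical, made up for illustration. r9's role is platform-defined, so it gets its own category, and r12 is treated as scratch because the AAPCS does not make it callee-saved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the roles follow the 32-bit ARM AAPCS. */
typedef enum {
    ARG_SCRATCH,   /* r0-r3: arguments/results, caller-saved */
    CALLEE_SAVED,  /* r4-r8, r10, r11: a callee must restore them before returning */
    PLATFORM,      /* r9: platform-defined (callee-saved when used as v6) */
    IP_SCRATCH,    /* r12: intra-procedure-call scratch, caller-saved */
    SPECIAL        /* r13 = sp, r14 = lr, r15 = pc */
} reg_role;

/* Map a register number 0..15 to its AAPCS role. */
static reg_role role_of(int r) {
    assert(r >= 0 && r <= 15);
    if (r <= 3) return ARG_SCRATCH;
    if (r <= 8 || r == 10 || r == 11) return CALLEE_SAVED;
    if (r == 9) return PLATFORM;
    if (r == 12) return IP_SCRATCH;
    return SPECIAL;
}

/* True if a function may overwrite register r without saving it first.
 * The same set is what a caller must save itself if it still needs
 * the value after a call. */
static bool free_to_clobber(int r) {
    reg_role role = role_of(r);
    return role == ARG_SCRATCH || role == IP_SCRATCH;
}
```

In other words, a callee can use r0-r3 (and r12) freely, while touching r4-r8, r10 or r11 obliges it to push and pop them.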

edited Sep 07 '14 at 11:32

answered Sep 07 '14 at 03:08

tangrs


Do you really need to save r12? r4-r12 makes 9 registers. That's an odd number, not a favorite of the computer world. – auselen Sep 07 '14 at 08:14
That's what the Raspberry Pi-specific guide I've been reading has to say. Not sure if that's standard across ARMv6 implementations or not, or if it's just an error in my docs. – CRThaze Sep 07 '14 at 08:29
@auselen 32-bit ARM has 13 general registers, which is an odd number, and still a favorite in the computer world – phuclv Sep 07 '14 at 10:15

r12 is not callee-saved under the [AAPCS](http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf) - the linker may corrupt it if it needs to veneer a far (>32MB) call. – Notlikethat Sep 07 '14 at 11:25
@LưuVĩnhPhúc You don't seem to understand the point. His answer was wrong. – auselen Sep 08 '14 at 09:09

artless noise · Answer 2 · 2014-09-08T16:12:59.497

... it seems that using r0-r3 for anything aside from passing values in and out of a function could lead to problems down the road (since you have no guarantee that any function you call will preserve their values).

Registers are faster than memory, registers are faster than L2 cache, registers are faster than L1 cache; registers are fast. By using r4-r8, you are creating extra stores and loads, and in hand-coded assembler this also means extra instructions. An ARM leaf function written in assembler that sticks to r0-r3 needs NO prologue, and its epilogue is just bx lr. How simple.

Your statement that using r0-r3 for anything aside from passing values could lead to problems does not hold for a great many algorithms and functions. Consider a GCD implementation:

int gcd(int a, int b)   /* assumes a > 0 and b > 0, otherwise the loop never ends */
{
    while (a != b)
        if (a > b)
            a = a - b;
        else
            b = b - a;
    return a;
}

The parameters a and b are constantly updated during the algorithm, and the original a and b values are never needed after the first iteration. Compiler optimizers make this explicit with static single assignment (SSA) form, in which every assignment gets a fresh name (a₀, a₁, etc.), so it is obvious when an old value is dead and its register can be reused.
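To see the renaming idea on this exact function, the following C sketch records each version (a₀, b₀), (a₁, b₁), ... of the loop variables. It is not an SSA pass, only a trace; the `ab_pair` type and the `gcd_trace` helper are hypothetical. Each new version is computed solely from the version immediately before it, which is why the old values, and the registers that held them, are free after every step.

```c
#include <assert.h>
#include <stddef.h>

/* One "version" of the loop variables, in the spirit of SSA naming. */
typedef struct { int a, b; } ab_pair;

/* Hypothetical helper: runs the subtraction gcd and stores each version
 * of (a, b) in out[0..max-1]. Returns how many versions were stored;
 * stops early if out is full. */
static size_t gcd_trace(int a, int b, ab_pair *out, size_t max) {
    assert(a > 0 && b > 0);               /* same precondition as gcd() */
    size_t n = 0;
    while (n < max) {
        out[n++] = (ab_pair){a, b};       /* record version n-1 */
        if (a == b)
            break;                        /* a now holds the gcd */
        if (a > b)
            a = a - b;                    /* new a needs only the previous a and b */
        else
            b = b - a;                    /* new b needs only the previous a and b */
    }
    return n;
}
```

For example, gcd_trace(12, 8, buf, 8) stores (12, 8), (4, 8), (4, 4) and returns 3; once (4, 8) exists, nothing ever reads 12 again.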

So the input parameters often do not need to be kept around. There is no need to copy them to r4-r8 and force the generation of a stack frame; ARM compilers strive not to do this, and a human hand-coding assembler has no reason to do it either. For routines like this you are probably better off letting a compiler produce the code, unless you are learning. An ARM gcd routine from David Seal's ARM ARM is:

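gcd: cmp   r0, r1        @ compare a (r0) with b (r1)
     subgt r0, r0, r1    @ if a > b: a = a - b
     sublt r1, r1, r0    @ if a < b: b = b - a
     bne   gcd           @ flags still come from cmp: loop while a != b
     bx    lr            @ return; the result (a == b) is already in r0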

The routine is five instructions. If you instead copied the input parameters into callee-saved registers, you would double the size of the routine:

gcd:
   stmfd sp!, {r4, r5}   @ extra instruction, plus two stores to the stack
   mov r4, r0            @ extra instruction
   mov r5, r1            @ extra instruction

1: cmp r4, r5
   subgt r4, r4, r5
   sublt r5, r5, r4
   bne 1b

   mov r0, r4            @ extra instruction to set up the return value
   ldmfd sp!, {r4, r5}   @ extra instruction, plus two loads from the stack
   bx  lr

For small inputs, this could triple the execution time. It is also arguably easier to understand the assembler without the extra code. You should never save registers you don't use. For production-quality, professional code, it always makes sense to use r0 to r3 in place and forgo storing them on the stack.
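The size and speed argument can be checked with a rough C model that counts the instructions executed by the two gcd listings above. This is a sketch under simplifying assumptions: it counts instructions rather than cycles, it counts a conditional instruction whose condition fails as executed, and it ignores that stmfd/ldmfd of two registers usually take more than one cycle each (exact timings depend on the core). The names `insn_counts` and `gcd_insn_counts` are hypothetical.

```c
#include <assert.h>

/* Instructions executed by the leaf version and the register-saving version. */
typedef struct { unsigned leaf, saved; } insn_counts;

/* Hypothetical model of the two listings; assumes a > 0 and b > 0. */
static insn_counts gcd_insn_counts(unsigned a, unsigned b) {
    assert(a > 0 && b > 0);
    unsigned passes = 1;               /* the final pass, where a == b */
    while (a != b) {                   /* one extra loop pass per subtraction */
        if (a > b) a -= b; else b -= a;
        passes++;
    }
    insn_counts c;
    c.leaf  = 4 * passes + 1;          /* cmp, subgt, sublt, bne per pass, then bx lr */
    c.saved = 4 * passes + 6;          /* same loop plus stmfd, mov, mov, mov, ldmfd, bx lr */
    return c;
}
```

For gcd(6, 6) this gives 5 versus 10 instructions, and for gcd(12, 8) it gives 13 versus 18. The register-saving version always pays a fixed extra 5 instructions (two of them multi-register memory transfers), which dominates when the loop only runs a few times.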

Note: the registers that you must save, and can then use safely, are *r4-r8*. For any register above *r8*, you really need to know more about your system before using it (for example, r9 is the platform register, r11 is often used as the frame pointer, and r12 may be corrupted by linker veneers). In assembler there are no limitations, and you can even use r13/`sp`/stack pointer for calculations if you like. Breaking the AAPCS will, however, limit how you can interoperate with compiler-generated code. — artless noise, Sep 08 '14 at 16:10
Wow, thank-you! That really helps put things in perspective. — CRThaze, Sep 08 '14 at 16:48
One unclear point: if your routine only uses two parameters (a, b), then registers r2 and r3 are still available for free use without saving. So you can remove the `stm` and `ldm` instructions in the second example and replace r4, r5 with r2, r3. It is still better to keep the values in r0, r1 if possible, since then the `mov` instructions are not needed at all. In a different routine, r2 and r3 may well be useful. — artless noise, Sep 08 '14 at 17:16
