ARM assembly. Is it safe to use r13 (stack pointer) as a general purpose register?

Question

I'm writing an extremely optimized leaf function, and to make it run faster I want to use R13 as a general-purpose register. I preserve R13 by moving it to one of the VFP registers before using it, and I restore it by moving it back before returning from the function. It looks like this:

/* Start of the function */
push { r4 - r12, r14 }
vmov s0, r13
/* Body of the function. Here I use R13
 * as a general purpose register */
vmov r13, s0
pop { r4 - r12, r14 }
bx lr

And it works. But I have read that some operating systems assume that R13 is always used as a stack pointer, and using it as a general purpose register can cause crashes. I should also say that this function is intended to run only on Android (Linux). Thanks!

Not a good idea, and there is no real reason to use it for anything other than a stack. If you get an interrupt, it is game over: you crash. Wrapping this with protection from interrupts is worse than just pushing and popping some other register around this code and using that one. So just use another register. — old_timer, Mar 06 '19 at 15:34
Not sure that the r13 register is used for the interrupt stack. Interrupts are odd on ARM. — Robin Davies, Mar 06 '19 at 15:53
@old_timer: I assume/hope the OP is already using all the other GP registers, or they wouldn't be trying to scrounge 1 more by saving/restoring the stack pointer. — Peter Cordes, Mar 06 '19 at 22:08
Use the stack then; don't mess with interrupts, especially not on Android/Linux... — old_timer, Mar 07 '19 at 00:46
Whether or not r13 is used depends on the core and the mode. That is a good point, though... — old_timer, Mar 07 '19 at 00:50
It would be good to take a higher-level look at what you are trying to do. It seems you use `lr` to return. Why not just push `lr` on the stack and use that register? I know it is often good to have 8 or 12 free registers. You can never get 16, as r15/PC is not really *general purpose*. So typically you allocate 15 registers: R0-R12, R14 (LR) **AND THEN** R13 (SP). Some instructions (`ldrd`, `stm`) like registers in a particular order. You really should choose `LR` first and then `SP`. — artless noise, Mar 07 '19 at 16:53
It is very unlikely that this extra register is in fact going to gain you a lot of performance. 8 or 12 registers will be enough to fill write lines. You need this many registers for memory operations (FIR filters, soft DMA, memset, etc.). However, a general-purpose routine where the registers aren't banking memory/array values seems really unlikely to need this, and you can probably address the issue in other ways. — artless noise, Mar 07 '19 at 17:03
Thanks! But I already use all registers except R13 (SP) and R15 (PC) — Igor Yarmolyk, Mar 07 '19 at 17:13
I seem to remember that you take something like a 16-cycle pipeline stall on Cortex-A series CPUs when moving a register between the VPU and the main register bank if it's needed immediately, which it is. — marko, Mar 11 '19 at 23:02
@old_timer - interrupts use a shadow `r13`, so that's probably OK. What won't be OK is if the process gets a POSIX signal whilst `r13` is repurposed. — marko, Mar 11 '19 at 23:03
@IgorYarmolyk Re, *I already use all registers except...* Then please update your question to show how `lr` is restored. You couldn't have saved it on the stack, as you do `vmov r13, s0` and immediately `bx lr`. Something is wrong with your pseudocode then. — artless noise, Mar 14 '19 at 13:49

Peter Cordes · Accepted Answer · 2019-03-12T01:00:04.683

Obviously you should only consider this if you're already using all the other GP registers, including lr, and can't shift some of your work to NEON registers, e.g. using packed-integer even if you only care about the low 32 bits.

(Using SIMD regs for more scalar integer is usually only useful if there's an isolated set of values that don't interact with the other values in your algorithm, and you don't need to branch on them or use them as pointers. Transfer between int and SIMD is slow on some ARM CPUs.)

This is very non-standard, and only even possibly safe in user-space, not kernel

If you have any signal handlers installed, your stack pointer must be valid when one of those signals arrives. (And that's asynchronous.)

There's no other async usage of the user-space stack pointer in Linux beyond signal handlers. (Except if you're debugging with GDB and use print foo(123) where foo is a function in the target process.)

As mentioned in comments on Can I use rsp as a general purpose register (the x86-64 equivalent of this question), there's a workaround even for signals:

Use sigaltstack to set up an alternative stack, and specify SA_ONSTACK in the flags for sigaction when installing a handler.

As @Timothy points out, if your scratch value of SP could be an integer that happens to "point" into the alt stack, the signal dispatch mechanism will assume this is a nested signal and won't modify SP (because in an actual nested-signal case, that would overwrite the first signal handler's still-in-use stack). So you could be one push away from SP going into an unmapped page, unless you allocate twice as much as you need and only pass the top half to sigaltstack. (Maybe just 2k or 4k for simple signal handlers that return after not doing much.)
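The dispatch rule described here can be modelled in a few lines of C. This is only a simplified sketch of the check for a downward-growing stack, not the kernel's actual code; the names `altstack_model`, `on_alt_stack` and `signal_frame_top` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a registered alternate stack: [base, base + size). */
typedef struct {
    uintptr_t base;
    size_t size;
} altstack_model;

/* The stack grows down, so SP counts as "on the alt stack" when it lies
 * in (base, base + size]. */
static bool on_alt_stack(const altstack_model *as, uintptr_t sp)
{
    return sp > as->base && sp - as->base <= as->size;
}

/* Address from which the signal frame would be pushed (downwards). */
static uintptr_t signal_frame_top(const altstack_model *as, uintptr_t sp,
                                  bool sa_onstack)
{
    assert(as != NULL);
    if (sa_onstack && as->size != 0 && !on_alt_stack(as, sp))
        return as->base + as->size;  /* fresh start at the top of the alt stack */
    return sp;                       /* looks nested (or no alt stack): keep SP */
}
```

With a modelled alt stack at 0x10000 of size 0x2000, a scratch SP value of 0x10008 is treated as nested, so the frame would be pushed downwards from 0x10008, only 8 bytes above the registered base. That is the case the over-allocation guards against.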

This should be safe even with nested signals: only the outer-most signal handler can start near the bottom of the alt stack, and use some of the allocated space beyond the actual altstack. Another signal will use space below that, if SP is still within the altstack. Or it will use the top of the altstack if SP has gotten outside the altstack.
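A minimal sketch of that setup in C, assuming a trivial handler. `install_altstack_handler`, `on_signal` and `handler_ran` are hypothetical names for illustration; `mmap`, `sigaltstack` and `sigaction` are the standard POSIX/Linux calls. The block maps twice the usable size and registers only the upper half, so the lower half is mapped slack for the case where a scratch SP value happens to fall inside the registered range.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* expose MAP_ANONYMOUS and sigaltstack() on glibc */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

static volatile sig_atomic_t handler_ran = 0;

/* Trivial handler: records that it ran and returns. */
static void on_signal(int signo)
{
    (void)signo;
    handler_ran = 1;
}

/* Registers an alternate stack of usable_size bytes (should be at least
 * MINSIGSTKSZ) and installs on_signal for signo with SA_ONSTACK.
 * Returns 0 on success, -1 on failure. */
int install_altstack_handler(int signo, size_t usable_size)
{
    assert(usable_size > 0);

    /* Allocate double: the lower half is slack below the registered range. */
    char *block = mmap(NULL, 2 * usable_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (block == MAP_FAILED)
        return -1;

    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = block + usable_size;   /* pass only the top half */
    ss.ss_size = usable_size;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0) {
        munmap(block, 2 * usable_size);
        return -1;
    }

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_ONSTACK;         /* run the handler on the alt stack */
    return sigaction(signo, &sa, NULL);
}
```

A call such as `install_altstack_handler(SIGUSR1, 64 * 1024)` would be made before entering the code that repurposes SP. The alternate signal stack is a per-thread setting, so each thread running such code needs its own call.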

Or you can avoid the need for this over-allocation by using SP to hold a pointer to something else that's definitely not the alt stack, if any of your GP registers need to be a pointer. Having it be a valid pointer opens you up to corruption instead of faults if a debugger uses the current SP for something, or if you get the altstack mechanism wrong. But that's just a difference in failure mode: either is catastrophic.

Hardware interrupts save state on the kernel stack, not the user-space stack. If they used the user stack:

1. User-space could crash the OS by having an invalid SP.
2. User-space could gain kernel privileges by having another user-space thread modify the kernel's stack data (including return addresses).

(All user-space threads of a process share the same page table, and can read/write each other's stack mappings.)

Linux/Android is very different from a lightweight RTOS without virtual memory or strict enforcement of privilege separation.

As the stack switching is triggered using the stack pointer value, it is necessary to allocate double the required stack size and pass the top half to `sigaltstack`. — Timothy Baldwin, Mar 06 '19 at 17:57
@TimothyBaldwin: why would it have to be contiguous with the thread's main stack at all? Or the same size? If your signal handlers are all simple, you might only need a page or two for them to run and make a `sigreturn` system call to get the kernel to restore the old context. — Peter Cordes, Mar 06 '19 at 22:06
The OP is using the stack pointer as a general-purpose register, so it may contain any value, including values in the range passed to `sigaltstack`. Suppose a signal occurs with the stack pointer pointing to the base of the range passed to `sigaltstack`. In that case the current value of the stack pointer will be used, as the alternative stack is apparently already in use; therefore there must be enough space below the range passed to `sigaltstack` for the signal handler. — Timothy Baldwin, Mar 12 '19 at 00:26
@TimothyBaldwin: Ah I see, I didn't know signal stacks checked if SP was already in that range and if so didn't start from the top of the block. But now that you mention it, obviously nested signals shouldn't do that. — Peter Cordes, Mar 12 '19 at 00:46

iocapa · Answer 2 · 2019-03-07T08:32:20.350


If a context switch or IRQ triggers while your code is executing, the OS/hardware will probably assume that R13 is the top of stack (TOS), so it will save it on the assumption that it can restore the TOS when execution resumes.

This might be a problem in your case.

A sensible approach would be to make the piece of code critical and somehow force the system tick/irq to pend until the routine finishes/R13 is restored.

You are probably better off using LR (R14) if you really need the extra register.

edited Mar 07 '19 at 08:32

answered Mar 06 '19 at 14:46

The OP is running in user-space on *Linux*. Hardware interrupts save state on the kernel stack, not the user-space stack. If they used the user stack for anything: 1. user-space could crash the OS by having an invalid SP. 2. user-space could gain kernel privileges by having another user-space thread modify the kernel's stack data. (All user-space threads of a process share the same page table, and can read/write each other's stack mappings.) What you say would be true on a lightweight RTOS without virtual memory and separate user/kernel stacks, but not Linux/Android unless I'm very mistaken – Peter Cordes Mar 06 '19 at 22:03
You are correct. It was not obvious at first to me that he's doing userspace stuff. – iocapa Mar 07 '19 at 08:28
That's a fair point, you couldn't do this in the Linux kernel. Of course, unless you're an Android phone vendor, you don't get to run kernel code. And in case anyone else is wondering, using VFP regs to save/restore integer also implies non-kernel, unless this was inside a `kernel_fpu_begin()` / `kernel_fpu_end()` block. – Peter Cordes Mar 07 '19 at 08:43
@PeterCordes You did miss an important point here. `LR` is a much better register to use than `SP`. In fact, the newer EABI by ARM allows use of `LR`; you need to annotate your assembler to prevent attempts to trace it; alternate section information can be used to provide tracing info if needed. `LR` is definitely used as a general-purpose register in places within the ARM Linux kernel. Other OSes, hypervisors, etc. use a banked `LR` as scratch to bootstrap context switches while the banked `SP` is used for context stores. – artless noise Mar 07 '19 at 16:56
@artlessnoise: I was assuming that the OP was already using LR. It's not an instead, it's an "as well". BTW, did you mean to comment on my answer? I wasn't aware that there was any expectation to *not* use LR. – Peter Cordes Mar 07 '19 at 20:54
@PeterCordes The sample code by the OP shows `bx lr` and no restore. I am commenting on an 'answer', or at least on the higher principle that using `SP` should be a very last resort, and on the fact that it should be exceedingly rare to need `SP`, as memory strides of 8 or 12 will be fine with write buffers and bursts. It would only be certain filter taps where this would be needed. Your answer is fine, but there should be a large caveat that this is almost never the correct thing to do. – artless noise Mar 08 '19 at 16:20
@artlessnoise: bolded and expanded the caveat part of my answer. – Peter Cordes Mar 08 '19 at 22:21
