I've been learning assembly, and I've read that the four main x86 general purpose registers (eax, ebx, ecx, and edx) each had an intended or suggested purpose. For example, eax is the accumulator register, ecx is used as a counter for loops, and so on. Do most compilers attempt to use registers for the suggested purpose, or do they ignore what the registers are "supposed" to be for and just assign values to the next available register?

Also, when looking at the x64 registers, I noticed that an extra eight general purpose registers were added, bringing the total number of gp registers to twelve if you ignore rbp, rsp, rsi, and rdi (since they have non-general purpose uses), and sixteen if you do include them. In normal user programs (i.e. browsers, word processors, etc, and not cryptographic programs that require lots of registers), how many of these registers are normally in use at any given time? Is it common for a program like, say, Firefox to be using all 12/16 normal registers at once, or do they only use a subset since they don't have enough variables to fill them all? I will look into this myself by disassembling binaries to see what the general case is, but I would appreciate an answer from someone more knowledgeable than I.

Also, do compilers normally use semi-gp registers (rsi, rdi, rsp, and rbp) for general purpose use if they're not currently being used for their non-general application? I was curious because I saw these registers listed as "general purpose," but even I can think of instances off the top of my head where these registers can't be used for general storage (for example, you wouldn't want to store variables to rbp and rsp and then push values to the stack!). So do compilers try to make use of these registers when they can? Is there a difference between x86 and x64 compilation, since x64 processors have more registers available, so that it isn't necessary to stuff variables into any available register?

  • All GP registers are general. They have special meaning only when specific, usually legacy, instructions are executed. For example, of the quadruplet `rsi`, `rdi`, `rbp`, `rsp`, only the last has a special purpose, and that's due to `call`/`ret`/`push`/`pop` and so on. If you don't use those instructions (even implicitly), you can use it as an accumulator. This principle is general and compilers exploit it. – Margaret Bloom Apr 06 '17 at 17:13
  • @MargaretBloom aren't the rsi/rdi registers used by instructions like movsb for things like array/string copying? Also, is it common for a variable to only be "live" for periods between call/ret/push/pop instructions? It would seem like those instructions would be common enough that you wouldn't have enough "space" between these instructions to fit the entire life of a variable in. – James Preston Apr 06 '17 at 17:37
  • I posted an answer with a few examples so you can convince yourself that compilers use the GP registers as freely as they can :) Compare a register like `rbp` with one like `gtr`: now the latter really is a specific-purpose register – Margaret Bloom Apr 06 '17 at 17:45

2 Answers

All GP registers are general.
They have special meaning only when specific, usually legacy, instructions are executed.

For example, of the quadruplet rsi, rdi, rbp, rsp, only the last has a special purpose, and that's due to instructions like call, ret, push and so on.
If you don't use those instructions, even implicitly (an unlikely situation, admittedly), you can use rsp as an accumulator too.

This principle is general and compilers exploit it.

Consider this artificial example[1]:

void maxArray(int* x, int* y, int*z, short* w) {
    for (int i = 0; i < 65536; i++)
    {
        int a = y[i]*z[i];
        int b = z[i]*z[i];
        int c = y[i]*x[i]-w[i];
        int d = w[i]+x[i]-y[i];
        int e = y[i+1]*w[i+2];
        int f = w[i]*w[i];

        x[i] = a*a-b+d; 
        y[i] = b-c*d/f+e;
        z[i] = (e+f)*2-4*a*d;
        w[i] = a*b-c*d+e*f;
    }
}

GCC compiles it into this listing:

maxArray(int*, int*, int*, short*):
        push    r13
        push    r12
        xor     r8d, r8d
        push    rbp
        push    rbx
        mov     r12, rdx
.L2:
        mov     edx, DWORD PTR [rsi+r8*2]    
        mov     ebp, DWORD PTR [r12+r8*2]
        movsx   r11d, WORD PTR [rcx+r8]
        mov     eax, DWORD PTR [rdi+r8*2]
        movsx   ebx, WORD PTR [rcx+4+r8]
        mov     r9d, edx
        mov     r13d, edx
        imul    r9d, ebp
        imul    r13d, eax
        lea     r10d, [rax+r11]
        imul    ebx, DWORD PTR [rsi+4+r8*2]
        mov     eax, r9d
        sub     r10d, edx
        imul    ebp, ebp
        sub     r13d, r11d
        imul    eax, r9d
        imul    r11d, r11d
        sub     eax, ebp
        add     eax, r10d
        mov     DWORD PTR [rdi+r8*2], eax
        mov     eax, r13d
        imul    eax, r10d
        cdq
        idiv    r11d
        mov     edx, ebp
        sub     edx, eax
        mov     eax, edx
        lea     edx, [0+r9*4]
        add     eax, ebx
        mov     DWORD PTR [rsi+r8*2], eax
        lea     eax, [rbx+r11]
        imul    r9d, ebp
        imul    r11d, ebx
        add     eax, eax
        imul    edx, r10d
        add     r9d, r11d
        imul    r10d, r13d
        sub     eax, edx
        sub     r9d, r10d
        mov     DWORD PTR [r12+r8*2], eax
        mov     WORD PTR [rcx+r8], r9w
        add     r8, 2
        cmp     r8, 131072
        jne     .L2
        pop     rbx
        pop     rbp
        pop     r12
        pop     r13
        ret

You can see that most of the GP registers are used: thirteen of the sixteen appear explicitly, including rbp, rsi and rdi (only r14 and r15 go untouched, and rsp is used implicitly by push, pop and ret).
None of the registers is limited to its canonical role.

Note: in this example rsi and rdi are each used to both read and write an array; that's a coincidence.
Those registers hold the first two integer/pointer arguments under the System V x86-64 calling convention, which is why the pointers x and y arrive in rdi and rsi.

int sum(int a, int b, int c, int d)
{
    return a+b+c+d;
}

sum(int, int, int, int):
        lea     eax, [rdi+rsi]
        add     eax, edx
        add     eax, ecx
        ret
phuclv
Margaret Bloom
  • Okay, that all makes sense. But is it likely that an average program (albeit a complicated one, like a browser) would use that many registers? That function is rather contrived -- is assembly like that something that one would expect to see in an actual program? I'm on my phone at the moment, so I can't check for myself now, but I'll look through a disassembly of Firefox when I can. Also, that program seems to switch between 32-bit and 64-bit annotations for registers (it uses eax and then rax, for example). Is that common? – James Preston Apr 06 '17 at 18:07
  • @JamesPreston extremely common, most integer variables are 32-bit but pointers will be 64-bit. Running out of registers on x86 is common, on x64 not so much, so a typical register pressure is apparently bigger than 6 but lower than 14 – harold Apr 06 '17 at 18:28
  • Thank you @harold. – Margaret Bloom Apr 06 '17 at 18:29
  • Some registers are also standardly occupied. (e/r)bp for the framepointer, r/ebx for SELF/THIS in OO languages and also caching the GOT in PIC code takes a register on x86. (RIP relative direct access on x86_64 avoids this) – Marco van de Voort Apr 07 '17 at 15:00
  • @MarcovandeVoort: GCC/clang (and other modern compilers) enable `-fomit-frame-pointer` as part of `-O2` or `-O3` normal optimizations (even in 32-bit mode since GCC 4.6 for i386-Linux), so no, they don't reserve E/RBP as a frame pointer. GCC since 4.1 (maybe earlier) can use registers other than EBX as the GOT base in 32-bit code (e.g. using EAX or ECX in leaf functions): https://godbolt.org/z/vebM68. And of course RBX is never needed for the GOT because as you say, x86-64 added RIP-relative addressing to solve that and other problems. – Peter Cordes Oct 07 '20 at 03:05
  • @JamesPreston: [The advantages of using 32bit registers/instructions in x86-64](https://stackoverflow.com/q/38303333) - mainly code size; ideally you'd only use 64-bit operand size when truly needed, but compilers don't try super hard to prove that it's safe to use 32-bit operand size on local variables declared in the source as 64-bit types. (e.g. a `size_t` loop counter with a constant upper bound smaller than 2^32.) – Peter Cordes Oct 07 '20 at 03:19
  • @PeterCordes: Afaik 16-bit compilers could already do this. As I'm from the Wirthian side of things, I'm thinking about Topspeed Modula2 here, which also supported custom calling conventions. Delphi has also supported it since forever, and Freepascal since 2.0 or 2.2 (2005/2007). I'm not sure whether using BP as an extra free register is really heavily exploited, though; the benefit is more the simplification of the prologue/epilogue. – Marco van de Voort Oct 07 '20 at 06:48
  • @MarcovandeVoort: `[sp]` is not a valid 16-bit addressing mode so `[bp+disp]` is basically needed for random access to stack variables, unless you have 16-bit code that depends on 386 features like `[esp+disp]` (with an address-size prefix.) I have no experience with historical 16-bit x86 compilers; we had an Atari Mega4 STe until my dad got a GNU/Linux x86 box, and then I got my own Linux box. – Peter Cordes Oct 07 '20 at 06:57
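
The mix of 32-bit and 64-bit register names discussed in the comments above follows from the C types involved. Here is a minimal sketch, assuming a typical x86-64 target where `int` is 32 bits and pointers are 64 bits; `sum_ints` is a hypothetical function made up for illustration, and the registers named in the comments are only what a compiler would plausibly pick, not a guarantee.

```c
#include <stddef.h>

/* Hypothetical example: sums n ints from an array.
 * On a typical x86-64 target the pointer `p` and the index `i` need
 * 64-bit registers (rdi, rax, ...), while the 32-bit `total` fits in a
 * 32-bit register name such as eax. The exact choice is up to the compiler. */
int sum_ints(const int *p, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += p[i];   /* 64-bit address arithmetic, 32-bit add */
    }
    return total;
}
```

Compiling a function like this with optimizations and reading the disassembly should show both register widths side by side in the same loop.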

Originally (as in the 16-bit 8086), the functionality of the registers was more limited than in later x86 processors. Only BX, BP, SI and DI were usable to address memory, and it was more common to use CISC-style instructions that did a number of operations with one instruction.

For example, the LOOP instruction decrements CX and jumps if the result is nonzero. If you look at code generated for current systems, you're not likely to see it; you'll see DEC followed by JNE instead. The latter takes a bit more code space, but allows you to use any register.
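
To make the decrement-and-branch behaviour concrete, here is a rough C model of a 16-bit `LOOP`-terminated loop. It is only a sketch: `count_loop_iterations` is a hypothetical name, and the model ignores details such as flags (LOOP leaves them untouched, DEC does not).

```c
#include <stdint.h>

/* Hypothetical model of:
 *     top:  ...body...
 *           loop top      ; CX = CX - 1, jump to top if CX != 0
 * Returns how many times the body runs for a given starting CX. */
unsigned long count_loop_iterations(uint16_t cx) {
    unsigned long iterations = 0;
    do {
        iterations++;        /* the loop body executes */
        cx--;                /* LOOP: decrement CX (wraps modulo 2^16) */
    } while (cx != 0);       /* LOOP: branch while CX is nonzero */
    return iterations;
}
```

One consequence the model shows is that starting with CX = 0 gives 65536 iterations rather than none, because the decrement happens before the test. The DEC/JNE pair behaves the same way with respect to the counter, but works on any register.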

80386 and 32-bit mode lifted most of the limits in addressing, allowing all registers to be used as pointers. Also, the more complex instructions fell out of fashion, which I think has to do with increased out-of-order execution and other optimization techniques in the processor itself.

So, for the most part, there are few reasons left to treat the registers differently. ESP/RSP is still the stack pointer, of course.

ilkkachu
  • REP still uses CX/ECX. Also, I think if you need a 32 -> 64-bit multiplication in 32-bit mode, or a 64 -> 128-bit multiplication in 64-bit mode, you are still restricted to EDX:EAX (RDX:RAX). – idspispopd Apr 23 '19 at 11:13
  • @idspispopd: Yes, and more importantly `cl` is still needed for variable-count shifts, unless you have BMI2 for `shlx` / `shrx`. That and EAX/EDX for division (and the rare widening mul; not so rare for division by constants) are the only really common uses of registers as implicit operands (other than stack ops, of course). REP-stosb does occasionally get inlined by GCC, and even more rarely other rep-string instructions. See also [Why are rbp and rsp called general purpose registers?](https://stackoverflow.com/a/51347294) for a more complete list of implicit register uses. – Peter Cordes Oct 07 '20 at 03:12
  • But anyway, most multiplication is non-widening, with `imul reg,r/m32`, or done with shift / LEA. The `loop` instruction is slow for historical reasons ([Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?](https://stackoverflow.com/q/35742570)), and then the vicious circle of "nobody uses it so CPU vendors don't bother optimizing it, so it remains unused" (except for AMD which made it fast again since Bulldozer, so gcc/clang should maybe start using it with `-mtune=znver1` when it's a win for code size and uops) – Peter Cordes Oct 07 '20 at 03:15
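
The implicit-register cases mentioned in these comments can be reproduced from plain C. The sketch below uses hypothetical function names; which instructions actually come out depends on the compiler, the target and the flags (for instance, `-mbmi2` may turn the shift into `shlx`), so treat the comments as what one would typically expect, not as a guarantee.

```c
#include <stdint.h>

/* Variable-count shift: without BMI2, SHL takes its count in CL. */
uint32_t shift_left(uint32_t value, uint32_t count) {
    return value << (count & 31);   /* mask keeps the shift defined in C */
}

/* Division: IDIV takes the dividend in EDX:EAX and leaves the
 * quotient in EAX and the remainder in EDX. */
int32_t quotient(int32_t a, int32_t b)     { return a / b; }
int32_t remainder_of(int32_t a, int32_t b) { return a % b; }

/* Widening multiply: 32 x 32 -> 64 bits. In 32-bit mode this is
 * typically a one-operand MUL that writes EDX:EAX. */
uint64_t widening_mul(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}
```

Everything else in ordinary integer code (adds, non-widening multiplies, loads and stores) leaves the register choice to the compiler's allocator.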