I am trying to understand how assembly works with arguments and return values.

So far, I have learnt that %eax holds the return value, and that to pass a single argument I need to load the effective address of %rip + offset into %rdi by using leaq var(%rip), %rdi .

To learn more about arguments, I wrote a C program that passes 10 string arguments to printf (11 including the format string) to find out the order in which the registers are used. I then compiled the C code to assembly using gcc on my Mac.

Here is the C code I used:

#include <stdio.h>

int main(){
  printf("%s %s %s %s %s %s %s %s %s %s", "1 ", "2", "3", "4", "5", "6", "7", "8", "9", "10");
  return 0;
}

And here is the assembly output:

.section  __TEXT,__text,regular,pure_instructions
  .macosx_version_min 10, 13
  .globl  _main                   ## -- Begin function main
  .p2align  4, 0x90
_main:                                  ## @main
  .cfi_startproc
## %bb.0:
  pushq %rbp
  .cfi_def_cfa_offset 16
  .cfi_offset %rbp, -16
  movq  %rsp, %rbp
  .cfi_def_cfa_register %rbp
  pushq %r15
  pushq %r14
  pushq %rbx
  pushq %rax
  .cfi_offset %rbx, -40
  .cfi_offset %r14, -32
  .cfi_offset %r15, -24
  subq  $8, %rsp
  leaq  L_.str.10(%rip), %r10
  leaq  L_.str.9(%rip), %r11
  leaq  L_.str.8(%rip), %r14
  leaq  L_.str.7(%rip), %r15
  leaq  L_.str.6(%rip), %rbx
  leaq  L_.str(%rip), %rdi
  leaq  L_.str.1(%rip), %rsi
  leaq  L_.str.2(%rip), %rdx
  leaq  L_.str.3(%rip), %rcx
  leaq  L_.str.4(%rip), %r8
  leaq  L_.str.5(%rip), %r9
  movl  $0, %eax
  pushq %r10
  pushq %r11
  pushq %r14
  pushq %r15
  pushq %rbx
  callq _printf
  addq  $48, %rsp
  xorl  %eax, %eax
  addq  $8, %rsp
  popq  %rbx
  popq  %r14
  popq  %r15
  popq  %rbp
  retq
  .cfi_endproc
                                        ## -- End function
  .section  __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
  .asciz  "%s %s %s %s %s %s %s %s %s %s"

L_.str.1:                               ## @.str.1
  .asciz  "1 "

L_.str.2:                               ## @.str.2
  .asciz  "2"

L_.str.3:                               ## @.str.3
  .asciz  "3"

L_.str.4:                               ## @.str.4
  .asciz  "4"

L_.str.5:                               ## @.str.5
  .asciz  "5"

L_.str.6:                               ## @.str.6
  .asciz  "6"

L_.str.7:                               ## @.str.7
  .asciz  "7"


L_.str.8:                               ## @.str.8
  .asciz  "8"

L_.str.9:                               ## @.str.9
  .asciz  "9"

L_.str.10:                              ## @.str.10
  .asciz  "10"


.subsections_via_symbols

After that, I cleaned the code up, which removed what I think are macOS-only directives. The code still works.

.text
  .globl  _main                   ## -- Begin function main
_main:                                  ## @main
  pushq %rbp
  movq  %rsp, %rbp
  pushq %r15
  pushq %r14
  pushq %rbx
  pushq %rax
  subq  $8, %rsp
  leaq  L_.str.10(%rip), %r10
  leaq  L_.str.9(%rip), %r11
  leaq  L_.str.8(%rip), %r14
  leaq  L_.str.7(%rip), %r15
  leaq  L_.str.6(%rip), %rbx
  leaq  L_.str(%rip), %rdi
  leaq  L_.str.1(%rip), %rsi
  leaq  L_.str.2(%rip), %rdx
  leaq  L_.str.3(%rip), %rcx
  leaq  L_.str.4(%rip), %r8
  leaq  L_.str.5(%rip), %r9
  movl  $0, %eax
  pushq %r10
  pushq %r11
  pushq %r14
  pushq %r15
  pushq %rbx
  callq _printf
  addq  $48, %rsp
  xorl  %eax, %eax
  addq  $8, %rsp
  popq  %rbx
  popq  %r14
  popq  %r15
  popq  %rbp
  retq

.data
L_.str:                                 ## @.str
  .asciz  "%s %s %s %s %s %s %s %s %s %s"

L_.str.1:                               ## @.str.1
  .asciz  "1 "

L_.str.2:                               ## @.str.2
  .asciz  "2"

L_.str.3:                               ## @.str.3
  .asciz  "3"

L_.str.4:                               ## @.str.4
  .asciz  "4"

L_.str.5:                               ## @.str.5
  .asciz  "5"

L_.str.6:                               ## @.str.6
  .asciz  "6"

L_.str.7:                               ## @.str.7
  .asciz  "7"

L_.str.8:                               ## @.str.8
  .asciz  "8"

L_.str.9:                               ## @.str.9
  .asciz  "9"

L_.str.10:                              ## @.str.10
  .asciz  "10"

I understand that at the beginning of the code, the base pointer is pushed onto the stack, and the stack pointer is then copied into the base pointer for later use.

Each leaq then loads the address of a string into a register that will be used as an argument to printf.

What I want to know is why registers r10, r11, r14 and r15 are loaded before the first argument, and why registers rsi, rdx, rcx, r8 and r9 are loaded after the first argument? Also, why are r14 and r15 used instead of r12 and r13?

Also why is 8 added and subtracted from the stack pointer in this case and does it matter which order the registers are pushed and popped?

I hope all the subquestions are related to this question; if not, let me know. Also pull me up on anything I may be getting wrong. This is what I have learnt by converting C to assembly.

Peter Cordes
iProgram
  • Read the calling convention documentation instead. Anyway, the order of loading registers does not matter. Register choice for pushing arguments to the stack does not matter. Order on the stack **does** matter since that's the basis for pairing up the formal arguments with the values. The adjustment of the stack pointer is for alignment purposes. Also, you seem not to have enabled optimization hence you see unnecessary stuff such as adjusting `rsp` twice. – Jester Sep 26 '18 at 14:29
  • Mostly a duplicate of [What registers are preserved through a linux x86-64 function call](https://stackoverflow.com/q/18024672) which explains why the call-preserved regs are saved/restored so the function can use them itself, and links to the calling convention doc which explains everything else. – Peter Cordes Sep 26 '18 at 14:32
  • Possible duplicate of [What registers are preserved through a linux x86-64 function call](https://stackoverflow.com/questions/18024672/what-registers-are-preserved-through-a-linux-x86-64-function-call) – KYHSGeekCode Sep 26 '18 at 15:24
  • The asm would probably be easier to understand with optimization enabled. In optimized code, hopefully all the instructions are necessary, and not just useless like using RBX and other call-preserved registers as temporaries to hold `lea` results to be `push`ed. Optimized code should just align the stack and use the same scratch reg repeatedly. Or if you compile with `gcc -fno-pie -no-pie`, you'll get `push $L_.str.9` (at least on Linux, where static addresses fit in a 32-bit immediate in position-dependent executables.) – Peter Cordes Sep 26 '18 at 15:32
  • [x86-64 calling conventions](https://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions) – John Bode Sep 26 '18 at 18:50

1 Answer

First, it looks like you are using unoptimized code so things are taking place that do not need to.

Look at the state of the registers right before the call to printf, which hold the arguments that are not pushed on the stack:

rdi = format string
rsi = 1
rdx = 2
rcx = 3
r8 = 4
r9 = 5

Then strings 6 through 10 are pushed on the stack in reverse order, so that 6 ends up at the lowest address.

That should give you an idea of the calling convention. The first six parameters go through registers. The remaining parameters get passed on the stack.
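As a rough illustration of that rule, here is a minimal C sketch that maps an argument's position to where it is passed. It assumes the x86-64 System V convention (used by macOS and Linux) and only covers integer and pointer arguments such as the strings in the question; `arg_location` and `print_arg_locations` are hypothetical helper names, not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: where the n-th (0-based) integer or pointer argument
   is passed, assuming the x86-64 System V calling convention. */
static const char *arg_location(int n)
{
    static const char *const regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    return (n >= 0 && n < 6) ? regs[n] : "stack";
}

/* Hypothetical helper: print the location of each of `count` arguments. */
static void print_arg_locations(int count)
{
    for (int n = 0; n < count; n++)
        printf("argument %2d -> %s\n", n, arg_location(n));
}
```

Calling `print_arg_locations(11)` from a `main` covers the printf call in the question: argument 0 is the format string in rdi, arguments 1 to 5 are the strings "1 " to "5" in rsi, rdx, rcx, r8 and r9, and arguments 6 to 10 are reported as going on the stack.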

What I want to know is why registers r10, r11, r14 and r15 are loaded before the first argument, and why registers rsi, rdx, rcx, r8 and r9 are loaded after the first argument?

That's just the order the compiler chose.

Also why are r14 and r15 used instead of r12 and r13?

Again, that's what the compiler chose. Note that these are just being used as scratch locations. If the code were optimized, it is likely that fewer registers would be used.

Also why is 8 added and subtracted from the stack pointer in this case and does it matter which order the registers are pushed and popped?

It could just be some boilerplate function code the compiler generates.

user3344003
  • After using up RBX and RBP (the call-preserved regs that don't need a REX prefix), gcc typically counts down from R15 when it needs more call-preserved regs, rather than using R12/R13. This is a good choice because there are some special cases involving r12 and r13 (the REX.B aliases of RSP and RBP) in addressing mode encodings that could lead to larger machine code in a function that needed 4 but not 6 extra regs. [rbp not allowed as SIB base?](https://stackoverflow.com/q/52522544) – Peter Cordes Sep 26 '18 at 17:38
  • The adjustment of RSP by 8 is not boilerplate; the x86-64 System V ABI requires 16-byte alignment of RSP before a `call`, and with the number of `push` instructions before/after it turns out it's off by 1 stack slot. – Peter Cordes Sep 26 '18 at 17:40
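
The alignment arithmetic behind that last comment can be checked with a small C sketch. It assumes RSP was 16-byte aligned just before `main` itself was called, as the ABI requires, and counts how many bytes the stack has grown by the time `callq _printf` executes. The names `depth_at_call` and `is_aligned` are hypothetical, chosen only for this illustration.

```c
#include <stdbool.h>

enum { SLOT = 8 };  /* size of one stack slot in 64-bit mode */

/* Bytes of stack used since the caller's 16-byte-aligned RSP,
   measured just before the call instruction inside the function. */
static int depth_at_call(int saved_pushes, int padding_bytes, int stack_args)
{
    int depth = SLOT;               /* return address pushed by the call into main */
    depth += saved_pushes * SLOT;   /* pushq of rbp, r15, r14, rbx, rax */
    depth += padding_bytes;         /* subq $8, %rsp */
    depth += stack_args * SLOT;     /* pushq of the addresses of strings 6..10 */
    return depth;
}

/* The ABI wants this depth to be a multiple of 16 at the call. */
static bool is_aligned(int depth)
{
    return depth % 16 == 0;
}
```

For the code in the question, `depth_at_call(5, 8, 5)` gives 8 + 40 + 8 + 40 = 96 bytes, a multiple of 16. Without the `subq $8, %rsp`, `depth_at_call(5, 0, 5)` gives 88, which is off by one stack slot.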