2

In x86-32 assembly, parameters are stored on the stack, but in x86-64, parameters are stored in registers. What is the reason for this?

Ross Ridge
  • 38,414
  • 7
  • 81
  • 112
Ibrahim Ipek
  • 479
  • 4
  • 13

2 Answers

9

It is (a lot) faster to access CPU registers than to access RAM.

Since 64-bit CPUs have a lot more general-purpose registers (16 instead of 8 on x86; this has nothing to do with being 64-bit, it's just because they are newer/bigger), it makes sense to make use of them.

Thilo
  • 257,207
  • 101
  • 511
  • 656
  • It's there, in a linked article: https://en.wikipedia.org/wiki/X86_calling_conventions – mike.dld Aug 15 '16 at 12:33
  • Microsoft's 32-bit `__fastcall` and `__vectorcall` calling conventions use two call-clobbered registers (ecx and edx) for arg-passing even in 32-bit mode. It's not just a matter of taking advantage of more registers, it's a matter of a more complicated higher-performance ABI. – Peter Cordes Aug 15 '16 at 16:26
5

Store/reload round trips take instructions and cost ~6 cycles of store-forwarding latency, so modern calling conventions use a more efficient design. This also saves instructions in some cases, since the caller can just generate the arg in a register and not push it (and not have to pop the stack after return).
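To make the difference concrete, here is a minimal sketch in C. The function names `sum6` and `call_sum6` are hypothetical, made up for illustration; the register assignments in the comments are what the respective ABIs specify for integer args, not something the C source controls.

```c
/* Hypothetical example: six integer args.
 *
 * x86-64 System V: a..f arrive in edi, esi, edx, ecx, r8d, r9d
 *                  (no stores or reloads needed for arg-passing).
 * Windows x64:     a..d arrive in ecx, edx, r8d, r9d; e and f go on the stack.
 * i386 cdecl:      all six are stored to the stack by the caller
 *                  (pushed right to left) and reloaded by the callee.
 */
int sum6(int a, int b, int c, int d, int e, int f)
{
    return a + b + c + d + e + f;
}

/* Hypothetical caller: on x86-64 SysV this typically becomes six
 * register moves plus a call; on i386 cdecl, six stack stores plus
 * a call, followed by a stack adjustment after the call returns.
 * (An optimizing compiler may inline or constant-fold the call.) */
int call_sum6(void)
{
    return sum6(1, 2, 3, 4, 5, 6);
}
```

Compiling this with something like `gcc -O1 -S` (and again with `-m32`, if a 32-bit toolchain is installed) should let you compare the two arg-passing sequences yourself.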

Since x86-64 is a new mode, it didn't have any requirements for backwards compat, so a brand new ABI with no legacy baggage could be designed. See this answer for some history about how the x86-64 SysV calling convention was designed, and why it's more efficient than the Windows x86-64 calling convention (red zone, more arg-passing registers). It is more complex than the Windows convention, especially for varargs functions.


Passing the first couple args in registers is more efficient in 32-bit code, too, but introducing new calling conventions breaks backwards compat with libraries.

Even so, MS did that with __fastcall / __vectorcall, which use two call-clobbered registers (ecx and edx) for arg-passing even in 32-bit mode. The 64-bit versions of those calling conventions use more arg-passing registers, since x86-64 has more GP registers.
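As a rough sketch of how a 32-bit register-arg convention is opted into per function: the `FASTCALL` macro and `add3` below are hypothetical names for illustration. The sketch assumes MSVC's `__fastcall` keyword or GCC's `fastcall` function attribute on i386; on any other target the macro expands to nothing and the default convention is used.

```c
/* Hypothetical portability macro: pick the compiler's spelling of
 * the 32-bit fastcall convention where one exists. */
#if defined(_MSC_VER)
#  define FASTCALL __fastcall                 /* ignored on x64 targets */
#elif defined(__GNUC__) && defined(__i386__)
#  define FASTCALL __attribute__((fastcall))  /* only meaningful on i386 */
#else
#  define FASTCALL                            /* default convention */
#endif

/* With fastcall in 32-bit mode: a arrives in ecx, b in edx,
 * and c is still passed on the stack.
 * Without it (plain cdecl), all three are passed on the stack. */
int FASTCALL add3(int a, int b, int c)
{
    return a + b + c;
}
```

Callers must see the same declaration, since caller and callee have to agree on the convention; that is the backwards-compat problem with changing the convention of an existing library function.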

Unix/Linux hasn't tried to introduce a new 32-bit calling convention, basically just giving up on 32-bit as obsolete legacy code that's stuck being slow. (Although the 32-bit SysV ABI was extended with rules for passing / returning 16B SSE and 32B AVX vectors in vector regs, not on the stack.)

See the tag wiki for links to calling convention docs, and performance links for more details about store-forwarding latency.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847