I noticed that GHC's code generator does not currently emit assembly that uses any of the smaller sub-registers such as al. Even byte-sized operations are implemented using rax on 64-bit and eax on 32-bit machines. GCC, however, frequently makes use of these smaller registers.

Are there any real performance benefits of using small registers like al?


One suggestion I've heard so far is that the encoding of inc al is shorter than that of inc rax (but no shorter than inc eax). Are there other, non-performance reasons to use small registers?

nh2
  • 1
    Some of the shift instructions take only `cl`. In that case, you're *required* to use a small register. – Mysticial Jan 13 '14 at 20:11
  • 1
    In addition to smaller encoding, some instructions are simply faster with 8 bits than 32. – Cory Nelson Jan 13 '14 at 20:19
  • 1
    I think the answer, as everyone has been implying, is in the encoding. I wouldn't necessarily use GCC output as a reference. The Intel documentation shows the encoding for the various instructions, and you can see, for example, that if you want to add two small numbers you don't necessarily need 64 bits' worth of immediates when maybe 16 will do. Likewise, one byte of opcode per instruction may do rather than two or more. Beyond fetch bandwidth (the number of instructions and the size of each), it becomes a matter of microcode issues, which are hidden from us. – old_timer Jan 13 '14 at 20:41
  • 1
    [64 bit assembly, when to use smaller size registers](https://stackoverflow.com/q/6577458/995714) – phuclv Aug 08 '18 at 09:32

2 Answers

Using the 64-bit registers (rxx) on x86-64 requires a REX prefix on most instructions. Thus the instructions are longer, taking more space in memory and in the instruction cache. I don't know if it also slows down decoding. The larger code size could hurt performance if, for example, a hot loop no longer fits into the L1 instruction cache.
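To make the size difference concrete, here is a small Python sketch that compares the 64-bit-mode encodings of `inc` for the different widths of the A register. The byte sequences follow the `FE /0` and `FF /0` forms of INC in the Intel manual; the names `ENCODINGS` and `encoded_length` are only illustrative and not part of any tool.

```python
# Machine-code bytes for `inc` on each width of the A register,
# as encoded in 64-bit mode (ModRM byte C0 selects the A register).
ENCODINGS = {
    "inc al":  bytes.fromhex("FEC0"),    # FE /0: 8-bit operand, no prefix
    "inc ax":  bytes.fromhex("66FFC0"),  # 66 = operand-size override prefix
    "inc eax": bytes.fromhex("FFC0"),    # FF /0: default 32-bit operand
    "inc rax": bytes.fromhex("48FFC0"),  # 48 = REX.W prefix for 64-bit operand
}


def encoded_length(instruction: str) -> int:
    """Return the number of bytes the given instruction occupies."""
    return len(ENCODINGS[instruction])


if __name__ == "__main__":
    for insn, code in ENCODINGS.items():
        print(f"{insn:8} {code.hex(' ')}  ({len(code)} bytes)")
```

In this table the 8-bit and 32-bit forms are the same size, and only the 16-bit and 64-bit forms pay for an extra prefix byte, which matches the observation in the question.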

EOF

If you only use al for an 8-bit value, it leaves ah free for a second one.

Loading ax instead of rax may offer memory bandwidth advantages. However, it can also cause problems, such as partial-register stalls or false dependencies on the rest of the register, so you have to be careful there.

Brian Knoblauch
  • 3
    Subregisters cannot be renamed, which cramps the processor's ability to parallelize operations. Most modern compilers will zero-extend bytes to fill the target register to avoid the false dependency. – Raymond Chen Jan 14 '14 at 03:23
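
The dependency discussed above follows from how x86-64 defines sub-register writes: writing al or ah preserves the remaining bits of rax, while writing eax zeroes the upper 32 bits. The Python sketch below only models those architectural semantics; the function names are made up for illustration, and whether the merge actually costs anything depends on the microarchitecture.

```python
MASK64 = (1 << 64) - 1


def write_al(rax: int, value: int) -> int:
    """8-bit write to al: the upper 56 bits of rax are preserved,
    so the result depends on the old value of rax."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)


def write_ah(rax: int, value: int) -> int:
    """8-bit write to ah (bits 8..15): all other bits are preserved."""
    return (rax & ~0xFF00 & MASK64) | ((value & 0xFF) << 8)


def write_eax(rax: int, value: int) -> int:
    """32-bit write to eax: the upper 32 bits are zeroed, so the
    result does not depend on the old value of rax at all."""
    return value & 0xFFFFFFFF


def movzx_eax_byte(rax: int, value: int) -> int:
    """Model of `movzx eax, byte`: zero-extend a byte into the full register."""
    return write_eax(rax, value & 0xFF)


if __name__ == "__main__":
    old = 0x1122334455667788
    print(hex(write_al(old, 0xAB)))        # old upper bits survive
    print(hex(write_ah(old, 0xCD)))        # al and upper bits survive
    print(hex(movzx_eax_byte(old, 0xAB)))  # old value is irrelevant
```

Because `write_al` and `write_ah` take the old register value as a real input, al and ah can hold two independent bytes, but each write has to be merged with the previous contents. The zero-extending form has no such input, which is why compilers tend to prefer it.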