
I'm trying to learn assembly, and it makes sense to an extent, but I have a problem. I have this source file, hello.asm:

; nasm -felf64 hello.asm && ld hello.o

    global _start

    section .text
_start:
    ; write(1, message, 13)
    mov     rax, 1          ; syscall 1 is write
    mov     rdi, 1          ; file handle 1 is stdout
    mov     rsi, message    ; address of string to output
    mov     rdx, 13         ; number of bytes in the string
    syscall                 ; invoke OS to write the string

    ; exit(0)
    mov     rax, 60         ; syscall 60 is exit
    xor     rdi, rdi        ; exit code 0
    syscall                 ; invoke OS to exit
message:
    db  "Hello, World", 10  ; the 10 is a newline character at the end

Which works perfectly. I just don't understand why particular integer registers need to be used in different cases.

So for example, by trial and error I've discovered that when saying which syscall I want, e.g.

    mov     rax, 1  
    ...
    syscall 

I put the value 1 into the integer register rax, but I can also use the integer registers eax, ax, al, or ah.

I haven't been learning assembly for very long, so it may very well be an obvious question.

If my question isn't obvious: I want to know how to decide which integer register to move values to e.g. if there's some generic system for this, or if each different intention uses a different integer register.

I'm using NASM on 64-bit Ubuntu.

Edit: My question is not a duplicate of this one, because where that one's asking about where you would use smaller integer registers, I'm asking for a method of deciding which integer register to use.

Jacob Garby
  • The registers `al`, `ah` and `ax` come from the old 8- and 16-bit x86 architectures. `al` is the low 8 bits of `ax` and `ah` is the high 8 bits. When the i386 was introduced with 32 bits, `ax` became the low 16 bits of the *extended* accumulator register `eax`. It was then extended again in the 64-bit variant into `rax`. A good search on Intel x86 assembly history should have dug that up for you, and I would think most good tutorials include it too. – Some programmer dude Aug 29 '17 at 15:54
  • "most good tutorials" - could you link one? The tutorial I'm using seems to not be that good – Jacob Garby Aug 29 '17 at 15:55
  • To be honest, I haven't actually looked at any x86 (16, 32 or 64 bits) tutorials in quite a while, so I unfortunately have no idea which ones exist, or which are good. It was just an assertion, because I think a good tutorial should have some history in its introduction. – Some programmer dude Aug 29 '17 at 15:59
  • @Someprogrammerdude: Not necessarily history, but certainly an explanation of registers and subregisters like AL, AH, AX, EAX, RAX etc. – Rudy Velthuis Aug 29 '17 at 16:28
  • You should read the documentation for the syscall. If the value is required to be in `al` you can load `1` into `rax`, `eax`, `ax`, or `al` but not into `ah`. All the others will load `al`. Some background reading on the typical usage of registers would also be good, especially `rbp`, `rsi` and `rdi`. In addition, some instructions put their result in specific registers, for example after multiplication. – Weather Vane Aug 29 '17 at 18:42
  • @WeatherVane your second sentence was very useful. – Jacob Garby Aug 29 '17 at 19:10
  • [A list of x64 linux syscalls](http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/) describes which value should be written to which register. – zx485 Aug 29 '17 at 20:07
  • @zx485 _exactly_ what I was looking for, thank you! – Jacob Garby Aug 29 '17 at 20:12
  • You actually can't rely on setting just `ax` or `al` to `1`; that works only if the upper part of `rax` is already zero. `eax` is a different story: due to how x86_64 was defined by AMD, writing it WILL clear the upper 32 bits of `rax`. `ah` is completely wrong; it will not work even if `rax` was zero before `ah=1`. For `syscall` most of the arguments are 64-bit values, so you should set the whole 64-bit register, but there are different ways to do that (for example, `mov eax,1` also sets `rax` completely). – Ped7g Aug 29 '17 at 20:13

1 Answer


Apart from the few instructions that implicitly use specific registers, neither assembly language nor the x86 architecture dictates which general-purpose register (GPR) you must use: you may use any available GPR (or free one up). However, different environments define their own conventions for register usage and parameter passing, and when you call other people's code you have to follow those conventions.

Specifically, Linux on x86-64 uses the following convention, described in the x86-64 psABI (section 3.2.3):

  1. If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.

For ordinary user-level code, this would explain the choice of rdi, rsi and rdx in the first example above: the first parameter is passed in rdi, the second in rsi and the third in rdx.
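
As a minimal sketch of that user-level convention, the program below passes three integer arguments in rdi, rsi and rdx to a small function and gets the result back in rax (the integer return register in the same psABI). The function name `add3`, the file name `call.asm` and the argument values are made up for illustration.

```nasm
; nasm -felf64 call.asm && ld call.o && ./a.out; echo $?

    global _start

    section .text
; add3(a, b, c): hypothetical helper, returns a + b + c in rax
add3:
    lea     rax, [rdi + rsi]    ; rax = first argument + second argument
    add     rax, rdx            ; rax += third argument
    ret

_start:
    mov     rdi, 10         ; first integer argument
    mov     rsi, 20         ; second integer argument
    mov     rdx, 12         ; third integer argument
    call    add3            ; result comes back in rax

    ; exit(result)
    mov     rdi, rax        ; use the sum as the exit status
    mov     rax, 60         ; syscall 60 is exit
    syscall
```

Run this way, the shell should print 42, the sum of the three arguments.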

However, the above example actually uses the Linux kernel's calling convention for syscalls, which is similar to the user-level convention but differs in a few ways (section A.2.1):

  1. User-level applications use as integer registers for passing the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9.
  2. A system-call is done via the syscall instruction. The kernel destroys registers %rcx and %r11.
  3. The number of the syscall has to be passed in register %rax.

As the sample shows, each syscall is selected by the value in rax, taken from the Linux System Call Table for x86-64 (as zx485 commented).
Note that a syscall may take up to 6 parameters and, unlike user-level code, cannot use the stack for additional ones.
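
To show all six syscall argument registers in use, here is a hedged sketch that calls mmap, which takes six parameters. The choice of mmap and the file name `mmap6.asm` are mine, not from the question; the numeric values are the x86-64 Linux syscall number for mmap (9) and the values of PROT_READ|PROT_WRITE (3) and MAP_PRIVATE|MAP_ANONYMOUS (0x22). Error checking is omitted, so treat it as an illustration only.

```nasm
; nasm -felf64 mmap6.asm && ld mmap6.o && ./a.out

    global _start

    section .text
_start:
    ; mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
    mov     rax, 9          ; syscall 9 is mmap
    xor     rdi, rdi        ; arg 1 (rdi): addr = NULL, let the kernel choose
    mov     rsi, 4096       ; arg 2 (rsi): length in bytes
    mov     rdx, 3          ; arg 3 (rdx): PROT_READ | PROT_WRITE
    mov     r10, 0x22       ; arg 4 (r10, not rcx): MAP_PRIVATE | MAP_ANONYMOUS
    mov     r8, -1          ; arg 5 (r8): fd = -1 for an anonymous mapping
    xor     r9, r9          ; arg 6 (r9): offset = 0
    syscall                 ; result in rax; rcx and r11 are destroyed

    ; store "OK\n" in the new page, then write(1, page, 3)
    mov     rsi, rax        ; address of the mapped page (assumes mmap succeeded)
    mov     byte [rsi], 'O'
    mov     byte [rsi + 1], 'K'
    mov     byte [rsi + 2], 10
    mov     rax, 1          ; syscall 1 is write
    mov     rdi, 1          ; file handle 1 is stdout
    mov     rdx, 3          ; number of bytes
    syscall

    ; exit(0)
    mov     rax, 60         ; syscall 60 is exit
    xor     rdi, rdi
    syscall
```

The fourth argument goes in r10 because the syscall instruction itself overwrites rcx, which is where a user-level call would have put it.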

There are different ABIs for Windows, for 32-bit systems and for other environments, but I won't detail them here.

Regarding your comment on using al, ax and eax: on x86-64 the syscall number must be in rax. Writing to eax is safe, because a 32-bit write zero-extends into the full 64-bit register (as Ped7g notes in the comments). Writing only ax or al leaves the upper bits of rax unchanged, so it works only by luck, when those bits happen to be zero already, and you should not rely on it.
A reminder:

rax is the full 64-bit register
eax is the lower 32 bits
ax is the lower 16 bits
al is the lower 8 bits
ah is the value in bits 8 through 15 

As you can see, using ah is wrong: mov ah, 1 puts the 1 in bit 8, so rax holds 256 (if it was zero before) rather than 1, and a different syscall is requested!
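
The effect of writing each sub-register can be checked with a small self-test. This is only a sketch; the file name `subreg.asm` and the labels are made up. It deliberately puts stale bits in rax before each write and exits with status 0 only if every result matches the layout listed above, together with the rule Ped7g mentions in the comments that a write to eax clears the upper 32 bits of rax.

```nasm
; nasm -felf64 subreg.asm && ld subreg.o && ./a.out; echo $?

    global _start

    section .text
_start:
    mov     rax, -1         ; rax = 0xFFFFFFFFFFFFFFFF (stale upper bits)
    mov     al, 1           ; only bits 0-7 change: rax = 0xFFFFFFFFFFFFFF01
    cmp     rax, 1
    je      fail            ; rax must NOT be 1 here

    mov     rax, -1
    mov     eax, 1          ; 32-bit write zero-extends: rax = 1
    cmp     rax, 1
    jne     fail

    xor     eax, eax        ; rax = 0
    mov     ah, 1           ; sets bit 8: rax = 0x100 = 256
    cmp     rax, 256
    jne     fail

    xor     edi, edi        ; every check passed: exit status 0
    jmp     done
fail:
    mov     edi, 1          ; some check failed: exit status 1
done:
    mov     eax, 60         ; syscall 60 is exit (eax write also clears upper rax)
    syscall
```

The shell should print 0, meaning mov al, 1 left the stale upper bits in place, mov eax, 1 produced a clean 1, and mov ah, 1 produced 256.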

Haim Cohen
  • Note that "x64" is usually only seen in Windows terminology. Linux calls it x86-64 or amd64. (Windows also uses "x86" to mean 32-bit, but Linux / SysV terminology uses it as a blanket term to include all x86 CPUs, and i386 or x86-32 for 32-bit.) – Peter Cordes Aug 31 '17 at 01:38
  • @PeterCordes, edited :) – Haim Cohen Aug 31 '17 at 15:34
  • Oh, I thought you actually were talking about the [x32 ABI](https://en.wikipedia.org/wiki/X32_ABI) as an example of another ABI :P (32-bit pointers in long mode). – Peter Cordes Aug 31 '17 at 15:36
  • Well, it is a general place holder, I meant for the 32 bit ABIs in Windows and Linux as a general example. But we can try and list all of the other options 8-) – Haim Cohen Sep 01 '17 at 01:39