I am new to assembly programming (x86 32bit architecture) and have a question about the following piece of code:

SECTION .data

Msg: db "Hello", 10
Len: equ $-Msg

SECTION .text

global _start

_start:
    ; Printing Msg to stdout
    mov eax, 4
    mov ebx, 1
    mov ecx, Msg  ; Passing the ADDRESS to the beginning of what's stored in Msg
    mov edx, Len  ; Are we passing the address of Len, or the value of Len?
    int 80H

    ; Terminating
    mov eax, 1
    mov ebx, 0
    int 80H

I was told that the instruction mov ecx, Msg moves the address at which Msg is stored into the ecx register.

What about the next instruction mov edx, Len?

  • If we move the Len value to the edx register, then shouldn't the instruction be written differently, like mov edx, [Len]?

  • If we move the address of Len, then why is the system call to print the message so complicated? Why would a register need to contain the address of the message's length rather than the length itself?

Sep Roland
mercury0114

1 Answer

Len doesn't have an address. A label defined with equ simply makes the name Len a convenient way to refer to a particular numerical value, which in this case is computed by the assembler and happens to be 6. It doesn't allocate any space in memory. And mov edx, Len is an immediate load that puts that numerical value 6 in the edx register.
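
To make the contrast concrete, here is a minimal sketch (nasm, 32-bit Linux, same system calls as in the question) that passes the length both ways. LenVar is a hypothetical name introduced only for this illustration: unlike Len, it is a real four-byte variable in memory, so reading it does require brackets.

```nasm
SECTION .data

Msg:    db "Hello", 10
Len:    equ $-Msg          ; assemble-time constant 6, occupies no memory
LenVar: dd Len             ; hypothetical variable: four bytes in .data holding 6

SECTION .text

global _start

_start:
    ; First write: length passed as an immediate
    mov eax, 4             ; sys_write
    mov ebx, 1             ; stdout
    mov ecx, Msg           ; immediate: the address of the message
    mov edx, Len           ; immediate: assembles exactly like mov edx, 6
    int 80H

    ; Second write: length loaded from memory
    mov eax, 4
    mov ebx, 1
    mov ecx, Msg
    mov edx, [LenVar]      ; load: reads the dword stored at address LenVar
    int 80H

    ; Terminating
    mov eax, 1             ; sys_exit
    mov ebx, 0
    int 80H
```

Assuming the file is saved as hello.asm (the name is arbitrary), it can be built with nasm -f elf32 hello.asm followed by ld -m elf_i386 -o hello hello.o. Both writes print the same "Hello" line; only the way the value 6 reaches edx differs.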

In some sense, Msg is also a convenient way to refer to a particular numerical value - but here that numerical value happens to be the address of a certain location in memory that contains the bytes "Hello". So mov ecx, Msg is also an immediate load that puts that numerical value into ecx.

If you like, you can think of Msg: db "Hello", 10 as a shorthand for

Msg: equ $
    db "Hello", 10

It sets the label Msg equal to the assembler's current address, then assembles some bytes starting at the current address.
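
Since Msg and Len are both just numbers as far as the assembler is concerned, you can also do arithmetic on them at assembly time. A small sketch under the same assumptions as the question (nasm, 32-bit Linux); using the first byte of the message as the exit status is only a way to make the memory load observable, not something the original program does:

```nasm
SECTION .data

Msg: db "Hello", 10
Len: equ $-Msg

SECTION .text

global _start

_start:
    mov eax, 4
    mov ebx, 1
    mov ecx, Msg+1          ; immediate: address of the second byte ('e')
    mov edx, Len-1          ; immediate: 6 - 1 = 5
    int 80H                 ; writes "ello" followed by a newline

    movzx ebx, byte [Msg]   ; load from memory: the byte at Msg, 'H' = 72
    mov eax, 1              ; sys_exit, exit status taken from ebx
    int 80H
```

Here Msg+1 and Len-1 are resolved before the program ever runs, while byte [Msg] is a genuine memory access at run time. After running it, echo $? in the shell shows 72.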

(Note that this answer is specific to nasm. Other Intel-syntax assemblers are generally similar; but for instance, in AT&T syntax, the instruction movl Len, %edx is a move from memory, the equivalent of Intel's mov edx, [Len]; it would attempt to fetch four bytes from address 6, which would crash. In that syntax you would instead write movl $Len, %edx.)

Nate Eldredge
  • terminology: A `[disp32]` addressing mode is normally called "direct" (or absolute), because the address is literally there in the instruction encoding. Indirect would be `[reg]`. But yes, it's a load from memory instead of a mov-immediate. x86 doesn't give specific names to different subsets of its full addressing-mode capability (`[base + idx*scale + disp0/8/32]`), but some people like to name things. [Do terms like direct/indirect addressing mode actual exists in the Intel x86 manuals](https://stackoverflow.com/q/46257018) – Peter Cordes Jul 18 '20 at 21:52
  • " in AT&T syntax, the instruction `mov Len, %edx` is" -- Usually you'd come across `movl` as the instruction with AT&T syntax. – ecm Jul 19 '20 at 00:18
  • @ecm: Good point, though with gas it is optional in that example because the operand size can be deduced from the register name. I changed it anyway. – Nate Eldredge Jul 19 '20 at 00:19
  • @Nate Eldredge: You forgot one of the instructions in your text. – ecm Jul 19 '20 at 00:20