
I wrote a simple Hello World program in NASM and then, out of curiosity, looked at it with objdump -d. The program is as follows:

BITS 64
SECTION .text
  GLOBAL _start

_start:
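  mov rax, 0x01             ; sys_write
  mov rdi, 0x00             ; fd 0 is stdin; this works on a terminal, but fd 1 (stdout) is the usual choice
  mov rsi, hello_world      ; buf: address of the string
  mov rdx, hello_world_len  ; count: 14 bytes
  syscall

  mov rax, 0x3C             ; sys_exit
  syscall                   ; rdi is still 0, so the exit status is 0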

SECTION .data
  hello_world: db "Hello, world!", 0x0A
  hello_world_len: equ $-hello_world

When I inspected the binary, I found that the instruction actually emitted for mov rsi, hello_world is movabs, with the immediate value 0x402000 in place of the label. That makes sense, except that it would mean the program knows 'Hello, world!' will be stored at 0x402000 every time it runs. Also, there is no reference to 'Hello, world!' anywhere in the output of objdump -d hello_world (reproduced below).

I then tried rewriting the program, replacing hello_world on line 8 (mov rsi, hello_world) with a literal address: mov rsi, 0x402000. The program still assembled, linked and worked perfectly.

I thought 0x402000 might be some encoding of the name, but renaming the hello_world label in SECTION .data did not change the outcome either.

I'm more confused than anything: how does it know the address at compile time, and why does it never change, even when I rebuild the program?

Output of objdump -d hello_world:

./hello_world:   file format elf64-x86-64

Disassembly of section .text:

0000000000401000 <_start>:
  401000: b8 01 00 00 00       mov    $0x1,%eax
  401005: bf 00 00 00 00       mov    $0x0,%edi
  40100a: 48 be 00 20 40 00 00 movabs $0x402000,%rsi
  401011: 00 00 00
  401014: ba 0e 00 00 00       mov    $0xe,%edx
  401019: 0f 05                syscall
  40101b: b8 3c 00 00 00       mov    $0x3c,%eax
  401020: 0f 05                syscall

(As you can see, there is no 'Disassembly of section .data', which confuses me further.)

Asked by Basil (edited by Sep Roland)
  • The string is known at compile time too. It statically exists in your executable. The compiler put it at the address in the first place, so of course it knows the address! (And in an ASLR or dylib environment this would still apply, because _all_ addresses relative to the module would get shifted as needed and the compiler would put a relocation entry so the loader knows there is an address reference there to fix up, but they would still stay the same relative to each other.) – CherryDT Sep 30 '22 at 15:31
  • Disassembly of the data section is an oxymoron, the data section generally does not contain instructions that could be sensible to disassemble. – ecm Sep 30 '22 at 15:32
  • @CherryDT But surely if every program knows the address of things such as this, that would just take up a ton of memory spaces of things waiting to be used and executed? Or is that actually what happens? :o – Basil Sep 30 '22 at 15:32
  • This is virtual memory, the memory page in question doesn't have to exist in memory physically, it can be paged in and out as needed, and it's the OS' memory manager's job to decide what to keep in physical memory at what times. Attempting to access an address belonging to a page that's not physically in memory will make it transparently get paged in by the kernel at that point in time. But with such a small program, most likely the whole program will be in memory from the start. – CherryDT Sep 30 '22 at 15:34
  • @CherryDT Ohhhhh it's virtual memory, not physical memory addresses! That's where I was getting confused - that makes a lot more sense then! Thanks! – Basil Sep 30 '22 at 15:37
  • In user-mode code, you will generally never see physical memory addresses. This is entirely abstracted away by the kernel. – CherryDT Sep 30 '22 at 15:38
  • I turned this into an answer. – CherryDT Sep 30 '22 at 15:41
  • Use `objdump -s` to dump the data section, too. You should find the string at the expected address. – fuz Sep 30 '22 at 15:51
  • `objdump -drwC` will show symbol names for relocations if you disassemble the `.o` generated by NASM. There won't be any relocations left in a non-PIE executable, since it's guaranteed to be loaded (mapped) at the absolute address specified in the ELF metadata. (`readelf -a` and look at the segment addresses from the program header) – Peter Cordes Oct 02 '22 at 20:59
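Following up on the `readelf -a` suggestion: the absolute load addresses are recorded in the ELF program headers. Below is a minimal Python sketch that uses only the standard library to list the PT_LOAD segments, much as `readelf -l` would. It assumes a little-endian ELF64 file and does almost no validation. `load_segments`, `flag_str` and `print_segments` are hypothetical helper names, not a library API.

```python
import struct

PT_LOAD = 1                 # program header type for loadable segments
PF_X, PF_W, PF_R = 1, 2, 4  # segment permission bits


def load_segments(data: bytes):
    """Return (entry, [(vaddr, memsz, flags), ...]) for the PT_LOAD
    segments of a little-endian ELF64 image. Minimal sketch, little validation."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # ELF64 header fields that follow the 16-byte e_ident array.
    (_type, _machine, _version, entry, phoff, _shoff, _flags,
     _ehsize, phentsize, phnum, _shentsize, _shnum, _shstrndx) = \
        struct.unpack_from("<HHIQQQIHHHHHH", data, 16)
    segments = []
    for i in range(phnum):
        # Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr,
        # p_filesz, p_memsz, p_align
        (p_type, p_flags, _offset, p_vaddr, _paddr,
         _filesz, p_memsz, _align) = \
            struct.unpack_from("<IIQQQQQQ", data, phoff + i * phentsize)
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_memsz, p_flags))
    return entry, segments


def flag_str(flags: int) -> str:
    """Render permission bits as a short string such as 'R-X' or 'RW-'."""
    return (("R" if flags & PF_R else "-")
            + ("W" if flags & PF_W else "-")
            + ("X" if flags & PF_X else "-"))


def print_segments(path: str) -> None:
    """Print the entry point and every loadable segment of an ELF file."""
    with open(path, "rb") as f:
        entry, segments = load_segments(f.read())
    print(f"entry point: {entry:#x}")
    for vaddr, memsz, flags in segments:
        print(f"LOAD  vaddr={vaddr:#x}  memsz={memsz:#x}  {flag_str(flags)}")
```

Calling `print_segments("./hello_world")` on the question's binary should show an executable segment that contains 0x401000 (where `_start` is) and a writable segment that contains 0x402000 (where the string is), matching the objdump output. The exact segment list depends on the linker version and flags.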

1 Answer

The string's address is known at build time too. The string exists statically in your executable, and the toolchain (NASM, then ld when linking) placed it at that address in the first place, so of course it knows the address!
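To make "it is just a number" concrete, here is a small Python sketch that decodes the bytes objdump printed for the movabs instruction. It handles only this one encoding: the REX.W prefix 0x48 followed by opcode 0xBE, i.e. `mov rsi, imm64`. `decode_mov_rsi_imm64` is a hypothetical helper written for illustration. The sketch shows that the label name never reaches the machine code, because the label became a plain 8-byte address during assembly and linking. In the same way, `hello_world_len` became the constant 14 (0xe).

```python
# Bytes of the movabs at 0x40100a, as printed by objdump (wrapped over two
# lines there): REX.W prefix 0x48, opcode 0xBE (mov r64, imm64 with rsi),
# then an 8-byte immediate.
MOVABS_RSI = bytes.fromhex("48 be 00 20 40 00 00 00 00 00")


def decode_mov_rsi_imm64(insn: bytes) -> int:
    """Return the 64-bit immediate of a 10-byte `mov rsi, imm64` instruction."""
    if len(insn) != 10 or insn[0] != 0x48 or insn[1] != 0xBE:
        raise ValueError("not a 10-byte mov rsi, imm64")
    # x86 stores immediates little-endian: least significant byte first.
    return int.from_bytes(insn[2:], "little")


address = decode_mov_rsi_imm64(MOVABS_RSI)
print(hex(address))  # 0x402000

# mov $0xe,%edx: hello_world_len = $ - hello_world, the string's byte length.
message = b"Hello, world!\n"  # db "Hello, world!", 0x0A
print(len(message), hex(len(message)))  # 14 0xe
```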

(In an ASLR or shared-library environment this still applies. All addresses within the module get shifted by the same amount as needed, and the linker emits a relocation entry so the loader knows there is an address reference at that spot to fix up. The addresses still stay the same relative to each other.)
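For a position-independent build, the fixup described above can be modelled in a few lines of Python. This does not apply to the question's non-PIE executable, which is never moved. The model is a toy, not how the real dynamic loader works. It reuses the first 20 code bytes from the objdump output, treats the movabs immediate as the only absolute address, and moves the code to a made-up ASLR address. Real loaders get this information from dynamic relocation entries, for example R_X86_64_RELATIVE. `load_at` and the constants are hypothetical names.

```python
# First 20 code bytes of _start from the question's objdump output
# (0x401000..0x401013), ending with the 10-byte movabs.
TEXT = bytes.fromhex("b8 01 00 00 00 bf 00 00 00 00 48 be 00 20 40 00 00 00 00 00")
TEXT_LINK_ADDR = 0x401000   # where ld placed _start
HELLO_LINK_ADDR = 0x402000  # where ld placed hello_world
# Offsets in TEXT that hold a 64-bit absolute address: only the movabs
# immediate, which starts 2 bytes (48 be) into the instruction at 0x40100a.
FIXUPS = [0x40100A + 2 - TEXT_LINK_ADDR]


def load_at(text, link_addr, load_addr, fixups):
    """Toy loader: copy the code and add the same bias to every absolute
    address listed in `fixups` (little-endian 8-byte values)."""
    bias = load_addr - link_addr
    image = bytearray(text)
    for off in fixups:
        old = int.from_bytes(image[off:off + 8], "little")
        image[off:off + 8] = (old + bias).to_bytes(8, "little")
    return image


load_addr = 0x555555555000  # hypothetical ASLR-chosen address for _start
image = load_at(TEXT, TEXT_LINK_ADDR, load_addr, FIXUPS)
new_hello = int.from_bytes(image[FIXUPS[0]:FIXUPS[0] + 8], "little")
print(hex(new_hello))  # 0x555555556000
# The code-to-data distance is unchanged: 0x1000 before and after.
print(HELLO_LINK_ADDR - TEXT_LINK_ADDR, new_hello - load_addr)  # 4096 4096
```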

This doesn't mean that every program in existence needs its own unique memory locations. Nor does it mean that all of a program's contents sit idle and use up your RAM even when they are rarely needed. Both are avoided because these are virtual memory addresses.

The address is only meaningful within your own process. The memory page in question doesn't have to be physically present in RAM. It can be paged in and out as needed, and it's the OS memory manager's job to decide what to keep in physical memory at any given time. Accessing an address whose page is not currently in physical memory triggers a page fault, and the kernel transparently pages it in at that point. With such a small program, though, the whole program will most likely be in memory from the start.
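The toy model below, written in Python, illustrates the two ideas above. First, each process has its own page table, so two running copies of hello_world can both use virtual address 0x402000. Second, a page only gets a physical frame the first time it is touched. The class name and the starting frame number are hypothetical. The model ignores permissions, swapping pages back out, and the kernel sharing read-only file pages or copy-on-write pages between processes.

```python
import itertools

PAGE_SIZE = 0x1000  # 4 KiB base pages, as on x86-64


class AddressSpace:
    """Toy per-process page table mapping virtual page numbers to physical
    frame numbers. Frames are assigned lazily on first access."""

    def __init__(self, alloc_frame):
        self.page_table = {}  # virtual page number -> physical frame number
        self.faults = 0       # how many times a missing page had to be mapped
        self._alloc_frame = alloc_frame

    def translate(self, vaddr: int) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            # "Page fault": pick a physical frame and map it (in a real
            # kernel the faulting instruction would then be retried).
            self.faults += 1
            self.page_table[vpn] = self._alloc_frame()
        return self.page_table[vpn] * PAGE_SIZE + offset


# Shared pool of free physical frames; 0x100 is an arbitrary first frame.
free_frames = itertools.count(0x100)


def alloc_frame():
    return next(free_frames)


proc_a = AddressSpace(alloc_frame)
proc_b = AddressSpace(alloc_frame)

pa1 = proc_a.translate(0x402000)  # first touch in A: fault, maps frame 0x100
pa2 = proc_a.translate(0x402005)  # same page: no further fault
pb = proc_b.translate(0x402000)   # same virtual address in B: its own frame
print(hex(pa1), hex(pa2), hex(pb), proc_a.faults, proc_b.faults)
# 0x100000 0x100005 0x101000 1 1
```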

In user-mode code, you will generally never see physical memory addresses. This is entirely abstracted away by the kernel.

CherryDT
  • The OP used `ld` to make a non-PIE executable, so the info for doing runtime fixups is gone (and absolute addresses are fixed). But yes, Linux does allow text relocations in PIE executables and shared libraries, so 64-bit absolute addresses can be used. But it's more efficient (especially for ASLR) to use relative addressing, `lea rsi, [rel hello_world]`, because as you say, the *relative* distance between code and data is a link-time constant, regardless of ASLR of the image base address. [How to load address of function or label into register](https://stackoverflow.com/q/57212012) – Peter Cordes Oct 02 '22 at 20:56
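The Python sketch below shows why `lea rsi, [rel hello_world]` needs no fixup at all: the instruction stores only the distance from the end of the instruction to the target. It assumes the lea replaces the movabs at 0x40100a and that hello_world stays at 0x402000. The encoding 48 8d 35 is REX.W, opcode 8D (LEA), and a ModRM byte that selects rsi with RIP-relative addressing, which makes the instruction 7 bytes long. `rip_rel32` is a hypothetical helper.

```python
def rip_rel32(insn_addr: int, insn_len: int, target: int) -> int:
    """disp32 for a RIP-relative operand: RIP already points past the instruction."""
    disp = target - (insn_addr + insn_len)
    if not -2**31 <= disp < 2**31:
        raise ValueError("target out of rel32 range")
    return disp


# Assumed: lea rsi, [rel hello_world] placed at 0x40100a instead of the movabs.
disp = rip_rel32(0x40100A, 7, 0x402000)
encoding = bytes([0x48, 0x8D, 0x35]) + disp.to_bytes(4, "little", signed=True)
print(hex(disp), encoding.hex(" "))  # 0xfef 48 8d 35 ef 0f 00 00

# Shift code and data together by any bias: the displacement does not change.
bias = 0x555555154000
assert rip_rel32(0x40100A + bias, 7, 0x402000 + bias) == disp
```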