Can moving memory to 32 bit register be unaligned access in NASM sometimes?

Question

I wonder whether code like this performs an unaligned access:

    section .text
    global _start
    _start:
        mov eax, [arr + 1]

    section .data
    arr: db 1, 2, 3, 4, 5, 6, 7

What do you mean by "can be"? Are you asking whether it's allowed, or whether it can actually happen? — Thomas Jager, Apr 30 '21 at 18:35
Yes, of course, for the same reason that `mov word [arr], imm16` is an aligned store if you build normally, like I explained in your last question about this. [Is there unaligned access problem in NASM?](https://stackoverflow.com/q/67305522) — Peter Cordes, May 01 '21 at 00:46

score 4 · Accepted Answer · answered Apr 30 '21 at 19:08

Typical section alignment is 1000h, at least in Portable Executable (PE) files. When your program is linked and loaded into memory, the virtual address of section `.data` will be aligned, so the first data item, `arr`, is aligned as well.

The load `mov eax, [arr + 1]` is unaligned, of course, but it will work anyway, though not as fast as `mov eax, [arr]` would.
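To make the alignment arithmetic concrete, here is a minimal sketch in C (used here only for the arithmetic, not as a replacement for the NASM code). The function name `is_aligned` and the address `0x402000` are hypothetical; the address simply stands in for a `.data` section that the loader placed on a 1000h boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if addr is a multiple of alignment.
 * alignment must be a nonzero power of two (4 for a dword load). */
bool is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power of two, the low bits of addr are the remainder. */
    return (addr & (alignment - 1)) == 0;
}
```

With `arr` assumed to sit at `0x402000`, `is_aligned(0x402000, 4)` is true (the `mov eax, [arr]` case) and `is_aligned(0x402001, 4)` is false (the `mov eax, [arr + 1]` case).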

vitsoft

It is just as fast on recent Intel/AMD cores, as long as it is <= 32 bits and the access doesn't cross a cache-line boundary. It won't in the OP's sample. – Hans Passant Apr 30 '21 at 19:25
@HansPassant: Since Nehalem for Intel (and recent AMD), there's no perf penalty for any access size as long as you don't have a cache-line split. So on Skylake (or Zen 2), `vaddps ymm1, ymm0, [rdi + 30]` (32-byte load across the middle of one cache line, assuming RDI is aligned by 64) has no downside. Some earlier AMD I think had some kind of performance effects at narrower boundaries like 16 or 32 byte. (And/or non-atomicity for stores that spanned a 16-byte boundary within a cache line, while Intel did guarantee atomicity for 8-byte load/store within a cache line.) – Peter Cordes May 01 '21 at 00:50
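The cache-line-split condition from the comments above can be checked with simple arithmetic. This is a sketch in C under stated assumptions: a 64-byte cache line (as on the Intel and AMD cores mentioned), a hypothetical helper name `crosses_cache_line`, and made-up addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* assumption: 64-byte cache lines */

/* Returns true if the size-byte access starting at addr touches
 * two different cache lines. size must be at least 1. */
bool crosses_cache_line(uintptr_t addr, size_t size)
{
    assert(size > 0);
    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

If `arr` is assumed to start on a 64-byte boundary, the 4-byte load at `arr + 1` covers offsets 1 to 4 and stays within one line, so `crosses_cache_line(0x402001, 4)` is false. The 32-byte load at offset 30 from the second comment covers offsets 30 to 61 and also stays within one line, while a 4-byte load at offset 61 would span offsets 61 to 64 and split.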
