
I'm reading a textbook that introduces a floating-point architecture based on AVX ("advanced vector extensions"). Below is a picture of the available media registers:

enter image description here

I don't understand why those registers need to be 256 bits (32 bytes) wide. A float is only 4 bytes and a double is 8 bytes, so why can't we just use the normal 64-bit integer registers such as %rdi, %rsi, %r8, etc.? Wouldn't those suffice?

  • Look up SIMD - a 256-bit (32-byte) register can be used to hold 4 `double`s or 8 `float`s, and there are special instructions to operate on 4 doubles or 8 floats at a time - significantly increasing the number of calculations you can do on your CPU. – nneonneo Jul 22 '20 at 05:13
  • You *can* load a `double` into `%rdi`, but you can't do FP math like `addsd` on it there. See [Why floating point registers are different than general purpose ones](https://stackoverflow.com/q/62047194) for the CPU-architecture design reasons for that. Also of course you can do `vaddpd` on YMM registers to do 4 FP adds in parallel for the same cost as scalar (in CPUs with full-width SIMD execution units, like Sandybridge-family, and Zen2 and later). – Peter Cordes Jul 22 '20 at 05:43
  • @PeterCordes Thanks for the answer. But why can a 256-bit register do FP math? –  Jul 22 '20 at 10:48
  • Because instructions like `VADDPS ymm1, ymm2, ymm3/m256` exist, which do 8x `float` additions in parallel. https://www.felixcloutier.com/x86/addps (Packed Single-precision). If you're only doing scalar FP math, you would only use XMM regs, and only care about the low 32 or 64 bits of each, with instructions like `addss` (Scalar Single-precision) – Peter Cordes Jul 22 '20 at 10:53
  • see also [What are the 128-bit to 512-bit registers used for?](https://stackoverflow.com/q/52932539/995714) – phuclv Nov 24 '20 at 06:19
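The two points made in the comments can be sketched in C. This is only an illustration, not anything from the textbook: the type name `v4df` and the helper functions `double_bits`, `add_as_integers`, and `add4` are made up for this example, and the vector type relies on the GCC/Clang `vector_size` extension rather than standard C. The first part shows that a `double` fits in a 64-bit integer but integer addition on its bits is not floating-point addition; the second part adds four doubles as one 256-bit value, which a compiler can typically turn into a single `vaddpd` on YMM registers when AVX is enabled (for example with `-mavx`).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension (not standard C): four doubles packed into
   32 bytes, the same width as one YMM register. Name is hypothetical. */
typedef double v4df __attribute__((vector_size(32)));
static_assert(sizeof(v4df) == 32, "v4df should be 256 bits wide");

/* Copy the raw bits of a double into a 64-bit integer, much like
   holding the double in a general-purpose register such as %rdi. */
uint64_t double_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Integer-add the two bit patterns and reinterpret the sum as a double.
   This is NOT floating-point addition: the exponent fields get added too. */
double add_as_integers(double a, double b) {
    uint64_t sum = double_bits(a) + double_bits(b);
    double d;
    memcpy(&d, &sum, sizeof d);
    return d;
}

/* Element-wise addition of four doubles, written as one vector operation. */
void add4(const double *a, const double *b, double *out) {
    v4df va, vb, vr;
    memcpy(&va, a, sizeof va);   /* load 4 doubles */
    memcpy(&vb, b, sizeof vb);
    vr = va + vb;                /* one packed add over all four lanes */
    memcpy(out, &vr, sizeof vr); /* store 4 results */
}
```

Calling `add_as_integers(1.0, 1.0)` does not return 2.0, while `add4` on `{1, 2, 3, 4}` and `{10, 20, 30, 40}` yields `{11, 22, 33, 44}`. Without AVX enabled, the compiler still produces correct code for `add4`, just split across narrower operations.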

0 Answers