What's the purpose of using media registers that can hold 32 bytes?

Question

I'm reading a textbook that introduces floating-point architecture based on AVX ("advanced vector extensions"). It shows a figure of the available media registers.

I don't understand why those registers need to be 256 bits (32 bytes) wide. A float is 4 bytes and a double is 8 bytes, so wouldn't the normal 64-bit integer registers such as %rdi, %rsi, and %r8 suffice?

Look up SIMD - a 256-bit (32-byte) register can be used to hold 4 `double`s or 8 `float`s, and there are special instructions to operate on 4 doubles or 8 floats at a time - significantly increasing the number of calculations you can do on your CPU. — nneonneo, Jul 22 '20 at 05:13
You *can* load a `double` into `%rdi`, but you can't do FP math like `addsd` on it there. See [Why floating point registers are different than general purpose ones](https://stackoverflow.com/q/62047194) for the CPU-architecture design reasons for that. Also of course you can do `vaddpd` on YMM registers to do 4 FP adds in parallel for the same cost as scalar (in CPUs with full-width SIMD execution units, like Sandybridge-family, and Zen2 and later). — Peter Cordes, Jul 22 '20 at 05:43
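To illustrate the point about holding a `double` in an integer register, here is a minimal C sketch. The helper names (`double_bits`, `bits_to_double`, `integer_add_on_bits`) are hypothetical and only model the idea: the 8 bytes of a `double` fit in a 64-bit integer, but integer arithmetic on that bit pattern does not compute a floating-point sum.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the 8 bytes of a double into a 64-bit integer, similar in spirit
   to moving the value into a general-purpose register such as %rdi. */
static uint64_t double_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Reinterpret a 64-bit pattern as a double again. */
static double bits_to_double(uint64_t u) {
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* What an integer add applied to the raw bit patterns would produce.
   This is NOT a floating-point addition; it only adds the encodings. */
static double integer_add_on_bits(double a, double b) {
    return bits_to_double(double_bits(a) + double_bits(b));
}
```

Assuming the usual IEEE 754 binary64 encoding, `double_bits(1.0)` is `0x3FF0000000000000`, and `integer_add_on_bits(1.0, 1.0)` yields a huge number rather than `2.0`, which is why FP math needs its own instructions and registers.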
@PeterCordes Thanks for the answer. But why can a 256-bit register do FP math? — , Jul 22 '20 at 10:48
Because instructions like `VADDPS ymm1, ymm2, ymm3/m256` exist, which do 8x `float` additions in parallel. https://www.felixcloutier.com/x86/addps (Packed Single-precision). If you're only doing scalar FP math, you would only use XMM regs, and only care about the low 32 or 64 bits of each, with instructions like `addss` (Scalar Single-precision). — Peter Cordes, Jul 22 '20 at 10:53
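The packed additions described in these comments can be sketched in portable C. This is only a lane-by-lane model, not real SIMD code: the `ymm_model` type and the `model_vaddps` / `model_vaddpd` functions are hypothetical names made up for illustration. A real program would more likely use compiler intrinsics such as `_mm256_add_ps` from `<immintrin.h>`, or rely on auto-vectorization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one 256-bit YMM register: the same 32 bytes
   can be viewed as 8 floats or as 4 doubles. */
typedef union {
    float  f32[8];
    double f64[4];
} ymm_model;

_Static_assert(sizeof(ymm_model) == 32, "a YMM register holds 32 bytes");

/* Model of vaddps: eight independent float additions, one per lane.
   The hardware does these in one instruction; the loop only shows
   what each lane computes. */
static ymm_model model_vaddps(ymm_model a, ymm_model b) {
    ymm_model r;
    for (size_t i = 0; i < 8; i++)
        r.f32[i] = a.f32[i] + b.f32[i];
    return r;
}

/* Model of vaddpd: four independent double additions, one per lane. */
static ymm_model model_vaddpd(ymm_model a, ymm_model b) {
    ymm_model r;
    for (size_t i = 0; i < 4; i++)
        r.f64[i] = a.f64[i] + b.f64[i];
    return r;
}
```

The register is wide not because a single `float` or `double` needs 32 bytes, but so that one instruction can process 8 or 4 values at once.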
see also [What are the 128-bit to 512-bit registers used for?](https://stackoverflow.com/q/52932539/995714) — phuclv, Nov 24 '20 at 06:19
