In x86, what type of physical internal registers does a CPU use for XMM registers? Would that be integer or vector physical registers?

I think vector registers are used because XMM registers are 128-bit registers. Any confirmation is appreciated.

Peter Cordes
ShAd
  • What's the difference between those physical forms? Can you share a reference text for these terms? – Erik Eidt Dec 24 '22 at 17:45
  • A **register** in a CPU is only a collection of flip-flop circuits, each of which can be either **0** or **1**. It is the **instruction** that decides whether the CPU treats the register contents as an integer, a floating-point number, BCD, a character, or anything else. There are instructions in x86 AVX architecture which calculate with the XMM contents as an array of integers ([PADD](https://www.felixcloutier.com/x86/paddb:paddw:paddd:paddq)), and others which treat them as floating-point numbers ([HADDPD](https://www.felixcloutier.com/x86/haddpd)). – vitsoft Dec 24 '22 at 17:49
  • I think the XMM registers are just 128 bit registers physically which can be acted on by various instructions in a number of ways. Some logic operations operate on the whole 128 bits as one and other operations operate on them as vectors of floating point or integer values of various sizes. I don't think their physical build in the cpu is anything other than 128 bits. – Simon Goater Dec 24 '22 at 18:12
  • Okay, I think I didn't add enough details. some more additional info. I'm more looking at SSE like instructions. For example, ADDPD XMM1, XMM2. I'll reiterate the question as will this instruction be scheduled on vector units or regular INT based units? I'm not sure if this is enough information. – ShAd Dec 24 '22 at 18:49
  • I think you want to know how the elements of the vectors are processed by the CPU. As far as the programmer is concerned, you can assume they are processed vectorially, hence the name vector extensions. They generally are faster than serialising scalar values. I can't offer insight into the actual hardware, logic units etc., the instructions use if indeed that is what you want to know. – Simon Goater Dec 24 '22 at 21:31
  • @ErikEidt: The difference is which register-file it uses an entry in, for register renaming. – Peter Cordes Dec 24 '22 at 22:58

1 Answer

XMM registers are vector registers. They're renamed onto the FP/SIMD register file, not (general-purpose) integer, regardless of whether you're using SIMD-integer or SIMD-fp instructions.
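To make the "regardless of SIMD-integer or SIMD-fp" point concrete, here is a minimal Python sketch that models an XMM register as 16 raw bytes and lets two different "instructions" interpret the same bits. The function names `addpd` and `paddd` are only illustrative stand-ins for the real instructions, and this is a software model of the lane semantics, not a description of how the hardware is built.

```python
import struct

XMM_BYTES = 16  # an XMM register holds 128 bits


def addpd(a: bytes, b: bytes) -> bytes:
    """Model ADDPD: treat each operand as two packed float64 lanes."""
    assert len(a) == len(b) == XMM_BYTES
    xs = struct.unpack("<2d", a)
    ys = struct.unpack("<2d", b)
    return struct.pack("<2d", *(x + y for x, y in zip(xs, ys)))


def paddd(a: bytes, b: bytes) -> bytes:
    """Model PADDD: treat each operand as four 32-bit lanes, wrapping on overflow."""
    assert len(a) == len(b) == XMM_BYTES
    xs = struct.unpack("<4I", a)
    ys = struct.unpack("<4I", b)
    return struct.pack("<4I", *((x + y) & 0xFFFFFFFF for x, y in zip(xs, ys)))


# The same two 128-bit values are valid inputs to either operation.
xmm1 = struct.pack("<2d", 1.5, 2.0)
xmm2 = struct.pack("<2d", 0.25, 4.0)

fp_sum = struct.unpack("<2d", addpd(xmm1, xmm2))   # lanes added as doubles
int_sum = struct.unpack("<4I", paddd(xmm1, xmm2))  # same bits added as 32-bit integers
```

Nothing about the stored value says "integer" or "float", which is consistent with both kinds of instruction sharing one vector register file.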

https://blog.stuffedcow.net/2013/05/measuring-rob-capacity/ shows how to approximately measure the capacities of the physical register files for integer vs. SIMD, since those can be a smaller limit than ReOrder Buffer size for hiding cache-miss latency.

Intel since Sandybridge and AMD since even longer ago have renamed registers onto physical register files, with separate ones for general-purpose integer vs. SIMD/FP.

https://www.realworldtech.com/sandy-bridge/5/ shows that Sandybridge's SIMD PRF has 144 entries, vs. 160 entries in the general-purpose integer PRF. (P6 family, Nehalem and earlier, didn't use a separate PRF; it kept register values directly in the ROB.) Skylake has 180 entries in the integer PRF vs. 168 in the SIMD PRF: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Scheduler
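As a rough illustration of renaming onto two separate physical register files, here is a toy Python model. The `Renamer` class and its methods are hypothetical names invented for this sketch. It ignores retirement (which is when real hardware frees old physical registers) and the entries that hold committed architectural state, so it only shows which file an allocation comes from. The sizes 160 and 144 are the Sandybridge figures quoted above.

```python
from collections import deque


class Renamer:
    """Toy register renamer with separate integer and vector PRFs (hypothetical model)."""

    def __init__(self, int_entries: int, vec_entries: int):
        # One free list of physical register indices per register file.
        self.free = {
            "int": deque(range(int_entries)),
            "vec": deque(range(vec_entries)),
        }
        self.map = {}  # architectural name -> (file, physical index)

    @staticmethod
    def file_for(arch_reg: str) -> str:
        # XMM/YMM/ZMM names use the vector file; anything else is treated as a GPR.
        return "vec" if arch_reg.lower().startswith(("xmm", "ymm", "zmm")) else "int"

    def rename_dest(self, arch_reg: str):
        """Allocate a fresh physical register for a write to arch_reg."""
        f = self.file_for(arch_reg)
        if not self.free[f]:
            # Real hardware would stall allocation here rather than raise.
            raise RuntimeError(f"{f} PRF exhausted: allocation would stall")
        phys = self.free[f].popleft()
        self.map[arch_reg] = (f, phys)
        return f, phys


prf = Renamer(int_entries=160, vec_entries=144)
prf.rename_dest("xmm1")  # destination of e.g. addpd xmm1, xmm2: vector file
prf.rename_dest("eax")   # destination of e.g. add eax, ecx: integer file
```

Because the two free lists are independent, a long run of XMM writes can exhaust the vector file while integer entries are still available, and vice versa.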

Skylake splits further: 80-bit x87/MMX registers and AVX-512 mask registers (k0..7) are renamed onto their own register file, separate from the 512-bit entries in the vector register file. https://travisdowns.github.io/blog/2020/05/26/kreg2.html

For more about x86 CPU internals, see Agner Fog's microarch guide on https://agner.org/optimize/ and other links in https://stackoverflow.com/tags/x86/info

Also, for good measure, Modern Microprocessors: A 90-Minute Guide! is a good read that covers general design considerations in modern CPUs.


> For example, ADDPD XMM1, XMM2. I'll reiterate the question as will this instruction be scheduled on vector units or regular INT based units?

The uop for that instruction will run on a SIMD-FP execution unit, after the CPU reads its inputs from the appropriate register file or forwards one or both from a previous instruction.

On Intel CPUs, execution ports have both SIMD and integer execution units, so an ADDPD uop can compete with add eax, ecx for execution-port throughput. See https://www.realworldtech.com/haswell-cpu/4/ for Haswell vs. Sandybridge execution unit distribution. (Alder Lake added yet another execution port with only integer execution units. See https://uops.info/ and Agner Fog's guides.)

On AMD CPUs, there is a separate group of SIMD/FP execution ports, independent from the integer execution ports. See a Zen 2 diagram for example: https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Block_Diagram So if a bunch of instructions are waiting for inputs that finally become ready, a Zen core can begin executing 4 integer and 4 FP/SIMD uops in the same cycle, plus some loads and stores. (The front-end is "only" 5 instructions or 6 uops wide, so it can't sustain that.)

Peter Cordes