
Most architectures have different sets of registers for storing regular integers and floating-point values. From a binary storage point of view, it shouldn't matter where things are stored, right? It's all just 1s and 0s, so couldn't the same general-purpose registers be piped into the floating-point ALUs?

SIMD registers (`xmm` in x64) are capable of storing both floating-point values and regular integers, so why doesn't the same concept apply to regular registers?

  • Pure speculation, but: before x86-64 with `xmm` etc., CPUs could have a stack-based floating-point unit, which was sort of a supplementary add-on (before FPUs, CPUs just didn't have floating-point support at all). My guess is that the folks at AMD stuck with the concept of floating point being an add-on, so they added `xmm` registers rather than extending the integer registers with SIMD instructions. Then, at some point in development, they realized they could throw in integer SIMD instructions for the `xmm` registers, but they stuck with `xmm` rather than unifying everything. – hegel5000 May 27 '20 at 15:54
  • One thing to keep in mind is that x86-64 is a programming language. It's lower-level than C, but higher-level than the actual micro-ops which x86-64 gets converted to. `xmm5`, `rbx`, `ebx`, etc. are just programming language constructs, and there might very well be unified integer+FP registers behind the scenes. – hegel5000 May 27 '20 at 15:59
  • @hegel5000: The XMM registers are not even unified in themselves! Intel processors, or at least some of them, have different physical places where they will keep the data for an XMM register depending on whether it was used for an integer or floating-point instruction. This is invisible to the assembly-language programmer; the processor keeps its own information about where the data is. Except it can be visible in the performance effects; alternating integer and floating-point instructions can be slower than a homogeneous sequence of either. – Eric Postpischil May 27 '20 at 16:28
  • It only really makes sense to do this if your general-purpose registers are at least 64 bits. As such, x86-32 (and x86-16 before it) really couldn't use the same registers for both, and x86-64 was intentionally designed to resemble x86-32, hence kept the register architecture roughly similar. – Nate Eldredge May 27 '20 at 17:18
  • And going back even further, on the 386 and before, the FPU was a physically separate chip (which not everyone chose to buy), and so it really had to have its own registers. – Nate Eldredge May 27 '20 at 17:20
  • Another factor is register name space—the number of registers you can identify in an instruction. If an instruction set architecture has only four bits to identify a register number, a programmer can only have 16 items in registers at a time. If the floating-point and integer instructions use different register sets, they can have 16 integer items and 16 floating-point items. – Eric Postpischil May 27 '20 at 18:10

1 Answer


For practical processor design, there are a lot more issues to consider than "a binary storage point of view".

For example, wire lengths matter, both because parallel paths that can move dozens of bits at a time take chip space, and because getting a signal along a wire takes time. Not much time for fractions of an inch, but still significant when a cycle is a fraction of a nanosecond. For comparison, light in a vacuum can travel about 11.8 inches in one nanosecond. Electrical signals in wires are slower.

That makes it a good idea to put registers close to the arithmetic unit that is going to use their contents. With separate integer and floating point registers the processor can have integer registers close to the general ALU, and floating point registers close to the floating point unit.

There are also issues of limited numbers of paths for reading and writing registers. With separate register banks, the ALU and the floating point unit have independent register access paths, allowing for more things to happen at the same time. Cycle times are no longer dropping rapidly, and one of the other sources of processor speed improvement is doing more in parallel.

I don't know which of these issues matter currently, but in general separating the register banks gives processor designers opportunities they would not have if the banks were combined.

Patricia Shanahan
  • Also important: For a fixed width of a register field in machine code, you can have e.g. 16 FP *and* 16 GP-int registers, or 16 unified registers. Eric made this point in comments. It's also discussed in more depth in the related Q&A [Is there any architecture that uses the same register space for scalar integer and floating point operations?](https://stackoverflow.com/q/51471978). This was very much an issue for modern x86 when SSE1 was introduced, because the 8086 machine-code format constrained 32-bit x86 to 8 registers, which is not even enough for integer code. (16 with x86-64) – Peter Cordes May 27 '20 at 20:21
  • Also relevant: with register renaming onto a larger physical register file: integer registers are narrower than SIMD regs, so you can have a larger physical register file to rename them, and more modest FP rename capabilities (e.g. low-power Silvermont only does full out-of-order exec for integer, with FP ops divided between two in-order queues.) And yes, register file read/write ports are a big deal. – Peter Cordes May 27 '20 at 20:27
  • When SSE1 was introduced, though, Intel wasn't using a separate register file: P6-family kept results right in ROB entries themselves, whether integer or SIMD. (P3 split 128-bit ops into 2x 64-bit uops, but later P6-family CPUs like Nehalem must have had large enough ROB entries for a whole 128-bit result. Sandybridge switched to using a physical register file along with introducing 256-bit AVX SIMD. https://www.realworldtech.com/sandy-bridge/) – Peter Cordes May 27 '20 at 20:29