What does word-size mean with greater size registers?

Question

This question has since become outdated, so this is not a duplicate.

We typically refer to 64-bit and 32-bit computers. Traditionally, that corresponded directly to register size (as I understand it) and was the amount of data that could flow through a simple, non-pipelined processor in one cycle.

However, with 128-bit and 512-bit registers now common, what does word size mean anymore? Instructions like `movq` move 64 bits in x86-64, but if bigger registers exist, why do we not call the CPUs 128-bit?

Separately, I understand that pointers, in a language like C, may be affected, but I don't see the correlation between word size and pointer size, since pointer size should depend on the size of the address space.

The question you linked is still as relevant or correct as it ever was. — Tom V, Sep 02 '23 at 18:14
Does anyone still use the word "word" in any useful way except for legacy reasons (e.g. Windows's `WORD` and `DWORD` types)? It felt like it had already fallen out of use when I started programming in the 80's. — ikegami, Sep 02 '23 at 18:14
@ikegami: On ARM, word means 32 bits and 16 bits is called a halfword. These terms are used in the ABI documentation but not so much by application developers. — Tom V, Sep 02 '23 at 18:16
I think the term is outdated, essentially overtaken by a long(ish) history. As you say, we refer to 32-bit and 64-bit computers — without even mentioning the term word or word size. The salient feature of 64-bit computer is the 64-bit address space. — Erik Eidt, Sep 02 '23 at 18:16
As well as the ABI docs they are used in the instruction set mnemonics, so assembly programmers need to know them. — Tom V, Sep 02 '23 at 18:18
@ikegami: Yes, they do when writing assembly language for a specific ISA, whose manuals define the width of a word and may use it in instruction mnemonics. (Like RISC-V `lw` (Load Word) to load 32 bits, or `ld` (Load Double-Word) to load 64 bits into a general-purpose integer register.) Or in casual usage where you don't need to be specific, you could say "a word-at-a-time strlen implementation" to describe a non-SIMD bithack like glibc's portable fallback, with the implication being that you use `unsigned long` chunks or something. But yeah, it's not a very specific term in general. — Peter Cordes, Sep 02 '23 at 18:56
Generally you wouldn't count SIMD capabilities for the "word size", whatever that even means (the "word size = size of everything" view shown by some of the linked answers only really applied to some RISC processors and not even modern ones) — harold, Sep 02 '23 at 18:58
So does the [x]-bit of the computer refer to register size or address space @ErikEidt? — user129393192, Sep 02 '23 at 19:00
Related: [What's the size of a QWORD on a 64-bit machine?](https://stackoverflow.com/q/55430725) discusses the fact that "machine word" is much less meaningful on a machine like x86 that's very much not word-oriented, and getting less meaningful even on modern RISCs. The existence of SIMD registers is yet another factor in making word-size less relevant. But no, you wouldn't say "128-bit CPU" because of SIMD width. The bitness applied to CPU is cynically the widest thing marketing can justify, e.g. data-bus width or register width, or address size. But not SIMD register width. — Peter Cordes, Sep 02 '23 at 19:02
But if you have something like 128-bit registers, couldn't you then call that 128-bit? That would enable a 128-bit address space and that much data flow through the pipeline in one simple cycle. @PeterCordes — user129393192, Sep 02 '23 at 19:03
As I mentioned, the salient feature of a 64-bit processor is the large address space, and it follows that the processor would have natural features for manipulating 64-bit data (i.e. pointers); this would include having 64-bit registers, 64-bit arithmetic, etc.. — Erik Eidt, Sep 02 '23 at 19:04
Because the address space is arguably the most prominent feature of an N-bit computer, then in order to call a computer 128 bits (fyi RISC V defines a 128-bit spec..) you'd have to be able to dereference a 128-bit pointer (RISC V 128-bit supports that). — Erik Eidt, Sep 02 '23 at 19:05
@user129393192: Yes, if they're **integer** registers you can keep pointers in, and do addition of a single 128-bit integer (not two separate 64-bit adds, which is all you can do in any existing SIMD ISA). For example RV128 is a 128-bit extension of the RISC-V architecture, with a 128-bit flat address-space. https://en.wikipedia.org/wiki/RISC-V . It only exists on paper, and the spec isn't finalized, because we're far enough away from running out of 64-bit address-space that nobody's wanted to build one. — Peter Cordes, Sep 02 '23 at 19:07
I see. Thanks @PeterCordes. So to be clear, the reason the most we have is 64-bit right now is that 128-bit and above are not *integer* registers, and SIMD ops don't count when considering something like a dereferenced pointer or other important things that make up the [x]-bitness of a CPU? It all comes down to the fact that with > 64-bit adds, there would be multiple operations, and it is not just one "simple CPU cycle"? If this is correct, I would accept that as an answer, since that was fundamentally my question. — user129393192, Sep 02 '23 at 19:20
Yeah, I think the main consideration is really address-space these days. (Unlike in the past; we didn't call 8086 a 20-bit CPU.) Even if x86-64 had a `vpadddq` instruction for `__int128` addition in SIMD registers (even if it had a way to chain that with carry propagation to do bigint addition faster), I think we'd still call it a 64-bit ISA. Fun fact: Agner Fog's ForwardCom architecture design discussions (https://www.agner.org/optimize/blog/read.php?i=421#548) included the idea of having SIMD instructions for per-element carry-out or something to make it usable for bigint addition. — Peter Cordes, Sep 02 '23 at 19:39
I agree: for me, besides address space / pointer width, the other important indicator of word size / "bitness" is ALU width: what is the largest size for which you have a full suite of fast single-instruction arithmetic? Though this is almost always equal to the width of the general-purpose scalar integer registers. (Here I mean ALU width in the architectural sense, so the Z80 for me is still an 8-bit CPU as it has 8-bit arithmetic instructions, even though they are implemented internally with a 4-bit ALU.) — Nate Eldredge, Sep 02 '23 at 19:52
IMO it is completely pointless and used by guys who still hibernate in the 80's. It is very informal unless you clarify it at the beginning of your documentation, article or ABI specification — 0___________, Sep 02 '23 at 21:02

Alexis Wilke · Answer 1 · 2023-09-02T22:30:42.290

The meaning changes depending on the processor; I include a few examples below. I think that because of backward compatibility, once a WORD was given a size (say 16 bits), it stayed that way for the whole series of those processors. Note that some processors started out as 32-bit and still had a 16-bit WORD (see the 68000). So, more or less, you have to read each processor's documentation to find out the size of a WORD, and as a result there is no exact definition that applies to all processors.

As pointed out by Peter Cordes, the term is also used in other places such as C/C++ code, where it can mean almost anything. In this strlen() post, the comment says:

/* All these elucidatory comments refer to 4-byte longwords,
   but the theory applies equally well to 8-byte longwords.  */

in which case, we could say that the term "longword" refers to the architecture's general-purpose register size of 32 or 64 bits.
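To make the "word-at-a-time" idea concrete, here is a minimal sketch of my own. It is not glibc's code. The names `longword`, `has_zero_byte` and `find_zero_wordwise` are hypothetical, and I assume 8-bit bytes and that `unsigned long` is the chunk size of interest. It scans a buffer of known length so that it never reads past the end.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Assumption: "longword" means unsigned long, whatever width it has here. */
typedef unsigned long longword;
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* 0x0101...01 and 0x8080...80, sized to match unsigned long. */
#define ONES  (~0UL / 0xFF)
#define HIGHS (ONES * 0x80)

/* Returns nonzero if and only if at least one byte of w is zero. */
static int has_zero_byte(longword w)
{
    return ((w - ONES) & ~w & HIGHS) != 0;
}

/* Returns the index of the first zero byte in buf[0..n), or n if none. */
static size_t find_zero_wordwise(const unsigned char *buf, size_t n)
{
    size_t i = 0;

    /* Skip whole longwords that contain no zero byte. */
    while (n - i >= sizeof(longword)) {
        longword w;
        memcpy(&w, buf + i, sizeof w); /* sidesteps alignment and aliasing */
        if (has_zero_byte(w))
            break;
        i += sizeof w;
    }

    /* Pin down the exact byte, or finish the tail, one byte at a time. */
    while (i < n && buf[i] != 0)
        i++;
    return i;
}
```

The same source handles 4 or 8 bytes per step depending on the target, which is the sense in which the glibc comment says the theory applies equally to both sizes. Nothing here depends on what the ISA manual calls a "word".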

Intel/AMD Processors

For the Intel family of CPUs a WORD is 16 bits, even though those CPUs now support registers up to 512 bits wide (with AVX512, see the ZMM registers).

The operands of these instructions are packed integers of byte, word, or double word sizes. The operands are stored as 64 or 128 bit data in MMX registers, XMM registers, or memory. (Vol 1. 12.6 SSSE3 INSTRUCTIONS)

This has been the case since the 8086, whose registers were 16 bits wide. The meaning has not changed (see 4.1 FUNDAMENTAL DATA TYPES).

Byte -- 8 bits
Word -- 16 bits
Double Word -- 32 bits (often abbreviated DWord)
Quadword -- 64 bits (often abbreviated QWord)
Double Quadword -- 128 bits (see 4.1.1 Alignment of Words, Doublewords, Quadwords, and Double Quadwords)
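As a small illustration of how these names map onto fixed-width C types, here is a sketch. The `x86_*` typedef names and the `qword_to_words` helper are made up for this example and do not come from any real header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names mirroring the manual's terms. */
typedef uint8_t  x86_byte;   /* Byte        --  8 bits */
typedef uint16_t x86_word;   /* Word        -- 16 bits */
typedef uint32_t x86_dword;  /* Double Word -- 32 bits */
typedef uint64_t x86_qword;  /* Quadword    -- 64 bits */

/* The sizes are tied to the names, not to the width of the CPU's registers. */
static_assert(sizeof(x86_word)  == 2, "a word is 16 bits");
static_assert(sizeof(x86_dword) == 4, "a double word is 32 bits");
static_assert(sizeof(x86_qword) == 8, "a quadword is 64 bits");

/* Splits a quadword into its four 16-bit words, least significant first. */
void qword_to_words(x86_qword q, x86_word out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (x86_word)(q >> (16 * i));
}
```

A 64-bit general-purpose register therefore holds four "words" in this terminology, even on a CPU that everyone calls 64-bit.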

I think many older processors use WORD to mean 16 bits. If you're using Intel / AMD or similar processors, for sure, a WORD is 16 bits and I don't think it's going to change.

68000 Processor

This processor uses a WORD of 16 bits as well, even though it is a 32-bit processor (and there was no 16-bit version).

Byte -- 8 bits
Word -- 16 bits
Longword -- 32 bits

ARM Processor

The list of data types for the ARM processor is different: a word is 32 bits, even though newer ARM processors are 64-bit.

Again, I think that comes from the fact that they did not want to change the terminology, since their users could get really confused if it changed just because a processor became more powerful.

R5000 (Risc)

These newer MIPS processors are also 64-bit processors, yet a WORD is only 32 bits. Again, this is probably to maintain compatibility with older 32-bit versions of the processor.

CUDA

The CUDA processors use a WORD of 32 bits. I did not find the definition of data types, but found this section where they reference a half-word as being 16 bits. They also have a byte (8 bits). Again, these processors support 64 bits and have for a while.

SPARC

I could not find a clear list of types in the documentation I have, but there is this clear reference which defines a half-word as 16 bits.

half-word -- 16 bits
word -- 32 bits
double word -- 64 bits

CRAY-1

The CRAY computers used 64-bit CPUs from day one. As a result, their word is 64 bits.

COMPUTATION SECTION

64-bit word

12.5 nanosecond clock period

2's complement arithmetic

...

Your answer doesn't explain other usages of the term "word", as in "word-at-a-time strlen implementation". See the comments in glibc's code in [Why does glibc's strlen need to be so complicated to run quickly?](https://stackoverflow.com/q/57650895) where they describe an `unsigned long` as a "longword" and check whether it's 32 or 64-bit. In casual discussion about algorithms and cpu-architecture, "word" is often used as short-hand for a larger chunk the machine can process efficiently (e.g. reg width), which for ISAs that evolved out of narrower ones, is not what the manuals call a "word" — Peter Cordes, Sep 02 '23 at 21:50
R5000 is a MIPS. RISC is a general *category* of ISAs. Other RISCs use different terminology, like SPARC actually uses long-word and short-word ([What's the size of a QWORD on a 64-bit machine?](//stackoverflow.com/posts/comments/97579745)). Oh, perhaps that's the target that glibc's C fallback strlen comments are based on. (But true that basically no mainstream RISC ISA uses plain "word" to mean 64-bit in the technical terminology of its manuals; instruction words are 32-bit on typical RISCs, or a mix of 16 and 32-bit only aligned by 2 bytes. MIPS-style word / dword terms are common.) — Peter Cordes, Sep 02 '23 at 21:53