Why is %cl the only register the sal-operation will accept as a parameter?

Question

Today, while working on an x86 assembly project, I tried left-shifting the value of the %rax register by the value of %rbx using the salq instruction, but couldn't get it to work. Shifting by an immediate value worked fine, of course, but I had no idea why I couldn't use the register. I even tried using %bl, since the error I got when assembling was an operand size error and I figured the size was the problem. It still didn't work.

After a lot of trial and error a friend linked me to a page explaining that %cl was the only register that could be used for sal. And lo and behold - now my code works. My curiosity is as to why, in terms of the technical side of it all, %cl is the only one we can use, and none of the three other scratch-registers that should be identical? Couldn't find anything about this online, so hoping someone here is invested enough in x86-Assembly to explain this. :)

Refer to the instruction set reference. The `cl` register is hard coded to be used as the shift amount. On sufficiently modern x86 processors, you can use `shlxq` to shift by an arbitrary register. — fuz, Oct 19 '21 at 16:31
[Is there a reason to specify shifts from 8-bit registers?](https://stackoverflow.com/q/43358639/995714) — phuclv, Oct 19 '21 at 17:04
"I even tried using `%bl`" Trial and error is not a good way to come to grips with an instruction set. Consulting the instruction set manual is... — Sep Roland, Oct 19 '21 at 19:30
@fuz Yes, I found out about the hard coding eventually, but to my knowledge the `cl`-register is functionally identical to, say, `al` or `bl`, so the hard coding of `cl` seemed odd to me. — Ludvig Nilsson, Oct 20 '21 at 09:59
@phuclv Did not find this while searching, my apologies for the duplicate. @SepRoland Yes, and my problem was eventually solved this way. My question was as to *why* `cl` is hard coded into the instruction set, rather than the use of any general 8-bit register. But yes, I actually did not know about the instruction set manual until a friend linked me to a page explaining it. I'm a second year software engineering student, and I'm a little surprised we weren't linked to the instruction set manual in the assignment. Will consult there in the future. — Ludvig Nilsson, Oct 20 '21 at 10:02

Nate Eldredge · Accepted Answer · 2021-10-19T20:09:37.957

This limitation comes from the 8086, so we have to go all the way back there to try to guess at the rationale.

One possible hint comes from the encoding. As you probably know, most 8086 ALU instructions consist of a one-byte opcode followed by a Mod-Reg-R/M byte which specifies the operands: one register and one R/M operand that can be a register or memory. The register operand is specified in the 3-bit Reg field of this byte, and the R/M operand by the remaining 5 bits (the 2-bit Mod field and the 3-bit R/M field).
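To make the layout of that byte concrete, here is a minimal Python sketch that splits a Mod-Reg-R/M byte into its three fields. The function name `split_modrm` is made up for illustration. The example byte `0xD8` is the operand byte of `add al, bl` (encoded as `00 D8`), where `bl` is register number 3 and `al` is register number 0.

```python
def split_modrm(byte):
    """Split a Mod-Reg-R/M byte into its (mod, reg, rm) fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a Mod-Reg-R/M value must fit in one byte")
    mod = (byte >> 6) & 0b11   # top 2 bits: addressing mode (3 = register-direct)
    reg = (byte >> 3) & 0b111  # middle 3 bits: the register operand
    rm = byte & 0b111          # low 3 bits: the register-or-memory operand
    return mod, reg, rm


# 0xD8 = 0b11_011_000: mod=3 (register-direct), reg=3 (bl), rm=0 (al)
print(split_modrm(0xD8))  # (3, 3, 0)
```

The point to take away is that the Reg field is only 3 bits wide, so it can name one of eight registers or, as with the shifts, one of eight operations.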

In the case of shift, we actually have 7 different instructions sharing an opcode. Opcode D2 is used for all the 8-bit variable shift and rotate instructions:

    rol r/m8, cl
    ror r/m8, cl
    rcl r/m8, cl
    rcr r/m8, cl
    shl r/m8, cl
    shr r/m8, cl
    sar r/m8, cl

Since the cl operand for the shift count is hardcoded, the Reg field of the Mod-Reg-R/M byte is not needed for a register operand, and can be used instead to distinguish between these instructions. This is notated in instruction listings like

    D2 /2   rcl r/m8, cl

in other words, rcl r/m8, cl uses opcode D2 and the value 2 in the 3-bit Reg field.

(shl r/m8, cl actually has two encodings, D2 /4 and D2 /6. Probably bit 1 is used to indicate logical or arithmetic shift, because shr and sar are D2 /5 and D2 /7 respectively. So in some sense D2 /4 is probably shl and D2 /6 is sal, but those are actually the same operation.)
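As a rough illustration of how the Reg field selects the operation, here is a small Python sketch of a decoder for the register-direct forms of opcodes D2 and D3. The names `GROUP`, `decode_shift` and `is_arithmetic` are invented for this example. The /2, /4, /5, /6 and /7 assignments are the ones given above; /0 `rol`, /1 `ror` and /3 `rcr` are filled in from the standard opcode tables. Memory operands are not decoded, only shown as a placeholder.

```python
# Reg-field value -> mnemonic for the variable shift/rotate group.
GROUP = {0: "rol", 1: "ror", 2: "rcl", 3: "rcr",
         4: "shl", 5: "shr", 6: "sal", 7: "sar"}
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def decode_shift(opcode, modrm):
    """Decode D2 (8-bit) or D3 (16-bit) plus a Mod-Reg-R/M byte."""
    if opcode not in (0xD2, 0xD3):
        raise ValueError("only opcodes D2 and D3 are handled here")
    mod, digit, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    regs = REG8 if opcode == 0xD2 else REG16
    # Only register-direct operands (mod == 3) are decoded in this sketch.
    operand = regs[rm] if mod == 3 else "[mem]"
    return f"{GROUP[digit]} {operand}, cl"


def is_arithmetic(digit):
    """For the shifts (/4../7), bit 1 of the Reg field marks the arithmetic form."""
    return bool(digit & 0b010)


print(decode_shift(0xD2, 0xD0))  # rcl al, cl
print(decode_shift(0xD3, 0xE3))  # shl bx, cl
print(decode_shift(0xD2, 0xF0))  # sal al, cl
```

Note that the count register never appears in the bytes at all: the decoder prints `cl` unconditionally, because the field that would normally name a register is busy naming the operation.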

The 16-bit variable shifts and rotates are the same, but using opcode D3.

So by hardcoding the shift count register, the instruction set designers got to encode 7 instructions while using up only one available opcode. Had they allowed an arbitrary register to be used, they'd have needed 7 different opcodes for the 8-bit shift and rotate instructions, and 7 more for the 16-bit versions. The opcode map is pretty crowded. I suppose 60-6F were available at the time, but they probably wanted to reserve them for later use. Otherwise they would have had to drop instructions somewhere else, or adopt a more complicated encoding scheme, which would have meant spending more transistors on the decoder.
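The opcode budget can be spelled out in a few lines of Python. This is only the counting argument from the paragraph above, with made-up variable names. It assumes the hypothetical "arbitrary count register" design would spend the Reg field on the count register and therefore need a separate opcode per operation and operand size.

```python
MNEMONICS = ["rol", "ror", "rcl", "rcr", "shl", "shr", "sar"]
WIDTHS = [8, 16]  # operand sizes on the 8086

# Actual design: cl is hardcoded, so the Reg field selects the mnemonic
# and one opcode per operand size is enough (D2 and D3).
opcodes_hardcoded = len(WIDTHS)

# Hypothetical design: the Reg field names the count register, so every
# (mnemonic, width) pair needs its own opcode.
opcodes_flexible = len(MNEMONICS) * len(WIDTHS)

print(opcodes_hardcoded, opcodes_flexible)  # 2 14
```

Twelve extra one-byte opcodes out of a 256-entry map is a substantial cost for a feature that a single `mov` into `cl` can work around.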

There remains the question of why they chose cl in particular, instead of say bl. They did have a general philosophy of using cx as a "count" register, and the other instructions that hardcode cx tend to use it as some sort of count: loop and rep for instance. They might have thought that would make it easier for programmers to remember.

As noted by fuz, the much later BMI2 extension did add shlx/shrx/sarx, which can take their shift count from an arbitrary register, but they had to be shoehorned into an odd corner of the instruction space and have a much more complicated encoding, needing a VEX prefix and a total of 5 bytes or more to encode.
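To show the size difference, here is a hedged Python sketch that builds the bytes for the register-only 64-bit forms. The helper names are hypothetical, and the sketch handles only registers 0 to 7 (rax, rcx, rdx, rbx, ...) with no memory operands. The field layout follows my reading of the three-byte VEX prefix in the Intel manual, so check it against an assembler before relying on it.

```python
# VEX "pp" field: the implied prefix that distinguishes the three BMI2 shifts.
PP = {"shlx": 0b01, "sarx": 0b10, "shrx": 0b11}  # 66, F3, F2


def encode_bmi2_shift(mnemonic, dst, src, count):
    """Encode `mnemonic dst, src, count` (Intel operand order), 64-bit regs 0..7."""
    for r in (dst, src, count):
        if not 0 <= r <= 7:
            raise ValueError("this sketch only handles registers 0..7")
    byte1 = 0b111_00010  # inverted R, X, B = 1 (no extended regs), opcode map 0F38
    # W=1 (64-bit), count register stored inverted in vvvv, L=0, then pp.
    byte2 = (1 << 7) | ((~count & 0xF) << 3) | PP[mnemonic]
    modrm = (0b11 << 6) | (dst << 3) | src
    return bytes([0xC4, byte1, byte2, 0xF7, modrm])


def encode_shl_cl(dst):
    """Encode the legacy `shl dst, cl` for 64-bit regs 0..7: REX.W, D3 /4."""
    return bytes([0x48, 0xD3, (0b11 << 6) | (4 << 3) | dst])


RAX, RCX, RBX = 0, 1, 3
print(encode_bmi2_shift("shlx", RAX, RBX, RCX).hex())  # c4e2f1f7c3
print(encode_shl_cl(RAX).hex())                        # 48d3e0
```

So `shlx rax, rbx, rcx` takes 5 bytes, against 3 for `shl rax, cl` in 64-bit mode, although the BMI2 form also gets a separate destination register and leaves the flags alone.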

Begs the question why an assembler would even need to see that operand in source code -- loop uses `cx` but we don't say `loop %cx`. — Erik Eidt, Oct 19 '21 at 18:48
@ErikEidt Why the use of `CL` in the source? The assembler needs to make the distinction between `rol al, cl` and `rol al, 1`. Different opcodes D2 vs D0. Of course an assembler could arbitrarily allow `rol al` (without the ',1' or the ',cl') and have it mean one of the two possibilities. As long as it is clear to the user, there should be no problem. — Sep Roland, Oct 19 '21 at 19:16
I vaguely recall some assembler that would indeed let you write `rol al` as shorthand for `rol al, 1`. — Nate Eldredge, Oct 19 '21 at 20:15
Yes, GNU assembler allows `rol al` as an implicit 1. IIRC, GCC even prints asm that way. GNU assembler *does* still optimize `rol al, 1` into the implicit-count form, instead of the imm8 form (new in 186) with an immediate 1. (For rotates, that actually makes it take an extra uop on current Intel, because implicit-1 rotates have a well-defined effect on OF while leaving other flags in the SPAZO group unmodified, while immediate and CL rotates can leave it unmodified. The manual says "if the masked count is 1", but I think CPUs actually go by opcode.) — Peter Cordes, Oct 19 '21 at 22:03
Very interesting! It's a very clever solution, thank you for the detailed description. — Ludvig Nilsson, Oct 20 '21 at 10:15
