How does the compiler decide whether the operand of an instruction is signed or unsigned, and then set the condition code flags accordingly?

Question

For example, take the following C code:

int x = -1;
unsigned y = 1;
if (x > y)
    x += y;

When compiled, the assembly might look something like this

(assuming x is in %eax and y is in %edx):

mov $-1, %eax
mov $1, %edx
mov %eax, %ebp
add %edx, %ebp
cmp %eax, %edx
cmovg %ebp, %eax

Since cmovg executes based on the evaluation of ~(SF^OF)&~ZF, will CF also be set when the CPU executes the instruction cmp %eax, %edx? Note that the bit pattern of the negative number -1 is the same as that of 2^32 - 1.

x86? (probably, given the register names) - please add a tag for your architecture, be it x86 or not. — Damien_The_Unbeliever, Jul 02 '13 at 09:57
yep, its x86. (place holder as comments needs at least 15 characters in length) — JDein, Jul 03 '13 at 07:05

score 0 · Answer 1 · answered Jul 02 '13 at 09:57


It extracts such information from the type of the variable. The compiler does that in the semantic analysis phase where it adds semantic information to the parse tree and builds the symbol table. Some more information. A good book to understand a compiler is this one


KiaMorot



the compiler follows the rules described here: http://stackoverflow.com/questions/5563000/implicit-type-conversion-rules-in-c-operators – Willem Hengeveld Jul 02 '13 at 10:13

score 0 · Accepted Answer · answered Jul 02 '13 at 10:51

Did you mean to ask "how does the CPU decide whether an operand is signed"? Because that's what your question looks like. The answer would be: it doesn't. Values are neither signed nor really unsigned, they're just "a bunch of bits". Some instructions come in signed and unsigned variants, such as branches (but not comparisons), division, right shift, and long multiplication. Addition, subtraction (and comparison, which is a subtraction that only affects the flags) and short multiplication (where you don't get the extra upper half) do not have separate signed and unsigned versions, but a single version that's correct for both. Their result is not affected by signedness.
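To make the "one version that's correct for both" point concrete, here is a minimal C sketch. The function names are made up for illustration and are not from the answer. The arithmetic is done on `uint32_t` because signed overflow is undefined behavior in C, and the sketch assumes `int` is 32 bits wide, as on typical x86 toolchains.

```c
#include <stdint.h>

/* Addition on raw 32-bit patterns. Unsigned arithmetic wraps modulo 2^32,
   which gives the same bits a two's-complement signed add would give. */
uint32_t add_bits(uint32_t a, uint32_t b) {
    return a + b;
}

/* "Short" multiplication: only the low 32 bits of the product are kept.
   These bits do not depend on whether the inputs are read as signed. */
uint32_t mul_low_bits(uint32_t a, uint32_t b) {
    return a * b;
}

/* "Long" multiplication: the full 64-bit product. Here signedness matters,
   so there are two distinct operations. */
uint64_t mul_wide_unsigned(uint32_t a, uint32_t b) {
    return (uint64_t)a * b;
}

int64_t mul_wide_signed(int32_t a, int32_t b) {
    return (int64_t)a * b;
}
```

For instance, `add_bits((uint32_t)-3, 5)` yields 2 whether you read the first operand as -3 or as 4294967293, while `mul_wide_unsigned(0xFFFFFFFFu, 2)` and `mul_wide_signed(-1, 2)` agree in their low halves but differ in their upper halves.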

For those instructions where the result is not affected by signedness, the flags sometimes are. ZF and SF are obviously the same either way, but signedness clearly matters for deciding whether there was overflow. Since OF and CF are separate flags, they can be set to different values.

For example, mov eax, -2 \ add eax, eax sets CF because there's unsigned overflow, but it doesn't set OF because there's no signed overflow. You can interpret it as a signed addition that didn't overflow or as an unsigned addition that did overflow (or even as both, but that's almost never useful), but the only difference is in which flags you care about.
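The same example can be modeled in C. This is only a sketch of how a 32-bit add derives CF and OF, not how any compiler or CPU is implemented; `add_flags` and `add_with_flags` are hypothetical names introduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the sum plus the two overflow-related flags. */
typedef struct {
    uint32_t sum;
    bool cf;  /* carry: the add overflowed when read as unsigned */
    bool of;  /* overflow: the add overflowed when read as signed */
} add_flags;

add_flags add_with_flags(uint32_t a, uint32_t b) {
    add_flags r;
    r.sum = a + b;  /* wraps modulo 2^32 */
    /* Unsigned overflow happened exactly when the wrapped sum is smaller
       than an operand. */
    r.cf = r.sum < a;
    /* Signed overflow happened exactly when both operands have the same
       sign bit and the result's sign bit differs from it. */
    r.of = (((a ^ r.sum) & (b ^ r.sum)) >> 31) != 0;
    return r;
}
```

Calling `add_with_flags(0xFFFFFFFEu, 0xFFFFFFFEu)` (that is, -2 + -2) gives the bit pattern 0xFFFFFFFC with `cf` set and `of` clear, matching the description above. Calling it with 0x7FFFFFFF and 1 gives the opposite: `cf` clear and `of` set.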

The compiler obviously knows what's supposed to be signed and what is supposed to be unsigned, and can therefore choose between sar and shr (for right shift) and between idiv and div (for division) and it chooses the CC for cmovcc and jcc.
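As a rough illustration of that choice, the C functions below differ only in their declared types. The function names are invented for this sketch. On x86 a compiler would typically pick shr/div for the unsigned versions and sar/idiv for the signed ones, though the exact instructions depend on the compiler and optimization level. Note that right-shifting a negative signed value is implementation-defined in C; mainstream compilers such as GCC and Clang do an arithmetic shift.

```c
#include <stdint.h>

/* Unsigned types: logical shift (zero fill), unsigned division. */
uint32_t shift_right_unsigned(uint32_t v, unsigned n) {
    return v >> n;  /* n must be less than 32 */
}

uint32_t divide_unsigned(uint32_t a, uint32_t b) {
    return a / b;
}

/* Signed types: arithmetic shift (sign fill, implementation-defined for
   negative v but universal in practice), signed division. */
int32_t shift_right_signed(int32_t v, unsigned n) {
    return v >> n;  /* n must be less than 32 */
}

int32_t divide_signed(int32_t a, int32_t b) {
    return a / b;
}
```

The bit pattern 0xFFFFFFF8 shows the difference: shifted right by 1 as unsigned it becomes 0x7FFFFFFC, while as the signed value -8 it becomes -4. Dividing by 2 gives the same two results.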

Great, thanks for your explanation. Do you know how the compiler decides whether a binary number is a signed or unsigned number? — JDein, Jul 03 '13 at 10:31
@JDein it doesn't. The source gives variables their types, either directly through declarations or implicitly using the specific rules of the source language - for example, if you compare an `int` to an `unsigned int` in C, the result of that comparison will be interpreted as though it was unsigned (so the compiler will emit instructions that look at the carry flag). The problem of "given some bits, do the bits represent a signed value or an unsigned value" is *completely unsolvable* (all patterns are valid as signed as well as unsigned), but the compiler never needs to do that anyway. — harold, Jul 03 '13 at 11:54
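Applied to the code in the question, a small sketch (function names are made up here, and a 32-bit `int` is assumed) shows the effect of C's usual arithmetic conversions: in a mixed `int`/`unsigned` comparison the `int` operand is converted to `unsigned` first, so the compiler would be expected to emit an unsigned condition such as cmova/ja rather than cmovg/jg.

```c
/* Mixed comparison: x is implicitly converted to unsigned before comparing,
   so -1 becomes UINT_MAX. Compilers may warn about this (-Wsign-compare). */
int greater_mixed(int x, unsigned y) {
    return x > y;
}

/* The same comparison with the conversion written out explicitly. */
int greater_explicit(int x, unsigned y) {
    return (unsigned)x > y;
}

/* Both operands signed: an ordinary signed comparison. */
int greater_signed(int x, int y) {
    return x > y;
}
```

With x = -1 and y = 1, `greater_mixed` and `greater_explicit` both return 1, because 4294967295 > 1, while `greater_signed(-1, 1)` returns 0. So in the question's snippet the `if` branch is taken.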
