Assembly partial registers

Question

I've read an explanation about partial registers and wanted to understand why this code:

XOR EAX,EAX
MOV AX, BX

will work only if EAX holds an unsigned (non-negative) number. But if there is a negative signed number in EAX, you need to make sure the new bits you've added are 111... I understand it has something to do with two's complement, but I still don't understand it.

Edit:

I understand the code above will work either way. I wanted to understand why you would need to pad with those extra 1s (when I don't zero EAX).

It doesn't matter what's in `EAX` since that gets zeroed. What matters is what's in `BX`. Presumably what you are talking about is sign extending `BX` into `EAX`, but this code does zero extension. For non-negative numbers that of course is equivalent to sign extension since the sign bit is `0`. — Jester, Mar 08 '16 at 16:52
What you should actually do here is `movzx eax, bx`, or `movsx eax, bx`, for zero or sign extension. Or, that's your reference point for reasoning about what sequences are equivalent to which instruction. — Peter Cordes, Mar 08 '16 at 16:55
Well actually im trying to understand the usage of movzx, i mean why do you even need to pad those extra 1s — Yarden, Mar 08 '16 at 17:07
If you don't zero `eax` then whatever is left in the top half remains there. It might not even be all zeroes or ones, so no telling what the resulting 32 bit number would be. A 16 bit number sign extended to 32 bit will have a copy of the sign bit in the top half, which is all ones for a negative number, and all zeroes otherwise. — Jester, Mar 08 '16 at 17:22
You need to zero-extend or sign-extend if you want to use a narrow integer as an array index or something. e.g. `int foo(short a) { return LUT[a]; }` needs to sign-extend `a` into a 32bit register so you can use it in a 32bit or 64bit effective address, like `[LUT + eax*4]`. If `a` was unsigned, you'd zero-extend. — Peter Cordes, Mar 09 '16 at 05:11

score 3 · Accepted Answer · edited May 23 '17 at 10:28

More often than not, you work with unsigned numbers while hand-writing assembly. Of course, this is not always the case, and C's int promotions should be implemented somehow, right?

Let's start by explaining two's complement at the bit level. First of all, the topmost bit, when set, indicates a negative number, and when clear, a non-negative one. If it's clear, the value reads as you would expect. That is, 0 is written 0x00000000, and 2147483647 is written 0x7FFFFFFF. However, for any negative number N, in order to get its absolute value, you have to compute ~N+1 (equivalently, ~(N-1)), with all respective wrap-arounds. This leads to -1 being written 0xFFFFFFFF, and -2147483648 being written 0x80000000. This has some nice side effects: -0 == 0, and addition and subtraction use the same operation for signed and unsigned numbers.
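To make the bit patterns concrete, here is a minimal C sketch of 32-bit two's complement negation (invert, then add one). The helper names `twos_negate` and `is_negative32` are made up for illustration, and the arithmetic is done on `uint32_t` so that wrap-around is well defined in C.

```c
#include <stdint.h>

/* Negate a 32-bit pattern the two's complement way:
   invert every bit, then add one (wrapping modulo 2^32). */
uint32_t twos_negate(uint32_t bits) {
    return ~bits + 1u;
}

/* The topmost bit (bit 31) is the sign bit: set means negative. */
int is_negative32(uint32_t bits) {
    return (bits >> 31) != 0;
}
```

For example, `twos_negate(1u)` gives the pattern 0xFFFFFFFF (that is, -1), and `twos_negate(0u)` gives 0, which is why -0 == 0. Negating 0x80000000 wraps back to 0x80000000, because +2147483648 does not fit in 32 signed bits.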

Now, to your code....

XOR EAX,EAX

As a mathematical rule, exclusive-OR'ing something against itself will always yield zero as a result. So, you can think of this as an optimized `MOV EAX, 0`. BTW, you may want to read why XOR is better for this. Then...

MOV AX, BX

MOV is an instruction that literally means "to copy the bits as-is". That is, all cleared bits in the source are respectively cleared in the destination, and all set bits in the source are respectively set in the destination. In this case, AX will now contain an exact copy of BX's contents. In the x86 architecture, EAX, EBX, ECX, and EDX are all divided this way...

 ________________ ________ ____ ____
|      ERX       |   RX   | RH | RL |
|________________|________|____|____|
 \_________________________________/
          |       \________________/
      32 bits              |    \__/
                       16 bits   ||
                               8 bits

This means that for each register letter R (A, B, C, or D), ERX represents all of its 32 bits, RX represents its lower 16 bits, RH represents the higher byte of its lower 16 bits, and RL represents the lower byte of its lower 16 bits. Thus, returning to your code, and assuming BX contains 0xFFFF (-1 in two's complement), this is what happens...

 ______________ __________________________________ __________________________________
| Instruction  | EAX                              | EBX                              |
|______________|__________________________________|__________________________________|
| XOR EAX, EAX | 00000000000000000000000000000000 | 00000000000000001111111111111111 |
|______________|__________________________________|__________________________________|
| MOV AX, BX   | 00000000000000001111111111111111 | 00000000000000001111111111111111 |
|______________|__________________________________|__________________________________|
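The same sequence can be modeled in C as a rough sketch. The functions `mov_ax` and `xor_then_mov_ax` are hypothetical helpers that only mimic what the instructions do to the register bits; they are not how a compiler or the CPU implements them.

```c
#include <stdint.h>

/* Model of `mov ax, bx`: overwrite the low 16 bits of eax
   and leave the upper 16 bits exactly as they were. */
uint32_t mov_ax(uint32_t eax, uint16_t bx) {
    return (eax & 0xFFFF0000u) | bx;
}

/* Model of `xor eax, eax` followed by `mov ax, bx`:
   the upper half is zero no matter what eax held before. */
uint32_t xor_then_mov_ax(uint32_t eax, uint16_t bx) {
    eax ^= eax;              /* anything XOR itself is 0 */
    return mov_ax(eax, bx);
}
```

With BX = 0xFFFF, `xor_then_mov_ax` always returns 0x0000FFFF (65535), whatever EAX held before. Without the zeroing, `mov_ax(0xDEADBEEFu, 0xFFFFu)` returns 0xDEADFFFF: the stale upper half stays in place, so the 32-bit result depends on leftover bits.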

Then, if we interpret AX as a 16-bit two's complement number, we get the right answer, that is, -1. However, if we interpret EAX as a 32-bit two's complement number, we get a wrong answer, 65535, because the upper 16 bits are all zero. In order to do this correctly, we have to do a sign-extending move (MOVSX). This instruction treats the source as a two's complement value and fills the upper 16 bits of the destination with copies of the source's sign bit. See, for instance...

 _______________ __________________________________ __________________________________
| Instruction   | EAX                              | EBX                              |
|_______________|__________________________________|__________________________________|
| MOVSX EAX, BX | 11111111111111111111111111111111 | 00000000000000001111111111111111 |
|_______________|__________________________________|__________________________________|

Now, interpreting EAX as a 32-bit two's complement number will yield the right answer, -1. This is (yet) another advantage of two's complement. You can sign-extend by just copying the top-most bit as many times as needed.
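As a sketch of the difference between the two extensions, the C functions below mimic `movzx eax, bx` and `movsx eax, bx` on plain integers. The names `movzx16` and `movsx16` are invented for this example; the sign extension is written out by hand to show the "copy the top bit" rule rather than relying on a cast.

```c
#include <stdint.h>

/* Model of `movzx eax, bx`: the upper 16 bits become zero. */
uint32_t movzx16(uint16_t bx) {
    return (uint32_t)bx;
}

/* Model of `movsx eax, bx`: the upper 16 bits become
   copies of bit 15, the sign bit of the 16-bit source. */
uint32_t movsx16(uint16_t bx) {
    uint32_t low = bx;
    uint32_t upper = (low & 0x8000u) ? 0xFFFF0000u : 0u;
    return upper | low;
}
```

For BX = 0xFFFF, `movsx16` yields 0xFFFFFFFF (-1 as a 32-bit value) while `movzx16` yields 0x0000FFFF (65535). For any source with bit 15 clear, such as 0x1234, the two give the same result, which is why zero extension is enough for non-negative values.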

[This explains why `xor`-zeroing is better than `mov reg, 0`](http://stackoverflow.com/questions/33666617/which-is-best-way-to-set-a-register-to-zero-in-x86-assembly-xor-mov-or-and). You might want to incorporate that link into your answer. Also, the x86 tag wiki has some links to register diagrams like ASCII one you created for a single register. — Peter Cordes, Mar 10 '16 at 21:41
Why do you keep putting `I hope this has led a light on you!` back in after moderators take it out? It's not even a meaningful English sentence. You can say "I hope this has shed some light on your problem", i.e. illuminated / clarified what was previously unclear / hard to see. But that doesn't sound very cool. I can't think of a good way to phrase what I think you're trying to say. I think it's redundant anyway; everyone that posts answers does it to try to help people. Either the asker of the question, or more often to help other people that read it in future. — Peter Cordes, Mar 14 '16 at 15:35
@PeterCordes: I didn't intended to reverse it. It was... kind of a mess. Undone that. — 3442, Mar 16 '16 at 03:06
