I am learning x86-64 assembly from Computer Systems: A Programmer's Perspective, and I came across an exercise that asks me to translate a line of C code into two equivalent assembly instructions. The code copies a value of one type into a variable of another type through pointers.
The pointer variables are declared as follows:
src_t *sp; //src_t and dest_t are typedefs
dest_t *dp;
and the C code to be translated is:
*dp = (dest_t)*sp;
It is given that the pointers sp and dp are stored in registers %rdi and %rsi respectively, and that we should use the 'appropriate portion' of %rax (e.g. %eax, %ax or %al) as the intermediate for the copy (since x86-64 doesn't allow both the source and the destination of a mov to be memory references).
Now, when src_t is unsigned char and dest_t is long, I wrote the following assembly for it:
movzbq (%rdi), %rax   # load one byte and zero-extend it to 8 bytes in %rax
movq %rax, (%rsi)     # store all 8 bytes of the 'long'
But the book, as well as Godbolt (using gcc with -O3), say it should be
movzbl (%rdi), %eax
movq %rax, (%rsi)
In this case, the byte is only(?) zero-extended to 4 bytes (%eax is 4 bytes long), but I read that if we do something like
movl %edx, %rax
then the upper 4 bytes of %rax will also be set to 0.
I have two questions:
- Is movl %edx, %rax equivalent to movl %edx, %eax, that is, are the upper 4 bytes also set to 0 in the latter case?
- Is movzbq (%rdi), %rax equivalent to movzbl (%rdi), %eax, i.e. does movzbl also set the higher 4 bytes to zero (like movl), even though we don't mention the full register (%rax) but only a part of it (%eax)?