
I've been looking for a way to apply BSWAP to only the lower 32-bit sub-register of a 64-bit register. For example, if RAX holds 0x0123456789abcdef, I want to change it to 0x01234567efcdab89 with a single instruction (for performance reasons).

So I tried the following inline-assembly macro:

#define BSWAP(T) {  \
    __asm__ __volatile__ (  \
            "bswap %k0" \
            : "=q" (T)  \
            : "q" (T)); \
}

And the result was 0x00000000efcdab89. I don't understand why the compiler behaves like this. Does anybody know an efficient solution?

jww
user25683

2 Answers


Ah, yes, I understand the problem now:

x86-64 processors implicitly zero-extend the result into the full 64-bit register whenever an instruction writes a 32-bit register (%eax, %ebx, etc.), so the upper 32 bits are cleared. As I understand it, this is part of the architecture's design (it avoids a dependency on the old upper half of the register), not something the compiler chooses to do.

So I'm afraid there is no way to do a bswap on just the lower 32 bits of a 64-bit register while leaving the upper 32 bits intact. You'll have to use a series of several instructions...
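As a rough sketch of what such a multi-instruction sequence could look like in C: save the upper half, byte-swap the lower half, and recombine. This assumes GCC or Clang (for the `__builtin_bswap32` builtin); the helper name `bswap_low32` is made up for illustration and is not from the question.

```c
#include <stdint.h>

/* Hypothetical helper: byte-swap only the low 32 bits of x and keep
 * the high 32 bits unchanged. Relies on the GCC/Clang builtin
 * __builtin_bswap32. */
static inline uint64_t bswap_low32(uint64_t x)
{
    uint64_t high = x & 0xffffffff00000000ULL;     /* preserve upper half */
    uint32_t low  = __builtin_bswap32((uint32_t)x); /* swap lower half */
    return high | low;                              /* recombine */
}
```

With this sketch, `bswap_low32(0x0123456789abcdefULL)` evaluates to `0x01234567efcdab89`. It compiles to more than one instruction, which is consistent with the point above.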

Dan Lenski
    [Why do most x64 instructions zero the upper part of a 32 bit register](http://stackoverflow.com/q/11177137/995714) – phuclv Apr 26 '15 at 08:19

Check the assembly output generated by gcc! Use gcc's -S flag to compile the code and generate asm output.

IIRC, x86-64 instructions use a 32-bit operand size by default unless explicitly directed otherwise (and the %k modifier in your template asks for the 32-bit name of the register), so this may be (part of) the problem.

Dan Lenski