I've been looking for a way to apply BSWAP to the lower 32-bit sub-register of a 64-bit register. For example, if RAX holds 0x0123456789abcdef, I want to change it to 0x01234567efcdab89 with a single instruction (for performance reasons).
So I tried the following inline-assembly macro:
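/* Byte-swaps the 32-bit sub-register of the register holding T. */
#define BSWAP(T) { \
__asm__ __volatile__ ( \
"bswap %k0" /* %k0 names the 32-bit form of operand 0, e.g. EAX for RAX */ \
: "+r" (T)); /* "+r": T is read and written in the same register */ \
}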
The result was 0x00000000efcdab89: the lower half is swapped as expected, but the upper 32 bits are zeroed. I don't understand why the compiler behaves like this. Does anybody know an efficient solution?
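
To pin down the result I'm after, here is a minimal plain-C sketch of the intended transformation. The helper name `bswap_low32` is hypothetical (it is not part of the macro above), and it assumes GCC or Clang for the `__builtin_bswap32` builtin. As far as I can tell, the zeroed upper half matches the x86-64 rule that a write to a 32-bit register clears bits 63:32 of the full register, so this sketch preserves the high half explicitly instead of relying on a single BSWAP.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: byte-swap the low 32 bits
   of x and leave the high 32 bits unchanged. */
static inline uint64_t bswap_low32(uint64_t x)
{
    uint32_t lo = (uint32_t)x;                /* low half, e.g. 0x89abcdef */
    uint64_t hi = x & 0xffffffff00000000ULL;  /* high half, kept in place */
    return hi | __builtin_bswap32(lo);        /* GCC/Clang builtin */
}
```

With this helper, `bswap_low32(0x0123456789abcdefULL)` yields 0x01234567efcdab89, which is the value I expected from the macro. It takes more than one instruction, so it only shows the target behavior, not necessarily the fastest way to get it.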