I have a function that I'm writing in assembly, and I want to be sure which approach will give me the best throughput.
I have a 64-bit value in RAX and I need to extract the topmost byte and perform some operations on it. I was wondering what the best way of going about this is.
shr rax, 56 ; This will get me the most significant byte in al.
However, is this more efficient than...
rol rax, 8
and rax, r12 ; I already have the value 255 in r12
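To pin down what I expect both sequences to compute, here is a minimal C model. The function names are made up for illustration, and it says nothing about speed; it only checks that the two results match, assuming r12 really holds 255.

```c
#include <stdint.h>

/* Models: shr rax, 56
   The top byte ends up in the low 8 bits; everything above is zero. */
static uint64_t top_byte_shr(uint64_t rax) {
    return rax >> 56;
}

/* Models: rol rax, 8 followed by and rax, r12 (with r12 == 255).
   The rotate is written as two shifts OR'd together. */
static uint64_t top_byte_rol_and(uint64_t rax) {
    uint64_t rotated = (rax << 8) | (rax >> 56);
    return rotated & 255u;
}
```

For an input of 0x1122334455667788, both functions return 0x11.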
The reason I'm asking is that on some architectures, the time a shift takes depends on the shift count. If I recall correctly, on the 680x0 chips it was 6 + 2n cycles, where n was the shift count. I don't think this is true on x86 architectures, but I'm not sure, so some enlightenment would be appreciated. (I understand about latency.)
Or is there an easy way to swap bits 0-31 of RAX with bits 32-63 rather than rotating or shifting? Something like what swap did on the 680x0.
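To be precise about the result I mean by swapping the halves, here is a small C sketch. The name `swap_halves` is hypothetical, and it only describes the value I want to end up with, not the instruction I hope exists.

```c
#include <stdint.h>

/* Exchanges the low 32 bits with the high 32 bits of a 64-bit value. */
static uint64_t swap_halves(uint64_t rax) {
    return (rax << 32) | (rax >> 32);
}
```

So 0x1122334455667788 would become 0x5566778811223344.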