You cannot move the upper bits of an XMM register into a general purpose register directly.
You'll have to follow a two-step process, which may or may not involve a roundtrip to memory or the destruction of a register.
in registers (SSE2)
movq rax,xmm0 ;lower 64 bits
movhlps xmm0,xmm0 ;move high 64 bits to low 64 bits.
movq rbx,xmm0 ;high 64 bits.
punpckhqdq xmm0,xmm0 is the SSE2 integer equivalent of movhlps xmm0,xmm0. Some CPUs may avoid a cycle or two of bypass latency if xmm0 was last written by an integer instruction, not FP.
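For example, the all-integer equivalent of the block above would look like this (a sketch; only the shuffle changes):
movq rax,xmm0 ;lower 64 bits
punpckhqdq xmm0,xmm0 ;copy the high 64 bits into the low 64 bits (integer domain)
movq rbx,xmm0 ;high 64 bits.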
via memory (SSE2)
movdqu [mem],xmm0
mov rax,[mem]
mov rbx,[mem+8]
slow, but does not destroy xmm register (SSE4.1)
movq rax,xmm0 ;lower 64 bits
pextrq rbx,xmm0,1 ;3 cycle latency on Ryzen! (and 2 uops)
A hybrid strategy is also possible: store the vector to memory, use movd/movq eax/rax,xmm0 so the low element is ready quickly, then reload the higher elements. (Store-forwarding latency is not much worse than ALU latency, though.) That gives you a balance of uops for different back-end execution units. Store/reload is especially good when you want lots of small elements: mov / movzx loads into 32-bit registers are cheap and have 2-per-clock throughput.
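A minimal sketch of such a hybrid, assuming [mem] is a spare 16-byte scratch location:
movq rax,xmm0 ;low 64 bits via the ALU, ready quickly
movdqu [mem],xmm0 ;store the whole vector in parallel
mov rbx,[mem+8] ;reload the high 64 bits via store-forwarding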
For 32 bits, the code is similar:
in registers
movd eax,xmm0
psrldq xmm0,4 ;shift 4 bytes to the right
movd ebx,xmm0
psrldq xmm0,4 ; pshufd could copy-and-shuffle the original reg instead
movd ecx,xmm0 ; (see below), not destroying the XMM and maybe creating some ILP
psrldq xmm0,4
movd edx,xmm0
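A sketch of that pshufd alternative, assuming xmm1 is free to use as scratch; xmm0 is preserved and the three shuffles are independent of each other:
movd eax,xmm0
pshufd xmm1,xmm0,0x55 ;broadcast element 1
movd ebx,xmm1
pshufd xmm1,xmm0,0xAA ;broadcast element 2
movd ecx,xmm1
pshufd xmm1,xmm0,0xFF ;broadcast element 3
movd edx,xmm1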
via memory
movdqu [mem],xmm0
mov eax,[mem]
mov ebx,[mem+4]
mov ecx,[mem+8]
mov edx,[mem+12]
Not destroying xmm register (SSE4.1) (slow like the psrldq / pshufd version)
movd eax,xmm0
pextrd ebx,xmm0,1 ;3 cycle latency on Skylake!
pextrd ecx,xmm0,2 ;also 2 uops: like a shuffle(port5) + movd(port0)
pextrd edx,xmm0,3
The 64-bit in-register (movhlps / punpckhqdq) variant can run in 2 cycles. The pextrq version takes 4 minimum. For 32-bit, the corresponding numbers are 4 and 10, respectively.