I have an integer value of -1 and want to load it as fast as possible into all 8 slots of a __m256 register such as ymm0.
I didn't find an assembly instruction for this. MASM doesn't accept an immediate operand here:
vmovaps ymm1, 0FFFFFFFFh ; -1
When using intrinsics like
// broadcast a single value into all 8 slots of an AVX register
__m256 tmp = _mm256_set1_ps(rp->xc);
the code generated by Visual Studio looks like this:
mov rax,qword ptr [rp]
vmovss xmm0,dword ptr [rax+34h]
vshufps xmm0,xmm0,xmm0,0
vinsertf128 ymm0,ymm0,xmm0,1
vmovups ymmword ptr [rbp+7C0h],ymm0
vmovups ymm0,ymmword ptr [rbp+7C0h]
vmovups ymmword ptr [tmp],ymm0
This is a lot of code for a simple operation that comes up all the time. I still hope there is a direct instruction that does this. I am looking for an assembly solution; I only used intrinsics to see what the compiler generates.
I am aware that I must somehow specify that all 8 slots of the __m256 get the same value.
So far my only idea is to pass the constant (-1) in rdx, move rdx into ymm1, and then do some shuffling. I suspect I am doing something wrong, because loading a constant (or a single float/int) into all slots of an AVX register should be a very common task, so I can't believe there is no dedicated instruction for it.
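To make that idea concrete, here is a rough sketch of it using C intrinsics instead of MASM, assuming plain AVX (no AVX2). The function name `broadcast_i32_avx` and the `AVX_TARGET` macro are placeholders I made up for illustration, and the mnemonics in the comments are only what I would expect a compiler to emit, not verified output:

```c
#include <immintrin.h>  /* SSE2/AVX intrinsics */
#include <stdint.h>

/* Assumption: GCC/Clang need AVX enabled per function (or via -mavx);
   MSVC accepts the intrinsics without an attribute. */
#if defined(__GNUC__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

/* Hypothetical helper: writes `value` into all 8 32-bit slots of out[0..7]. */
AVX_TARGET
void broadcast_i32_avx(int32_t value, int32_t out[8])
{
    /* General-purpose register -> slot 0 of an xmm register (likely vmovd). */
    __m128i lo = _mm_cvtsi32_si128(value);

    /* Copy slot 0 into all 4 slots of the xmm register (likely vpshufd ..., 0). */
    lo = _mm_shuffle_epi32(lo, 0);

    /* View the xmm value as the low half of a ymm register;
       the upper half is undefined until the insert below. */
    __m256i all = _mm256_castsi128_si256(lo);

    /* Copy the low 128 bits into the high 128 bits (likely vinsertf128 ..., 1). */
    all = _mm256_insertf128_si256(all, lo, 1);

    /* Unaligned store so the caller's buffer needs no special alignment. */
    _mm256_storeu_si256((__m256i *)out, all);
}
```

Calling `broadcast_i32_avx(-1, buf)` with an `int32_t buf[8]` fills every element with -1. This is still a move, a shuffle, and an insert, which is exactly the sequence I was hoping a single instruction could replace.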