Set XMM register via address location for X86-64

Question

I have a float value at some address in memory, and I want to set an XMM register to that value by using the address. I'm using asmjit.

This code works for a 32 bit build and sets the XMM register v to the correct value *f:

using namespace asmjit;
using namespace x86;

void setXmmVarViaAddressLocation(X86Compiler& cc, X86Xmm& v, const float* f)
{
   cc.movq(v, X86Mem(reinterpret_cast<std::uintptr_t>(f)));
}

When I build for 64-bit, though, I get a segfault as soon as the register is used. Why is that?

(And yes, I am not very strong in assembly... Be kind... I've been on this for a day now...)

What actual machine code does that produce? Did it perhaps truncate a 64-bit address to fit in a `[disp32]` absolute addressing mode? x86-64 can't use arbitrary 64-bit addresses as absolute direct memory operands, except for mov to/from RAX/EAX/AX/AL. Normally you want to use RIP-relative addressing for static data within 2GiB of code. — Peter Cordes, Nov 22 '21 at 08:31
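To see what truncating a pointer to a `[disp32]` operand would do, here is a standalone C++ sketch (plain integer arithmetic, no asmjit; the function names are made up for illustration). It models the default 64-bit address size, where the CPU sign-extends the 32-bit displacement, so only addresses in the low 2 GiB (or the top 2 GiB) survive the round trip. Whether asmjit actually emitted such an operand in this case is not established; this only shows the arithmetic.

```cpp
#include <cstdint>

// Effective address the CPU would use if a 64-bit pointer were squeezed
// into an absolute [disp32] operand (64-bit address size, no 0x67 prefix).
std::uint64_t effectiveAddressOfAbsDisp32(std::uint64_t addr) {
    // Only the low 32 bits are encoded; reinterpret them as signed.
    const auto disp = static_cast<std::int32_t>(static_cast<std::uint32_t>(addr));
    // The hardware sign-extends the displacement back to 64 bits.
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(disp));
}

// The operand refers to the intended object only if the round trip is
// lossless, i.e. the address is in the low 2 GiB or the top 2 GiB.
bool fitsAbsDisp32(std::uint64_t addr) {
    return effectiveAddressOfAbsDisp32(addr) == addr;
}
```

For example, an address of exactly 4 GiB (`0x100000000`) comes back as 0, which is one way a memory operand could end up pointing at address 0.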
Hmm... No idea how to see the actual machine code. This is the error shown by VS: `Access violation executing location 0x0000000000000000`. I tried to MOV the address to RAX first - but that's not what you mean, right? You mean I should try to first MOV the actual value to RAX, and then to XMM from there? — Duke, Nov 22 '21 at 08:44
Woah, just tried it, and it works!!! Have to check a bit more, but if that's it, it's genius! 1000 kisses! — Duke, Nov 22 '21 at 08:48
I don't know AsmJIT, so I'd rather not post a sub-optimal answer or one that just guesses at exactly what happened. No, moving the *data* to RAX first wasn't what I meant; that would be less efficient if you want the data in XMM0 eventually. If you did need to use a 64-bit absolute address instead of a normal RIP-relative address with data within 2GiB of code (or a 32-bit absolute address if you can put your data in the low 32 bits, e.g. Linux `mmap(MAP_32BIT)`), you'd want `mov reg, imm64`, using any integer register that's convenient. Then `movq xmm0, [reg]`. — Peter Cordes, Nov 22 '21 at 09:01
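The `mov reg, imm64` / `movq xmm0, [reg]` fallback can be made concrete by writing out its bytes. The sketch below (hypothetical function name; registers fixed to RAX and XMM0 so the ModRM byte stays trivial) emits the two instructions into a byte vector. It is only meant to show the encoding and its size, not to replace an assembler such as asmjit.

```cpp
#include <cstdint>
#include <vector>

// Raw x86-64 bytes for:
//   mov  rax, imm64     ; REX.W + (B8+r), then the 8-byte immediate
//   movq xmm0, [rax]    ; F3 0F 7E /r (load 64 bits, zero the upper half)
std::vector<std::uint8_t> encodeLoadViaImm64(std::uint64_t addr) {
    std::vector<std::uint8_t> code;
    code.push_back(0x48);              // REX.W: 64-bit operand size
    code.push_back(0xB8);              // MOV r64, imm64 with r = RAX
    for (int i = 0; i < 8; ++i) {      // imm64, little-endian
        code.push_back(static_cast<std::uint8_t>(addr >> (8 * i)));
    }
    code.push_back(0xF3);              // mandatory prefix for MOVQ xmm, m64
    code.push_back(0x0F);
    code.push_back(0x7E);
    code.push_back(0x00);              // ModRM: mod=00, reg=XMM0, rm=[RAX]
    return code;
}
```

The `mov` alone is 10 bytes (REX.W, opcode, 8-byte immediate), and the load adds 4 more.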
Also, your address is `0` after truncating to 32-bit or whatever happened here? So your data happened to be 4GiB aligned? — Peter Cordes, Nov 22 '21 at 09:03
> instead of a normal RIP-relative... What does it mean? I'm not at all sure I cannot use that - I just don't know about it... — Duke, Nov 22 '21 at 09:19
RIP-relative data means "relative to the instruction pointer", i.e. baked into your program. If you load data dynamically you probably cannot use this. — Botje, Nov 22 '21 at 09:31
Ah, got it! Thanks! You're right, probably I cannot use that, then. In the end, it's a fixed-size (C-style) float array where the values come from, but stored as a member variable in a class of a library. So I cannot guarantee that the data is not allocated dynamically by some end user... — Duke, Nov 22 '21 at 10:10
I'll try your other suggestion using an immediate, though... Thanks a lot for the suggestions and explanations... That helped a lot! — Duke, Nov 22 '21 at 10:11
I don't know asmjit either, but if you want to load "a" float value from memory, you should probably use `movss` instead of `movq` (the latter moves a quadword, i.e. 64 bits) -- both set the remaining elements to zero. But this is probably not the reason for your segfault. — chtz, Nov 22 '21 at 10:21
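The difference between the two loads can be checked from plain C++ with SSE intrinsics. This sketch assumes an x86-64 target (SSE2 is baseline there); compilers typically lower `_mm_load_ss` to `movss` and `_mm_loadl_epi64` to `movq`, though the exact instructions are up to the compiler. The helper names are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2: _mm_loadl_epi64, _mm_castsi128_ps
#include <xmmintrin.h>  // SSE:  _mm_load_ss, _mm_storeu_ps
#include <array>

// movss xmm, m32: load one float (4 bytes), zero the other three lanes.
std::array<float, 4> loadLikeMovss(const float* p) {
    std::array<float, 4> out{};
    _mm_storeu_ps(out.data(), _mm_load_ss(p));
    return out;
}

// movq xmm, m64: load 8 bytes (here: two floats), zero the upper 64 bits.
std::array<float, 4> loadLikeMovq(const float* p) {
    std::array<float, 4> out{};
    const __m128i v = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(p));
    _mm_storeu_ps(out.data(), _mm_castsi128_ps(v));
    return out;
}
```

With `const float data[2] = {1.5f, 2.5f};`, the `movss`-style load yields `{1.5, 0, 0, 0}` and the `movq`-style load yields `{1.5, 2.5, 0, 0}`. So `movq` on a single float also reads the 4 bytes after it, which can fault if the float is the last thing on a page.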
@chtz Well, `movss` cannot be used here, at least not directly. Not sure how I would do that. I ended up doing: `auto regster = is64BitBuild() ? rax : eax; cc.mov(regster, X86Mem(reinterpret_cast<std::uintptr_t>(f), index)); cc.movd(v, regster);` Maybe not the last version, could probably be more efficient, but it works... — Duke, Nov 22 '21 at 14:49
This seems like an old version of asmjit, which is no longer supported. In general, the rule in JIT compilation is: if you don't know whether the address you provided is reachable within a 32-bit signed displacement, don't use such an address. Moving the address to a register will always work, so I would recommend that. Addresses within the generated code are usually referenced via labels. In addition, AsmJit always returns an error when something bad happens, so always use an ErrorHandler; it will save you a lot of time, especially in cases like this. — Petr, Nov 26 '21 at 18:03
@Petr You're right, it was old. I updated it. Thanks for the suggestion and your answer. I tried it and it works! — Duke, Nov 27 '21 at 19:58

Petr · Accepted Answer · 2021-11-28T17:28:25.300

The simplest solution is to avoid passing an absolute address to `ptr()`. In 64-bit mode, a memory operand can only encode a signed 32-bit displacement, which cannot reach an arbitrary user address. The displacement is calculated from the current instruction pointer and the target address; if the difference does not fit in a signed 32-bit integer, the instruction cannot be encoded (this is an architectural constraint).
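The reachability rule is simple arithmetic. Here is a minimal C++ sketch of the check an assembler has to make (the function name is hypothetical, not asmjit API): a RIP-relative displacement is measured from the address of the next instruction, and the difference must fit in a signed 32-bit integer.

```cpp
#include <cstdint>
#include <optional>

// `nextInsnAddr` is the address just past the instruction being encoded
// (the value of RIP when it executes); `target` is the data address.
// Returns the rel32 to encode, or std::nullopt if it is out of range.
std::optional<std::int32_t> ripRel32(std::uint64_t nextInsnAddr,
                                     std::uint64_t target) {
    // Unsigned subtraction wraps modulo 2^64; reinterpreting as signed
    // gives the true distance between two canonical user-space addresses.
    const auto delta = static_cast<std::int64_t>(target - nextInsnAddr);
    if (delta < INT32_MIN || delta > INT32_MAX) {
        return std::nullopt;  // not encodable as [rip + rel32]
    }
    return static_cast<std::int32_t>(delta);
}
```

For instance, code at `0x140001000` can reach data at `0x140003000` (rel32 = `0x2000`) but not a heap pointer such as `0x7ff6a1b2c3d4`.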

Example code:

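#include <asmjit/x86.h>
#include <cstdint>

using namespace asmjit;

void setXmmVarViaAddressLocation(x86::Compiler& cc, x86::Xmm& v, const float* f)
{
    // Materialize the full 64-bit address in a virtual GP register
    // (mov reg, imm64), so no 32-bit displacement is needed.
    x86::Gp tmpPtr = cc.newIntPtr("tmpPtr");
    cc.mov(tmpPtr, reinterpret_cast<std::uintptr_t>(f));
    // Load through the register: [tmpPtr].
    cc.movq(v, x86::ptr(tmpPtr));
}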

If you want to avoid the extra register where it isn't needed (in 32-bit mode, which doesn't have this problem, or when the address fits in 32 bits), you have to check the target architecture and the address first, something like:

using namespace asmjit;

void setXmmVarViaAddressLocation(x86::Compiler& cc, x86::Xmm& v, const float* f)
{
    // Ideally, abstract this out so the code doesn't repeat.
    x86::Mem m;
    if (cc.is32Bit() || reinterpret_cast<std::uintptr_t>(f) <= 0xFFFFFFFFu) {
        m = x86::ptr(reinterpret_cast<std::uintptr_t>(f));
    }
    else {
        x86::Gp tmpPtr = cc.newIntPtr("tmpPtr");
        cc.mov(tmpPtr, reinterpret_cast<std::uintptr_t>(f));
        m = x86::ptr(tmpPtr);
    }

    // Do the move; `m` is either an absolute address or [tmpPtr],
    // depending on the target architecture and the address.
    cc.movq(v, m);
}

This way you save a register in 32-bit mode, where registers are always precious.

edited Nov 28 '21 at 17:28

answered Nov 26 '21 at 18:21

Petr

x86-64 *can* use 32-bit absolute addresses directly, as well as RIP-relative. Like GAS `.intel_syntax noprefix` mode: `movss xmm0, [RIP + 0x1234]` vs. `movss xmm0, [0x401234]` (1 byte longer). So if your data happens to be in the low 32 bits of the address space but you're JITing into a buffer that isn't, that's an option. Might be useful to check for the address being usable directly one way or another before using the bulky and slower `mov reg, imm64` fallback. (Unless you need the address in a register anyway to loop over an array.) – Peter Cordes Nov 27 '21 at 03:36
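The check-then-fall-back order suggested here could look like the following C++ sketch. `AddrMode` and `chooseAddrMode` are hypothetical names, not asmjit API, and `nextInsnAddr` is only meaningful once the final location of the JIT code is known. The sketch treats `[disp32]` as sign-extended, so it only picks the absolute form for addresses below 2 GiB (or in the top 2 GiB).

```cpp
#include <cstdint>

enum class AddrMode {
    RipRelative,  // [rip + rel32]
    Absolute32,   // [disp32], sign-extended by the CPU
    Register,     // mov reg, imm64  then  [reg]
};

// Pick the cheapest usable form, falling back to a register last.
AddrMode chooseAddrMode(std::uint64_t nextInsnAddr, std::uint64_t target) {
    const auto fitsInt32 = [](std::int64_t v) {
        return v >= INT32_MIN && v <= INT32_MAX;
    };
    if (fitsInt32(static_cast<std::int64_t>(target - nextInsnAddr))) {
        return AddrMode::RipRelative;  // shortest encoding
    }
    if (fitsInt32(static_cast<std::int64_t>(target))) {
        return AddrMode::Absolute32;   // one byte longer (needs a SIB byte)
    }
    return AddrMode::Register;         // always works, but costs an extra
                                       // 10-byte instruction and a register
}
```

Petr's answer folds the same idea into a single condition (32-bit mode or a 32-bit address, else a temporary register).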
Or JIT code that gets an address as a function arg, instead of embedding it, if you have it in the caller. (Not good for constants and stuff, but makes sense for functions that loop over arrays.) – Peter Cordes Nov 27 '21 at 03:38
Yes you can use 32-bit absolute addresses, and in that case you can have one additional branch in the code example adding support for that. However, it's not really convenient to rely on absolute 32-bit addresses in 64-bit code. The best way to avoid a problem like this is to use the temporary register - it will always work regardless of the address and regardless of where the JIT code is relocated. – Petr Nov 27 '21 at 15:16
Of course it will always work, but the whole point of JITing is performance, which you're sacrificing if you don't try to make *efficient* asm. `mov reg, imm64` is 10 bytes long on its own, and it's an extra instruction. Even worse, it takes an extra cycle to fetch a uop with an imm64 from the uop cache in SnB-family CPUs, so it can hurt front-end throughput more than you'd otherwise expect. (See Agner Fog's microarch guide; https://agner.org/optimize/) So it's the worst case that you'd like to avoid. – Peter Cordes Nov 27 '21 at 15:30
I have changed the code to use an absolute address in case it's a 32-bit address. However, I think that discussing the performance of code we haven't seen is pointless. Such code will most likely never be in a loop, so I don't really see a negative impact. Usually, when writing JITs, you want to write the safest code possible, because there are tons of operating systems and configurations that control security. If your address space is randomized, it can be tricky to make sure that your code is placed in a region from which the passed pointer is reachable with a relative displacement. – Petr Nov 28 '21 at 17:29
