Trying to do multiplication in x86 assembly, using rax, rdi, rsi, and rcx registers

Question

I am new to x86 assembly. I am trying to write a multiplication function using the rax, rdi, rsi, and rcx registers, but it is not working. To begin with, 2 * 3 yielded 26, and if I increment the right operand by 1, the result goes up by 8. I suspect I am miscounting bytes somewhere, but apart from that I am in the dark. Does anyone understand what is wrong with my code? This is how I am assembling it: `gcc -O0 -g -mstackrealign -masm=intel -o test test.asm`

    .global _main
    .text
end:
    mov rax, rdi
    ret
plus:
    add rdi, rsi
    jmp end
minus:
    sub rdi, rsi
    jmp end
multiply:
    add rdi, rcx
    sub rsi, 1
    cmp rsi, 0
    je end
    jmp multiply
eq:
    cmp rdi, rsi
    je true
    jmp false
    true:
        mov rax, 1
        ret
    false:
        mov rax, 0
        ret
_main:
    push 2
    push 4
    pop rsi
    pop rdi
    mov rcx, rsi
    call multiply
    push rax
    mov rdi, rax
    mov rax, 0x2000001
    syscall

That seems totally pointless. x86-64 CPUs have fast `imul`, just do `imul rdi, rsi` like a normal person (or a compiler). Also, using an extra `jmp` to a common tail to just save 2 bytes per function doesn't seem worth it. Especially not if you're going to write inefficient code like that multiply loop that wastes a bunch of instructions. (e.g. `cmp rsi, 0` after `sub` already set FLAGS, and an `eq` that ignores `setcc`) — Peter Cordes, Sep 24 '20 at 03:57
@fuz: IIRC, Caspian has been posting questions about building a Forth compiler or JIT or something as a hobby project. In that context, defining a multiply function this way is totally pointless and making it harder for himself. Of course, making separate functions at all is super weird instead of just inlining `sub` or `imul`. OTOH, getting the logic correct in a simple loop is not a bad exercise whether someone's making you solve it or whether you choose to do so. (And designing the calling convention to not force the caller to pass one arg twice would also be good...) — Peter Cordes, Sep 24 '20 at 07:19
@PeterCordes I was doing this as a last-ditch effort for multiplication to work. My goal is to multiply the contents of rdi and rsi, and put the result in rax. This hasn't been working. Once the result starts getting bigger (around 600), the result is incorrect. I do not know why - do you have any idea why this could be? — Caspian Ahlberg, Sep 24 '20 at 11:44
What have you observed when single-stepping with a debugger? (And you say "last-ditch effort" as if you weren't able to get it to work using the obvious `imul` instruction - it might be more productive to focus on getting that working instead, which you could certainly ask questions about here.) — Nate Eldredge, Sep 24 '20 at 14:01
@NateEldredge imul is the instruction that wasn't working. I've already asked about that - but I didn't get any helpful responses — Caspian Ahlberg, Sep 24 '20 at 14:13
Can you give a link to that question? I don't see it in your profile history. — Nate Eldredge, Sep 24 '20 at 14:39
I think you have the registers mixed up. It sounds like the goal is to add `rdi` to itself, `rsi`-many times, but the value you add every time is `rcx`, and that contains the original value of `rsi`, not of `rdi`. There is also an off-by-one error in that you really only want to do a total of `rsi-1` adds, if you think about some small examples. — Nate Eldredge, Sep 24 '20 at 14:48
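To make the diagnosis above concrete, here is a minimal C model of the posted loop. It is only a sketch: the function name `multiply_as_posted` is hypothetical, and the C variables stand in for the registers of the same name, with `rcx` preloaded from `rsi` as `_main` does. Under those assumptions the loop returns `rdi + rsi * rsi` rather than `rdi * rsi`, so the inputs 2 and 4 from `_main` give 18 instead of 8.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the posted `multiply` loop (hypothetical helper name).
   Each variable mirrors the register it is named after. */
uint64_t multiply_as_posted(uint64_t rdi, uint64_t rsi)
{
    /* With rsi == 0 the posted loop decrements past zero and keeps
       going, so this model only covers rsi >= 1. */
    assert(rsi >= 1);

    uint64_t rcx = rsi;      /* mov rcx, rsi  (done in _main)            */
    do {
        rdi += rcx;          /* add rdi, rcx  (adds rsi's value, not rdi's) */
        rsi -= 1;            /* sub rsi, 1                                */
    } while (rsi != 0);      /* cmp rsi, 0 / je end / jmp multiply        */

    return rdi;              /* end: mov rax, rdi / ret                   */
}
```

Reading the model this way shows both problems at once: the value being added is the wrong operand, and the starting value of `rdi` is counted in addition to the `rsi` adds.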
Just to reiterate - single-stepping in your debugger is the way to understand what's going on here. By watching the registers after each instruction, it should be much easier to understand what is wrong and why. You mentioned gdb in another question, but only in the context of looking at a backtrace after a crash - that's too late. If it isn't something you're already comfortable doing, my advice would be not to write a single more line of code until you have gotten some practice debugging. — Nate Eldredge, Sep 24 '20 at 14:50
Oh, and when you fix your off-by-one error, make sure multiplication by zero (in either place) works correctly. — Nate Eldredge, Sep 24 '20 at 14:52
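One way to get the zero cases right is sketched below in C, as a model of the loop logic rather than the assembly itself. Note that it swaps in a slightly different scheme from adding a value to itself: it starts a separate accumulator at 0 and tests the counter before each add, so a count of zero performs no adds at all. The names `multiply_by_addition` and `check_zero_cases` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Repeated addition: add `a` to an accumulator exactly `b` times.
   The counter is tested before each add, so b == 0 returns 0. */
uint64_t multiply_by_addition(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    while (b != 0) {
        result += a;
        b -= 1;
    }
    return result;
}

/* Usage sketch: zero in either position, plus two ordinary products. */
void check_zero_cases(void)
{
    assert(multiply_by_addition(0, 7) == 0);
    assert(multiply_by_addition(7, 0) == 0);
    assert(multiply_by_addition(2, 3) == 6);
    assert(multiply_by_addition(25, 24) == 600);
}
```

In assembly the same shape would mean zeroing the result register first and checking the counter at the top of the loop instead of after the first add.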
@NateEldredge I did some changes. I checked with GDB and it seems that the result is ending up in rax, and the number is correct! But the end part is really tricky - especially because GDB won't let me set breakpoints at line numbers, no matter what -g flags I give it. If you look at the last three lines above, something has to be messing up there - do you know what could be happening? — Caspian Ahlberg, Sep 24 '20 at 17:10
@NateEldredge I see the problem! I tested different exit codes, and it turns out that the biggest one possible is 255. So when I do "mov rdi, rax", when the result is over 255 it fails. Do you have any idea how to have arbitrarily sized exit codes? — Caspian Ahlberg, Sep 24 '20 at 17:21
@CaspianAhlberg: If `mov rax, rdi` / `imul rax, rsi` wasn't working, you're probably printing your result wrong. Are you using the product as an exit status for the process? That's only an 8-bit number. Lol, commented simultaneously with yours. — Peter Cordes, Sep 24 '20 at 17:22
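The 8-bit limit can be observed directly. The following C sketch, assuming a POSIX system, forks a child that exits with a given code and reports what the parent sees through `waitpid` and `WEXITSTATUS`. The helper names `observed_exit_status` and `check_truncation` are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child that exits with `code` and return the status the parent
   observes, or -1 if fork/waitpid fails or the child did not exit normally. */
int observed_exit_status(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                 /* child: exit immediately */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);      /* only the low 8 bits survive */
}

/* Usage sketch: a product of 600 is seen as 600 % 256 == 88. */
void check_truncation(void)
{
    assert(observed_exit_status(600) == 88);
}
```

This is why a correct product above 255 looks wrong when it is read back from the shell as an exit status.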
@PeterCordes See my response to Nate. Do you know a reasonable way to print out that number to the screen, or to produce a bigger exit code? I don't really care about how I show the result, just that the result is outputted in some way. — Caspian Ahlberg, Sep 24 '20 at 17:24
On Linux (and maybe POSIX in general) a caller can retrieve the full 32-bit number passed to `_exit` as the exit status with `waitid` ([Return value range of the main function](https://stackoverflow.com/a/5149399)), but shells don't do that because POSIX defines the exit status as 8-bit. Or on Linux as a hack you can `strace ./my_process` to see the arg it passes to the system call, before it gets truncated to an 8-bit exit status. I assume the MacOS equivalent would work, `truss` or `dtrace` or something I think it's called? — Peter Cordes, Sep 24 '20 at 17:27
@CaspianAhlberg: Of course you could just look at it with a debugger to make sure it works. Or yes, print it by calling printf, or convert to a string of ASCII digits yourself and make a `write` system call. [How to convert a binary integer number to a hex string?](https://stackoverflow.com/q/53823756) or [How do I print an integer in Assembly Level Programming without printf from the c library?](https://stackoverflow.com/a/46301894) — Peter Cordes, Sep 24 '20 at 17:28
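The "convert to ASCII digits yourself" route can be sketched in C as follows. This is a rough model of what an assembly version would do (divide by 10 repeatedly, store digits from the end of a buffer, then issue one `write`), not the code from the linked answers. The names `u64_to_decimal` and `print_u64` are hypothetical, and the 21-byte buffer assumes at most 20 decimal digits for a 64-bit unsigned value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Store the decimal digits of `value` at the end of `buf` (21 bytes:
   up to 20 digits plus a terminating NUL) and return a pointer to the
   first digit. */
char *u64_to_decimal(uint64_t value, char buf[21])
{
    assert(buf != NULL);
    char *p = buf + 20;
    *p = '\0';
    do {
        *--p = (char)('0' + value % 10);   /* lowest digit first */
        value /= 10;
    } while (value != 0);                  /* do/while so 0 prints "0" */
    return p;
}

/* Write the digits and a newline to standard output with one write call.
   Returns 0 on success, -1 on a short or failed write. */
int print_u64(uint64_t value)
{
    char buf[21];
    char *digits = u64_to_decimal(value, buf);
    size_t len = (size_t)(buf + 20 - digits);
    buf[20] = '\n';                        /* replace the NUL with a newline */
    ssize_t written = write(STDOUT_FILENO, digits, len + 1);
    return written == (ssize_t)(len + 1) ? 0 : -1;
}
```

Filling the buffer from the end avoids a separate reversal step, which is also the usual approach in hand-written assembly.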
You might like to ask separate questions about using GDB. A good way to set breakpoints is to first use `disassemble` to see the instructions with their addresses, and then `break *0x12345678` to set a breakpoint at the appropriate address. You can also set breakpoints at labels by name (`break multiply`), and you could add extra labels in the source code for that purpose. For small functions it's usually just as easy to break somewhere shortly before the code in question, and then single-step from then on (`si`, `ni`). — Nate Eldredge, Sep 24 '20 at 17:40
You can see the contents of all the registers with `info registers`, or specific ones with commands like `print $rsi`. — Nate Eldredge, Sep 24 '20 at 17:41
@PeterCordes It's tricky printing anything out - _puts doesn't work for me - but I'll figure it out eventually — Caspian Ahlberg, Sep 24 '20 at 19:26
@CaspianAhlberg: `_puts` takes a pointer to a zero-terminated C string. If you're going to call a libc function, call `_printf` with a `"%ld\n"` format string. `puts` would only be useful for numbers if you already formatted them into ASCII strings, at which point you might as well just make a `write` system call directly. — Peter Cordes, Sep 24 '20 at 19:40
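For comparison, the libc route with the `"%ld\n"` format string looks like this in C. It is a small sketch with hypothetical helper names (`format_result`, `print_result`); the assembly equivalent would pass the format string pointer and the value as the first two arguments to `_printf`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format `value` with "%ld\n" into `buf`. Returns the number of characters
   the full output needs (excluding the NUL), as snprintf does. */
int format_result(char *buf, size_t size, long value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%ld\n", value);
}

/* Print `value` followed by a newline on standard output.
   Returns the number of characters printed, or a negative value on error. */
int print_result(long value)
{
    return printf("%ld\n", value);
}
```

Unlike `puts`, this takes the integer itself, so no manual conversion to a string is needed first.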
See also the GDB tips at the bottom of https://stackoverflow.com/tags/x86/info, e.g. `layout reg` to watch registers change as you single-step. — Peter Cordes, Sep 24 '20 at 19:41
