Check whether a value in a register is a multiple of 3 in AVR Studio using assembly

Question

I want to check whether a number in a register is a multiple of 3, using AVR Studio and assembly language, but on an AVR 8515, so there is no division instruction.

I already tried a couple of methods, like adding 0b0011 to the register and expecting the carry flag to be set, but it isn't.

Hardware division is a slow way to test for divisibility by 3 anyway. See [Fast divisibility tests (by 2,3,4,5,.., 16)?](https://stackoverflow.com/q/6896533) for multiplicative constants. Hopefully a narrower version of that is possible with smaller constants for 16-bit. For 8-bit a lookup table is possible if you need max speed on AVR, otherwise multiply (possibly manually with shift-and-add.) — Peter Cordes, Apr 20 '23 at 14:59
So you want to multiply by 3? Or divide by 3? I am confused, maybe you can be more clear about what you want to achieve, and for which data types. — emacs drives me nuts, Apr 20 '23 at 16:10
Please provide enough code so others can better understand or reproduce the problem. — Community, Apr 21 '23 at 16:57

emacs drives me nuts · Answer 1 · 2023-04-20T18:15:12.860

A straightforward way is to let avr-gcc compile a piece of C code, and then peek at the assembly¹ it generates:

For an 8-bit, unsigned value consider the following C99:

#include <stdint.h>
#include <stdbool.h>

bool is_udiv3_8 (uint8_t x)
{
    return x % 3 == 0;
}

With something like

> avr-gcc x.c -mmcu=atmega8515 -O2 -S

we get x.s (alternatively, add -save-temps to an ordinary compilation and then intercept the *.s file).

is_udiv3_8:
    ldi r25,lo8(-85) ;1
    mul r24,r25      ;2
    mov r25,r1       ;3
    clr __zero_reg__ ;4
    lsr r25          ;5
    mov r18,r25      ;6
    lsl r18          ;7
    add r25,r18      ;8
    ldi r18,lo8(1)   ;9
    cpse r24,r25     ;10
    ldi r18,0        ;11
    mov r24,r18      ;12
    ret

What avr-gcc does is multiply the input by 171 (that is lo8(-85), roughly 512/3), so that the high byte of the product, shifted right by one bit, is the quotient of the division by 3. It then multiplies that quotient by 3 again and compares the result with the input. It returns True (1) in R24 if the input was divisible by 3, and False (0) otherwise.
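
To make the listing easier to follow, here is a minimal host-side sketch in C that mirrors it instruction by instruction. The function name is_udiv3_8_model and the local variable names are hypothetical; they are not produced by avr-gcc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of the is_udiv3_8 listing (not compiler output). */
bool is_udiv3_8_model(uint8_t x)
{
    /* ldi r25,lo8(-85) ; mul r24,r25 : 8x8 -> 16-bit product, 171 == 0xAB */
    uint16_t product = (uint16_t)(x * 171u);
    /* mov r25,r1 : keep only the high byte of the product */
    uint8_t q = (uint8_t)(product >> 8);
    /* lsr r25 : together with the high-byte step this is (x * 171) >> 9 == x / 3 */
    q >>= 1;
    /* mov r18,r25 ; lsl r18 ; add r25,r18 : q + 2*q == 3*q */
    uint8_t q3 = (uint8_t)(q + (q << 1));
    /* cpse r24,r25 : x is a multiple of 3 iff 3 * (x / 3) == x */
    return q3 == x;
}
```

Because the input is only 8 bits wide, the model can be checked exhaustively against x % 3 == 0 on a PC.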

You can extend this to 16-bit values, but you'll need the high word of a 16×16=32 multiplication.

At that point you remember² that a natural number N written in base B is divisible by B+1 iff the alternating cross sum over the digits in base B is divisible by B+1.

For example in base B=2: A natural number N is divisible by 3 iff the alternating cross sum of the bits of N is divisible by 3.

Written in assembly:

First a loop over the bits to get the alternating binary cross sum. For speed you would unroll that loop, which would take 8 sbrs and 8 sbrc instructions.
The cross sum q satisfies -8 <= q <= 8. Negating does not change whether q is divisible by 3, thus continue with |q| which is non-negative.
Subtract 3 until the result is 0, 1 or 2.
Return True if the value reached 0, and False otherwise (it reached 1 or 2).

.text
.global is_div3_asm

is_div3_asm:
    ;; R26 accumulates the alternating cross sum, which is congruent
    ;; to R25:R24 mod 3
    clr r26
.Loop_bits:
    ;; Loop over all bits of R25:R24 to compute the alternating cross sum
    ;; over the binary digits of R25:R24.
    sbiw r24, 0
    breq .Loop_done
    sbrc r24, 0 $ inc  r26
    sbrc r24, 1 $ dec  r26
    lsr  r25 $ ror r24
    lsr  r25 $ ror r24
    rjmp .Loop_bits
.Loop_done:
    ;; r26 = abs(r26)
    sbrc r26, 7
    neg  r26
    ;; Now we have 0 <= r26 <= 8, so reduce to r26 < 3...
    cpi  r26, 3
    brlt .Ltobool
    subi r26, 3
    ;; ...now we have 0 <= r26 <= 5, so at most one more sub 3 will do.
    cpi  r26, 3
    brlt .Ltobool
    subi r26, 3
.Ltobool:
    ;; Return a bool in R24 as of avr-gcc ABI.
    ldi r24, 1                  ; True
    cpi r26, 0
    breq .Ldone
    ldi r24, 0                  ; False
.Ldone:    
    ret

This function complies with the avr-gcc ABI and calling convention. You can use it from C/C++ by means of one of the following prototypes:

extern bool is_div3_asm (uint16_t); // C
extern "C" bool is_div3_asm (uint16_t); // C++

¹Note that there are different assembly dialects. This answer uses the GNU assembly dialect because it is compatible with the GNU assembler and is what avr-gcc and avr-g++ produce.

²The result is actually stronger: A natural number N is in the same residue class modulo B+1 as the alternating cross sum of N in base B. The proof is just a few lines of modular arithmetic and boils down to B ≡ -1 mod B+1.
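
Spelled out, with the digits d_k of N in base B (the symbols d_k and n are introduced here only for illustration), the argument from footnote 2 reads:

```latex
\begin{align*}
N &= \sum_{k=0}^{n} d_k B^k, \qquad 0 \le d_k < B,\\
B &\equiv -1 \pmod{B+1} \;\Longrightarrow\; B^k \equiv (-1)^k \pmod{B+1},\\
N &\equiv \sum_{k=0}^{n} (-1)^k d_k \pmod{B+1}.
\end{align*}
```

For example, with B = 2 and N = 11 = 1011 in binary, the alternating cross sum starting at the lowest bit is 1 - 1 + 0 - 1 = -1, which is congruent to 2 modulo 3, and indeed 11 mod 3 = 2.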

ndim · Answer 2 · 2023-04-24T20:34:00.497

This is just a comment on and elaboration of @emacs drives me nuts' answer, but it is too long and complex to be a Stack Overflow comment, therefore this is a Community wiki Stack Overflow answer.

I have compiled 8, 16, 24, and 32 bit versions on godbolt.org with different versions of avr-gcc (12.2.0 and 5.4.0) with the same compile options (-std=c99 -O2 -g -mmcu=atmega8515). It turns out the code produced by avr-gcc 12.2.0 is significantly shorter than what avr-gcc 5.4.0 produces.

As @emacs drives me nuts' answer works off the relatively complicated code from avr-gcc 5.4.0, it may be useful to take a look at the simpler code from 12.2.0 as well.

The uint8_t variant from avr-gcc 5.4.0 appears to be what @emacs drives me nuts' answer works on:

is_udiv8_by_3:
        ldi r25,lo8(-85)  /* 1 */
        mul r24,r25       /* 2 */
        mov r25,r1        /* 1 */
        clr __zero_reg__  /* 1 */
        lsr r25           /* 1 */
        mov r18,r25       /* 1 */
        lsl r18           /* 1 */
        add r25,r18       /* 1 */
        ldi r18,lo8(1)    /* 1 */
        cpse r24,r25      /* 1 */
        ldi r18,0         /* 1 */
        mov r24,r18       /* 1 */
        ret               /* 13 cycles plus ret */

uint8_t variant from avr-gcc 12.2.0:

is_udiv8_by_3:
        ldi r25,lo8(-85)  /* 1 */
        mul r24,r25       /* 2 */
        mov r25,r0        /* 1 */
        clr r1            /* 1 */
        ldi r24,lo8(1)    /* 1 */
        cpi r25,lo8(86)   /* 1 */
        brlo .L2          /* 2/1 */
        ldi r24,0         /* -/1 */
.L2:
        ret               /* 9 cycles plus ret */
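
The shorter listing uses a different idea from the 5.4.0 one: 171 * 3 = 513, which is 1 modulo 256, so 171 is the multiplicative inverse of 3 modulo 256. Multiplying by 171 and keeping only the low byte maps the 86 multiples of 3 onto 0..85 and every other value onto something larger. A minimal C sketch of that reading of the listing follows; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of the avr-gcc 12.2.0 uint8_t listing. */
bool is_udiv8_by_3_model(uint8_t x)
{
    /* ldi r25,lo8(-85) ; mul r24,r25 ; mov r25,r0 : low byte of x * 171 */
    uint8_t t = (uint8_t)(x * 171u);
    /* cpi r25,lo8(86) ; brlo .L2 : multiples of 3 land in 0..85 */
    return t < 86;
}
```

Unlike the 5.4.0 variant, no quotient is computed at all, which is why the multiply-back and compare steps disappear.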

The uint16_t variant from avr-gcc 12.2.0 is not significantly longer than the uint8_t variant from avr-gcc 5.4.0:

is_udiv16_by_3:
        ldi r20,lo8(-85)  /* 1 */
        ldi r21,lo8(-86)  /* 1 */
        mul r24,r20       /* 2 */
        movw r18,r0       /* 1 */
        mul r24,r21       /* 2 */
        add r19,r0        /* 1 */
        mul r25,r20       /* 2 */
        add r19,r0        /* 1 */
        clr r1            /* 1 */
        ldi r24,lo8(1)    /* 1 */
        cpi r18,86        /* 1 */
        sbci r19,85       /* 1 */
        brlo .L5          /* 2/1 */
        ldi r24,0         /* -/1 */
.L5:
        ret               /* 17 cycles plus ret */
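
The 16-bit listing only needs the low 16 bits of x * 0xAAAB, so three 8x8 multiplications are enough: the high-by-high partial product would only affect bits that are discarded. The following C sketch mirrors that decomposition; the function and variable names are hypothetical, and the constants are read off the ldi, cpi and sbci operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of the avr-gcc 12.2.0 uint16_t listing. */
bool is_udiv16_by_3_model(uint16_t x)
{
    uint8_t lo = (uint8_t)(x & 0xFFu);              /* r24 */
    uint8_t hi = (uint8_t)(x >> 8);                 /* r25 */

    uint16_t t = (uint16_t)(lo * 0xABu);            /* mul r24,r20 ; movw r18,r0 */
    uint8_t th = (uint8_t)(t >> 8);                 /* r19 */
    th = (uint8_t)(th + (uint8_t)(lo * 0xAAu));     /* mul r24,r21 ; add r19,r0 */
    th = (uint8_t)(th + (uint8_t)(hi * 0xABu));     /* mul r25,r20 ; add r19,r0 */

    t = (uint16_t)(((uint16_t)th << 8) | (t & 0xFFu));

    /* cpi r18,86 ; sbci r19,85 ; brlo : compare with 0x5556 */
    return t < 0x5556u;
}
```

Here 0xAAAB is the inverse of 3 modulo 65536 (3 * 0xAAAB = 0x20001), and 0x5555 is the largest quotient 65535 / 3.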

BTW, the cycle count of the is_div3_asm function from @emacs drives me nuts' answer depends on the input value, and goes over 18 cycles even for a single iteration of the .Loop_bits loop.

When increasing the type size to 24 bits (the __uint24 variant) and 32 bits, avr-gcc 12.2.0 finally starts calling a division function. For 24 bits it is __udivmodpsi4:

is_udiv24_by_3:
        ldi r18,lo8(3)
        ldi r19,0
        ldi r20,0
        rcall __udivmodpsi4
        ldi r24,lo8(1)
        or r18,r19
        or r18,r20
        breq .L7
        ldi r24,0
.L7:
        ret

The uint32_t variant calls a different division function, __udivmodsi4, and is a lot longer as well:

is_udiv32_by_3:
        push r28
        push r29
        rcall .
        rcall .
        in r28,__SP_L__
        in r29,__SP_H__
        ldi r18,lo8(3)
        ldi r19,0
        ldi r20,0
        ldi r21,0
        rcall __udivmodsi4
        std Y+1,r22
        std Y+2,r23
        std Y+3,r24
        std Y+4,r25
        ldi r24,lo8(1)
        ldd r18,Y+1
        ldd r19,Y+2
        ldd r20,Y+3
        ldd r21,Y+4
        or r18,r19
        or r18,r20
        or r18,r21
        breq .L12
        ldi r24,0
.L12:
        pop __tmp_reg__
        pop __tmp_reg__
        pop __tmp_reg__
        pop __tmp_reg__
        pop r29
        pop r28
        ret

So other algorithms, such as the alternating cross sum algorithm from @emacs drives me nuts' answer or those in Fast divisibility tests (by 2,3,4,5,.., 16)? that @Peter Cordes links to, look interesting only for integer sizes above 16 bits.

Code sizes and loop cycles would need to be considered more carefully.

GCC5.4 predates Cassio Neri's improved algorithm ([Fast divisibility tests (by 2,3,4,5,.., 16)?](https://stackoverflow.com/a/49264279)) which made it into GCC9 in oct 2018. — Peter Cordes, Apr 24 '23 at 14:11
Notice that for code size, you also have to take into account the code that's dragged from libgcc like `__udivmodsi4` and all of its dependencies. — emacs drives me nuts, Apr 27 '23 at 11:17
Also notice that for v9 up to and including v12.2, code size and execution time might be bloated due to [PR90706](https://gcc.gnu.org/PR90706). For example, for `is_udiv24_by_3` with `-O2 -mmcu=atmega8515`, v11.3 reports 64 bytes while v8 and v14 (master) will report 26 bytes. Stack usage and register pressure are also higher due to PR90706. — emacs drives me nuts, Apr 27 '23 at 11:28
