I want to check whether a number in a register is a multiple of 3, using AVR Studio and assembly language. The target is an AVR 8515, so there is no division instruction.

I already tried a couple of methods, such as adding 0b0011 to the register and expecting the carry flag to be set, but it isn't.

  • Hardware division is a slow way to test for divisibility by 3 anyway. See [Fast divisibility tests (by 2,3,4,5,.., 16)?](https://stackoverflow.com/q/6896533) for multiplicative constants. Hopefully a narrower version of that is possible with smaller constants for 16-bit. For 8-bit a lookup table is possible if you need max speed on AVR, otherwise multiply (possibly manually with shift-and-add.) – Peter Cordes Apr 20 '23 at 14:59
  • So you want to multiply with 3? Or divide by 3? I am confused, maybe you can be more clear about what you want to achieve, and for which data types. – emacs drives me nuts Apr 20 '23 at 16:10
  • Please provide enough code so others can better understand or reproduce the problem. – Community Apr 21 '23 at 16:57

2 Answers

A straightforward way is to let avr-gcc compile a piece of C code and then peek at the assembly¹ it generates:

For an 8-bit, unsigned value consider the following C99:

#include <stdint.h>
#include <stdbool.h>

bool is_udiv3_8 (uint8_t x)
{
    return x % 3 == 0;
}

With something like

> avr-gcc x.c -mmcu=atmega8515 -O2 -S

we get x.s (alternatively, add -save-temps to an ordinary compilation and then intercept the *.s file).

is_udiv3_8:
    ldi r25,lo8(-85) ;1
    mul r24,r25      ;2
    mov r25,r1       ;3
    clr __zero_reg__ ;4
    lsr r25          ;5
    mov r18,r25      ;6
    lsl r18          ;7
    add r25,r18      ;8
    ldi r18,lo8(1)   ;9
    cpse r24,r25     ;10
    ldi r18,0        ;11
    mov r24,r18      ;12
    ret

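What avr-gcc does is multiply the input by 171 (written as lo8(-85), which is roughly 512/3), take the high byte of the product and shift it right once to get the quotient of the division by 3. It then computes 3 times that quotient and compares it with the input. The function returns True (1) in R24 if the input was divisible by 3, and False (0) otherwise.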
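To make the steps of that listing easier to follow, here is a minimal C sketch of what the instruction sequence appears to compute. The names is_udiv3_8_model and check_udiv3_8_model are hypothetical and not produced by avr-gcc; the sketch assumes nothing beyond standard C99.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of the is_udiv3_8 listing; not avr-gcc output. */
bool is_udiv3_8_model(uint8_t x)
{
    /* ldi + mul: 16-bit product of x and 171 (0xAB, printed as lo8(-85)). */
    uint16_t product = (uint16_t)(x * 171u);
    /* mov r25,r1 + lsr: high byte shifted right once, i.e. product >> 9. */
    uint8_t quotient = (uint8_t)(product >> 9);
    /* mov + lsl + add build 3 * quotient; cpse compares it with x. */
    return (uint8_t)(quotient * 3u) == x;
}

/* Exhaustive comparison against the plain % operator. */
void check_udiv3_8_model(void)
{
    for (unsigned x = 0; x <= 255u; x++)
        assert(is_udiv3_8_model((uint8_t)x) == (x % 3u == 0));
}
```

Calling check_udiv3_8_model() on a host compiler is one way to convince yourself that the constant 171 and the shift by 9 give the exact quotient for every 8-bit input.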

You can extend this to 16-bit values, but you'll need the high word of a 16×16=32 multiplication.

At that point you may remember² that a natural number N written in base B is divisible by B+1 iff the alternating cross sum of its base-B digits is divisible by B+1.

For example in base B=2: A natural number N is divisible by 3 iff the alternating cross sum of the bits of N is divisible by 3.

Written in assembly:

  • First a loop over the bits to get the alternating binary cross sum. For speed you would unroll that loop, which would give 8 sbrs and 8 sbrc instructions.

  • The cross sum q satisfies -8 <= q <= 8. Negating does not change whether q is divisible by 3, thus continue with |q| which is non-negative.

  • Subtract 3 until the result is 0, 1 or 2.

  • Return True if the value reached 0, and False otherwise (it reached 1 or 2).

.text
.global is_div3_asm

is_div3_asm:
    ;; R26 holds R25:R24 mod 3
    clr r26
.Loop_bits:
    ;; Loop over all bits of R25:R24 to compute the alternating cross sum
    ;; over the binary digits of R25:R24.
    sbiw r24, 0
    breq .Loop_done
    sbrc r24, 0 $ inc  r26
    sbrc r24, 1 $ dec  r26
    lsr  r25 $ ror r24
    lsr  r25 $ ror r24
    rjmp .Loop_bits
.Loop_done:
    ;; r26 = abs(r26)
    sbrc r26, 7
    neg  r26
    ;; Now we have 0 <= r26 <= 8, so reduce to r26 < 3...
    cpi  r26, 3
    brlt .Ltobool
    subi r26, 3
    ;; ...now we have 0 <= r26 <= 5, so at most one more sub 3 will do.
    cpi  r26, 3
    brlt .Ltobool
    subi r26, 3
.Ltobool:
    ;; Return a bool in R24 as of avr-gcc ABI.
    ldi r24, 1                  ; True
    cpi r26, 0
    breq .Ldone
    ldi r24, 0                  ; False
.Ldone:    
    ret

This function complies with the avr-gcc ABI and calling convention. You can use it from C/C++ by means of a prototype:

extern bool is_div3_asm (uint16_t); // C
extern "C" bool is_div3_asm (uint16_t); // C++
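For checking the assembly routine on a host machine, a C reference model of the same steps can be handy. The following is only a sketch that mirrors the loop, the absolute value and the two conditional subtractions; the names is_div3_model and check_is_div3_model are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C model of is_div3_asm: alternating cross sum of the bits. */
bool is_div3_model(uint16_t n)
{
    int8_t q = 0;                 /* plays the role of R26 */
    while (n != 0) {              /* sbiw r24,0 / breq .Loop_done */
        if (n & 1u) q++;          /* even bit positions count +1 */
        if (n & 2u) q--;          /* odd bit positions count -1 */
        n >>= 2;                  /* two lsr/ror pairs */
    }
    if (q < 0) q = (int8_t)-q;    /* now 0 <= q <= 8 */
    if (q >= 3) q -= 3;           /* now 0 <= q <= 5 */
    if (q >= 3) q -= 3;           /* now 0 <= q <= 2 */
    return q == 0;
}

/* Exhaustive comparison against the plain % operator for all 16-bit values. */
void check_is_div3_model(void)
{
    for (uint32_t n = 0; n <= 0xFFFFu; n++)
        assert(is_div3_model((uint16_t)n) == (n % 3u == 0));
}
```

The model deliberately keeps the same value ranges as the comments in the assembly, so a mismatch between the two would point at a transcription error rather than at the algorithm.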

¹ Note that there are different assembly dialects. This answer uses the GNU assembly dialect because it is compatible with the GNU assembler and is what avr-gcc and avr-g++ produce.

² The result is actually stronger: A natural number N is in the same residue class modulo B+1 as the alternating cross sum of N in base B. The proof is just a few lines of modular arithmetic and boils down to B ≡ -1 mod B+1.
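A sketch of that argument, writing d_i for the base-B digits of N (this notation is introduced here and is not part of the answer above):

```latex
N = \sum_{i=0}^{k} d_i B^i
\quad\text{and}\quad
B \equiv -1 \pmod{B+1}
\;\Longrightarrow\;
N \equiv \sum_{i=0}^{k} (-1)^i d_i \pmod{B+1}.
```

For example, with B = 2 and N = 9 = 1001 in binary, the alternating sum is 1 - 0 + 0 - 1 = 0, so 9 is divisible by 3.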

emacs drives me nuts

This is just a comment on and elaboration of @emacs drives me nuts' answer, but it is too long and complex to be a Stack Overflow comment, so it is posted as a Community wiki answer.

I have compiled 8, 16, 24, and 32 bit versions on godbolt.org with different versions of avr-gcc (12.2.0 and 5.4.0) with the same compile options (-std=c99 -O2 -g -mmcu=atmega8515). It turns out the code produced by avr-gcc 12.2.0 is significantly shorter than what avr-gcc 5.4.0 produces.

As @emacs drives me nuts' answer works off the relatively complicated code from avr-gcc 5.4.0, it may be useful to take a look at the simpler code from 12.2.0 as well.

The uint8_t variant from avr-gcc 5.4.0 appears to be what @emacs drives me nuts' answer works on:

is_udiv8_by_3:
        ldi r25,lo8(-85)  /* 1 */
        mul r24,r25       /* 2 */
        mov r25,r1        /* 1 */
        clr __zero_reg__  /* 1 */
        lsr r25           /* 1 */
        mov r18,r25       /* 1 */
        lsl r18           /* 1 */
        add r25,r18       /* 1 */
        ldi r18,lo8(1)    /* 1 */
        cpse r24,r25      /* 1 */
        ldi r18,0         /* 1 */
        mov r24,r18       /* 1 */
        ret               /* 13 cycles plus ret */

uint8_t variant from avr-gcc 12.2.0:

is_udiv8_by_3:
        ldi r25,lo8(-85)  /* 1 */
        mul r24,r25       /* 2 */
        mov r25,r0        /* 1 */
        clr r1            /* 1 */
        ldi r24,lo8(1)    /* 1 */
        cpi r25,lo8(86)   /* 1 */
        brlo .L2          /* 2/1 */
        ldi r24,0         /* -/1 */
.L2:
        ret               /* 9 cycles plus ret */

The uint16_t variant from avr-gcc 12.2.0 is not significantly longer than the uint8_t variant from avr-gcc 5.4.0:

is_udiv16_by_3:
        ldi r20,lo8(-85)  /* 1 */
        ldi r21,lo8(-86)  /* 1 */
        mul r24,r20       /* 2 */
        movw r18,r0       /* 1 */
        mul r24,r21       /* 2 */
        add r19,r0        /* 1 */
        mul r25,r20       /* 2 */
        add r19,r0        /* 1 */
        clr r1            /* 1 */
        ldi r24,lo8(1)    /* 1 */
        cpi r18,86        /* 1 */
        sbci r19,85       /* 1 */
        brlo .L5          /* 2/1 */
        ldi r24,0         /* -/1 */
.L5:
        ret               /* 17 cycles plus ret */
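The two avr-gcc 12.2.0 listings appear to use the multiplicative-inverse test from the linked "Fast divisibility tests" question: multiply by the inverse of 3 modulo 2^8 or 2^16, keep only the low part of the product, and compare it against a threshold. A minimal C sketch under that assumption follows; the function names are hypothetical, while the constants are read off the listings (0xAB and 86 for 8 bits, 0xAAAB and 0x5556 for 16 bits).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 171 * 3 = 513 = 1 (mod 256), so 171 is the inverse of 3 modulo 2^8. */
bool is_udiv3_8_inverse(uint8_t x)
{
    /* Multiples of 3 map to 0..85, everything else lands at 86 or above. */
    return (uint8_t)(x * 171u) < 86u;
}

/* 0xAAAB * 3 = 0x20001 = 1 (mod 65536): the inverse of 3 modulo 2^16. */
bool is_udiv3_16_inverse(uint16_t x)
{
    /* Multiples of 3 map to 0..0x5555, everything else lands above that. */
    return (uint16_t)(x * 0xAAABu) < 0x5556u;
}

/* Exhaustive comparison of both sketches against the plain % operator. */
void check_udiv3_inverse(void)
{
    for (uint32_t x = 0; x <= 0xFFu; x++)
        assert(is_udiv3_8_inverse((uint8_t)x) == (x % 3u == 0));
    for (uint32_t x = 0; x <= 0xFFFFu; x++)
        assert(is_udiv3_16_inverse((uint16_t)x) == (x % 3u == 0));
}
```

Only the low byte or low word of the product is needed here, which would explain why the 12.2.0 code reads r0 instead of r1 and gets away without any shift or back-multiplication.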

BTW, the cycle count of the is_div3_asm function from @emacs drives me nuts' answer depends on the input value, and goes over 18 cycles even for a single iteration of the .Loop_bits loop.

When the type size increases to 24 bits (the __uint24 variant) and 32 bits, avr-gcc 12.2.0 finally starts calling a division function. The 24-bit variant calls __udivmodpsi4:

is_udiv24_by_3:
        ldi r18,lo8(3)
        ldi r19,0
        ldi r20,0
        rcall __udivmodpsi4
        ldi r24,lo8(1)
        or r18,r19
        or r18,r20
        breq .L7
        ldi r24,0
.L7:
        ret

The uint32_t variant calls a different division function, __udivmodsi4, and is a lot longer as well:

is_udiv32_by_3:
        push r28
        push r29
        rcall .
        rcall .
        in r28,__SP_L__
        in r29,__SP_H__
        ldi r18,lo8(3)
        ldi r19,0
        ldi r20,0
        ldi r21,0
        rcall __udivmodsi4
        std Y+1,r22
        std Y+2,r23
        std Y+3,r24
        std Y+4,r25
        ldi r24,lo8(1)
        ldd r18,Y+1
        ldd r19,Y+2
        ldd r20,Y+3
        ldd r21,Y+4
        or r18,r19
        or r18,r20
        or r18,r21
        breq .L12
        ldi r24,0
.L12:
        pop __tmp_reg__
        pop __tmp_reg__
        pop __tmp_reg__
        pop __tmp_reg__
        pop r29
        pop r28
        ret

So looking for other algorithms, such as the alternating cross sum from @emacs drives me nuts' answer or the ones in Fast divisibility tests (by 2,3,4,5,.., 16)? that @Peter Cordes links to, appears worthwhile only for integer sizes above 16 bits.

Code sizes and loop cycles would need to be considered more carefully.

ndim
  • GCC5.4 predates Cassio Neri's improved algorithm ([Fast divisibility tests (by 2,3,4,5,.., 16)?](https://stackoverflow.com/a/49264279)), which made it into GCC9 in Oct 2018. – Peter Cordes Apr 24 '23 at 14:11
  • Notice that for code size, you also have to take into account the code that's dragged from libgcc like `__udivmodsi4` and all of its dependencies. – emacs drives me nuts Apr 27 '23 at 11:17
  • Also notice that for v9 up to and including v12.2, code size and execution time might be bloated due to [PR90706](https://gcc.gnu.org/PR90706). For example, for `is_udiv24_by_3` with `-O2 -mmcu=atmega8515`, v11.3 reports 64 bytes while v8 and v14 (master) will report 26 bytes. Stack usage and register pressure are also higher due to PR90706. – emacs drives me nuts Apr 27 '23 at 11:28