Whats wrong with my ARM assembly code for counting the number of 0's in a 32 bit register

Question

test:
       mov r1,#32

loop:
       cmp r0, #0

       beq done
       mov r3, r0 
       lsr r0, r0, #1
       cmp r0, r3
       blt sub
       b done
sub:

      sub r1, r1, #1
      b loop
done:

       mov r0, r1
       mov  pc, lr

I have it set up so it decrements whenever there is a one present, but it doesnt quite work and I don't know why

It's unclear what your algorithm is. I don't see how `blt` helps you check for a `1` bit. I suggest you simply do `and r3, r0, #1; sub r1, r1, r3; lsr r0, r0, #1; b loop`. — Jester, Nov 21 '18 at 20:23
@Jester: `blt` could check the sign bit after a *left*-shift, like `add r0, r0, r0` / `cmp`/ `addlt r1, r1, #1`. Or better, left-shift and set flags with `adds` / `addmi` to increment a counter when the bit is set. (Or I forget the condition name for non-negative, the opposite of MInus, which you could use with a conditional add to count zeros.) Or maybe you do need to subtract from 32 if you want to stop the loop when you've shifted out all the bits. — Peter Cordes, Nov 21 '18 at 20:27
I think you can count zeros in a sneaky, efficient way: shift your register one way or the other (doesn't matter); increment a counter if CARRY is set, and then leave the loop when your register is zero. This counts the number of `1` bits, stopping when they've all been shifted out and counted. Then you subtract the number of ones from 32 to get your answer. — Dave M., Nov 21 '18 at 20:36
@DaveM Agree this is a good way to do it, but wouldn't call it "sneaky" at all. — TypeIA, Nov 21 '18 at 20:57

score 1 · Answer 1 · answered Nov 21 '18 at 20:53

Your design idea is somewhat overcomplicated, which made it harder for you to get the code right. I'm not sure exactly why you thought (x>>1) < x (signed compare after unsigned right shift) was useful.

You can take advantage of flags to get information about the top bit, but you don't need a cmp do to so. Use a left-shift (or add same,same) that sets flags, and test the S flag using the MInus condition to find out what the high bit of the result was.

Or look at the C flag to see the bit shifted out, but then you'd need to do something with the C flag after the last iteration (after the register becomes zero). That's fine, you can peel out that last iteration.

Using a right shift (your lsr) can't work if you're using conditions that depend on the sign bit.

test:
       movs  r1, r0            @ copy and set flags
       mov   r0, #32

          @ loop invariants:
          @ r0 = return value
          @ r1 = input
          @ flags set according to the current value of r1
.loop:                         @ do {
      submi r0, r0, #1    @ predicated subtract: if(high_bit_set(r1)) r0--;
      adds  r1, r1        @ left-shift by 1 and set flags
      bne  .loop          @ keep looping until there are no set bits
                               @ }while(r1<<=1);

      mov  pc, lr        @ or bx lr

Instead of branching, you definitely want to take advantage of ARM's predicated execution of any instruction, but appending a condition to the mnemonic. submi is a sub which is a no-op if the MI condition is false.

Of course if you care about performance, an 8-bit lookup table can be a good way to implement popcnt, or there's a bithack formula that ARM can probably do very efficiently with its barrel shifter. How to count the number of set bits in a 32-bit integer?

AFAIK, ARM doesn't have a hardware bit-count instruction like some other architectures do, e.g. x86's popcnt.

In computer programs, small numbers are usually common. Left-shifting will take ~30 iterations to shift out all the bits for numbers with any low bits set. But right-shifting can finish in a few iterations for small numbers like 7 (only the low 3 bits set).

If it's common for your inputs to have some contiguous high bits all cleared, then the left-shifting loop I wrote for this answer is the worst.

Great answer. Just to note that `ROR` and `RRX` *do* (or can) affect the carry flag, so it's not completely true that using a right shift can't work if your conditions depend on the carry flag, though admittedly they're not much use in this context. (That said, maybe an `RRX` with the rest of the code crafted such that the carry flag is guaranteed zero on entry to the `RRX` could be made to work...) — cooperised, Nov 21 '18 at 22:25
@cooperised: yeah, rotate is interesting, but then the register will never become zero. Anyway, note that I said LSR can't work if you're looking at the *sign* bit / flag. The OP is using `cmp` after the shift, destroying any `C`. So yes, `LSR` to shift the low bit into carry would be good, with a final `add` or `sub` after the loop exit. (Like `subc r0, r0, #1` / `lsrs r1,r1,#1` / `bne`, then `subc r0,r0,#1` / ret. Or whatever the right name is for a condition that's true if C is set. I know ARM is different than x86 for some condition names.) — Peter Cordes, Nov 21 '18 at 22:35
Agreed. The register will become zero using `RRX` if you ensure that the carry flag is clear before each use though. Not the most readable code, perhaps, but feasible :-D — cooperised, Nov 22 '18 at 10:05

Whats wrong with my ARM assembly code for counting the number of 0's in a 32 bit register

1 Answers1