Your design idea is somewhat overcomplicated, which made it harder for you to get the code right. I'm not sure exactly why you thought (x>>1) < x (signed compare after unsigned right shift) was useful.
You can take advantage of flags to get information about the top bit, but you don't need a cmp to do so. Use a left-shift (or add same,same) that sets flags, and test the N flag using the MInus condition to find out what the high bit of the result was.
Or look at the C flag to see the bit shifted out, but then you'd need to do something with the C flag after the last iteration (after the register becomes zero). That's fine, you can peel out that last iteration.
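Using a right shift (your lsr) can't work if you're using conditions that depend on the sign bit: a logical right shift by 1 or more always shifts a zero into the top bit, so the result never looks negative.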
test:
    movs   r1, r0          @ copy and set flags
    mov    r0, #32
    @ loop invariants:
    @  r0 = return value
    @  r1 = input
    @  flags set according to the current value of r1
.loop:                     @ do {
    submi  r0, r0, #1      @   predicated subtract: if(high_bit_set(r1)) r0--;
    adds   r1, r1          @   left-shift by 1 and set flags
    bne    .loop           @   keep looping until there are no set bits
                           @ }while(r1<<=1);
    mov    pc, lr          @ or bx lr
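To make it easier to check the logic, here is a rough C model of that loop. It is only a sketch: the function name count_zero_bits is made up for this example, and the flag behaviour is written out as ordinary C tests. Read this way, the loop starts at 32 and subtracts one per set bit, so it returns the number of zero bits.

```c
#include <stdint.h>

/* C model of the asm loop: r0 starts at 32 and is decremented once for
   every set bit that passes through the top bit position.
   count_zero_bits is a hypothetical name used only for this sketch. */
unsigned count_zero_bits(uint32_t x)
{
    unsigned zeros = 32;           /* mov   r0, #32 */
    do {
        if (x & 0x80000000u)       /* MI condition: top bit of r1 is set */
            zeros--;               /* submi r0, r0, #1 */
        x <<= 1;                   /* adds  r1, r1 */
    } while (x != 0);              /* bne   .loop */
    return zeros;
}
```

For an input of 7 this takes 32 trips through the loop, because the three low bits have to travel all the way up to the top bit before the register becomes zero.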
Instead of branching, you definitely want to take advantage of ARM's predicated execution of any instruction, by appending a condition to the mnemonic. submi is a sub which is a no-op if the MI condition is false.
Of course, if you care about performance, an 8-bit lookup table can be a good way to implement popcnt, or there's a bithack formula that ARM can probably do very efficiently with its barrel shifter. See "How to count the number of set bits in a 32-bit integer?".
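As a sketch of both options in C (not tuned ARM code): the first function is the well-known shift-and-mask bithack, the second uses a 256-entry table of per-byte counts. The names popcount_bithack, popcount_table and bits_in_byte are made up for this example, and I haven't measured either one on ARM.

```c
#include <stdint.h>

/* Bithack: sum adjacent bit pairs, then nibbles, then bytes, and finally
   add the four byte counts together with a multiply. */
unsigned popcount_bithack(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* 2-bit sums */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit sums */
    return (x * 0x01010101u) >> 24;                   /* total lands in the top byte */
}

/* 8-bit lookup table, generated at compile time: entry i holds the number
   of set bits in i. */
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)
#define B6(n) B4(n), B4(n + 1), B4(n + 1), B4(n + 2)
static const uint8_t bits_in_byte[256] = { B6(0), B6(1), B6(1), B6(2) };

/* Table version: look up each of the four bytes and add the counts. */
unsigned popcount_table(uint32_t x)
{
    return bits_in_byte[x & 0xFF]
         + bits_in_byte[(x >> 8) & 0xFF]
         + bits_in_byte[(x >> 16) & 0xFF]
         + bits_in_byte[x >> 24];
}
```

Both return the number of set bits; to get the number of zero bits, as the loop in this answer does, subtract the result from 32.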
AFAIK, ARM doesn't have a scalar hardware bit-count instruction like some other architectures do, e.g. x86's popcnt. (NEON does have vcnt.8, but it counts the bits of each byte in a vector register.)
In computer programs, small numbers are common. Left-shifting will take ~30 iterations to shift out all the bits for numbers with any low bits set. But right-shifting can finish in a few iterations for small numbers like 7 (only the low 3 bits set).
If it's common for your inputs to have some contiguous high bits all cleared, then the left-shifting loop I wrote for this answer is the worst choice.
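For comparison, here is a rough C model of a right-shifting version that tests the low bit instead of the sign bit and stops as soon as the value becomes zero. The function name and the iterations out-parameter are made up for this sketch; the parameter is only there to show how many trips the loop takes.

```c
#include <stdint.h>

/* Right-shifting variant: test the LOW bit each time and stop once no set
   bits remain. count_zero_bits_rshift and the iterations out-parameter
   are hypothetical, added only to illustrate the trip count. */
unsigned count_zero_bits_rshift(uint32_t x, unsigned *iterations)
{
    unsigned zeros = 32;
    unsigned n = 0;
    while (x != 0) {
        zeros -= x & 1;    /* subtract 1 if the bit about to be shifted out is set */
        x >>= 1;
        n++;
    }
    if (iterations)
        *iterations = n;   /* number of loop trips actually taken */
    return zeros;
}
```

With an input of 7 this loop runs 3 times, while an input with the top bit set still needs all 32 trips, so which direction is better depends on what your inputs look like.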