Otherwise just use scalar code starting from the most-significant chunk for your range check.
On a CPU that uses cmp and a flags register, this is done the following way:
- Compare the highest parts (e.g. highest 32 bits) of the numbers (cmp instruction)
- If not "Equal", jump to "EndOfCompare" (x86: jne instruction)
- Compare the next parts (e.g. 32 bits) of the numbers
- If not "Equal", jump to "EndOfCompare"
- ...
- Compare the next parts (e.g. 32 bits) of the numbers
- If not "Equal", jump to "EndOfCompare"
- Compare the lowest parts (e.g. lowest 32 bits) of the numbers
- "EndOfCompare":
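At this point the flags register contains information about the order (a<b, a=b or a>b) of the two large numbers, just as if you had executed a single cmp instruction on two small numbers.
Unfortunately this simple variant only works with unsigned numbers: the lower parts must always be compared as unsigned values, so a signed conditional jump after the last cmp can give the wrong result.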
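For anyone who would rather write this in C than in assembly, here is a minimal sketch of the same chunk-by-chunk compare. It assumes the numbers are stored as arrays of 32-bit chunks with the most-significant chunk at index 0; the names cmp_u_chunks and in_range are made up for this example and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compare two unsigned multi-word numbers.
 * Assumed layout: a[0] / b[0] hold the most-significant 32 bits.
 * Returns -1 if a < b, 0 if a == b, 1 if a > b. */
int cmp_u_chunks(const uint32_t *a, const uint32_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        /* The first chunk that differs decides the order,
         * like the "jne EndOfCompare" in the steps above. */
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0; /* all chunks equal */
}

/* Hypothetical range check built on top: lo <= x <= hi (inclusive). */
int in_range(const uint32_t *x, const uint32_t *lo,
             const uint32_t *hi, size_t n)
{
    return cmp_u_chunks(lo, x, n) <= 0 && cmp_u_chunks(x, hi, n) <= 0;
}
```

Because the chunks are uint32_t, every comparison is unsigned, which matches the restriction mentioned above. For a 128-bit number you would call these with n = 4.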
BTW, a block is usually "aligned" in IP address space, so you only need to mask away the low bits and compare the high bits for equality.
A check for (A AND MASK) = (B AND MASK) can be done the following way on a 32-bit CPU:
mov ecx, part 1 of A
xor ecx, part 1 of B
and ecx, part 1 of MASK
mov eax, part 2 of A
xor eax, part 2 of B
and eax, part 2 of MASK
or ecx, eax
mov eax, part 3 of A
xor eax, part 3 of B
and eax, part 3 of MASK
or ecx, eax
...
In the case of a 128-bit number, you need 4 "parts".
It does not matter if "part 1" is the upper or the lower bits of the numbers.
If (A AND MASK) = (B AND MASK) is true, ecx will have the value 0 (and the zero flag will be set because of the or instruction).
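The same XOR / AND / OR sequence can be sketched in C as follows. This is only an illustration under the assumption that A, B and MASK are each stored as arrays of 32-bit parts in the same order; the function name same_block is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if (A AND MASK) == (B AND MASK), else 0.
 * All three arrays must use the same part order; which end is
 * "part 1" does not matter. */
int same_block(const uint32_t *a, const uint32_t *b,
               const uint32_t *mask, size_t n)
{
    assert(a != NULL && b != NULL && mask != NULL);
    uint32_t diff = 0;
    for (size_t i = 0; i < n; i++) {
        /* xor finds the differing bits, and keeps only the bits
         * covered by the mask, or accumulates them (the role of ecx). */
        diff |= (a[i] ^ b[i]) & mask[i];
    }
    return diff == 0; /* zero means no masked bit differs */
}
```

With n = 4 this covers a 128-bit number. For example, with a mask whose first part is 0xffffffff and whose other parts are 0, two addresses match exactly when their top 32 bits are equal.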