I'm doing a binary search with bsearch(array, arrays, NUM_ARRAYS, 16, compare) and this comparison function:

#include <string.h>

int compare(const void *p1, const void *p2)
{
    return memcmp(p1, p2, 16);          // unsigned char arrays[][16]
}

Since each element is 16 bytes, it would fit in a single 128-bit register.

How can I modify this C function to force the comparison to be done with a single 128-bit register instruction? It should be much faster.

Related questions: "Comparison of 128 bit unsigned integers in x86-32 assembly" and "What are the 128-bit to 512-bit registers used for?", but neither answers this directly.

Edited by phuclv; asked by Basj
  • Which architecture are you programming for? Note that no current architecture with 128 bit registers I know of actually supports treating these as 128 bit numbers. So there are no 128 bit comparison instructions and your use case is better suited by just comparing as 64 bit twice. – fuz May 19 '22 at 08:39
  • @fuz Normal Intel x86/64 bit CPU. Moreover, on Windows (for which int128 seems unavailable with the Visual C++ compiler) :) – Basj May 19 '22 at 08:48
  • x86 compare instructions on XMM vector registers uses at the widest 64-bit elements, so you can do two *independent* 64-bit compares in a 128-bit register. https://www.felixcloutier.com/x86/pcmpgtq. Normally easier for a compiler to use two scalar instructions, instead of multiple SIMD instructions, to produce a single 128-bit compare result. Although if you actually want `memcmp` so low-address bytes are more significant than high-address bytes, that would cost extra instructions. You can see that and GCC's `unsigned __int128` on https://godbolt.org/z/c1fn7dGE7 – Peter Cordes May 19 '22 at 14:13

1 Answer

If the target uses big-endian byte order and the pointers are suitably aligned (on 16-byte boundaries) for 128-bit access, comparing the 16 bytes as unsigned 128-bit values produces the same result as memcmp. Whether it will be more efficient will depend on the compiler and optimisation settings.

Here is a modified function:

typedef unsigned __int128 u128_t;  // works for gcc, adjust for your compiler

int compare(const void *p1, const void *p2) {
    const u128_t *v1 = p1;
    const u128_t *v2 = p2;
    return (*v1 > *v2) - (*v1 < *v2);
}

The problem is that your target system likely uses little-endian order (e.g. x86 CPUs), where this comparison gives the same result as memcmp for equality but not for ordering. If your goal is just to find the array in an array of arrays, you can still use this trick as long as the array is sorted using the same comparison.

bsearch requires a comparison function that returns an int equal to 0 if the elements compare equal, a negative value if the element pointed to by p1 is less than the one pointed to by p2, and a positive value otherwise. Another problem with this approach is that reading char arrays through u128_t pointers is type punning and may be a misaligned access, both of which produce undefined behavior.

It would be safer and more efficient to write a binary search function that operates on an array of unions and uses a single comparison per iteration to locate the matching entry. The array must be sorted, which can be done with qsort() using the compare128() function.

Here is an example:

#include <stddef.h>
#include <string.h>   // memcpy

typedef unsigned __int128 u128_t;  // works for gcc, adjust for your compiler

typedef union {
    char c[16];
    u128_t u128;
} mytype;

/* comparison function for qsort and bsearch */
int compare128(const void *p1, const void *p2) {
    const mytype *v1 = p1;
    const mytype *v2 = p2;
    return (v1->u128 > v2->u128) - (v1->u128 < v2->u128);
}

int binarySearch128(const mytype array[], size_t n,
                    const unsigned char key[16])
{
    u128_t keyval;
    memcpy(&keyval, key, sizeof keyval);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (array[mid].u128 < keyval) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo < n && array[lo].u128 == keyval) {
        return (int)lo;
    } else {
        return -1;
    }
}

On platforms without 128-bit integer support, such as Visual C++, you can split each element into two 64-bit halves instead:

#include <stddef.h>   // size_t
#include <stdint.h>   // uint64_t
#include <string.h>   // memcpy

typedef union {
    char c[16];
    uint64_t u64[2];
} mytype;

// comparison function for qsort
int compare128(const void *p1, const void *p2) {
    const mytype *v1 = p1;
    const mytype *v2 = p2;
    int cmp = (v1->u64[0] > v2->u64[0]) - (v1->u64[0] < v2->u64[0]);
    return cmp ? cmp : (v1->u64[1] > v2->u64[1]) - (v1->u64[1] < v2->u64[1]);
}

int binarySearch128(const mytype array[], size_t n,
                    const unsigned char key[16])
{
    mytype keyval;
    memcpy(&keyval, key, sizeof keyval);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (array[mid].u64[0] < keyval.u64[0]
        ||  (array[mid].u64[0] == keyval.u64[0] && array[mid].u64[1] < keyval.u64[1]) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if (lo < n && array[lo].u64[0] == keyval.u64[0] && array[lo].u64[1] == keyval.u64[1]) {
        return (int)lo;
    } else {
        return -1;  // or 0
    }
}
Answered by chqrlie
  • Thanks @chqrlie! Will this automatically use a 128-bit register (is there a hint for the compiler to do this?) or should I use some options at compile time? Do you think this will work for `cl.exe` (Microsoft Visual C++ compiler)? Last thing: I only need to know true/false if the 128-bit sequence is present or not with the binary search, not to locate it precisely, I don't know if it changes anything. – Basj May 19 '22 at 08:12
  • @Basj: I rephrased the answer: `memcmp` and direct u128 comparison give the same result for equality, but not for ordering on little endian systems, such as Windows, but if you sort the array with the same comparison function, you will get the expected results. – chqrlie May 19 '22 at 08:15
  • Minor typo: is it `array[mid]` and `array[lo]` instead of `p[...]`? – Basj May 19 '22 at 08:20
  • Oh sad, int128 does not seem to be available on Windows: https://stackoverflow.com/questions/6759592/how-to-enable-int128-on-visual-studio – Basj May 19 '22 at 08:21
  • @Basj: indeed no support yet. You could use gcc for windows. – chqrlie May 19 '22 at 08:29
  • Note that as x86 does not have any 128 bit comparison instructions, this code would perform just as well using just two 64 bit comparisons. – fuz May 19 '22 at 08:41
  • @fuz: the code generated does not use 128 bit registers, but pairs of 64 bit registers. It is probably still faster than `memcmp()` but a proper benchmark is needed. – chqrlie May 19 '22 at 08:47
  • @chqrlie Sure. But as OP's compiler does not support `__int128`, using 64 bit integers will be just as fast. Note that your code has a lot of violations of the strict aliasing rule (unless the underlying type of the array is `uint64_t`). Consider using a union to do this safely. – fuz May 19 '22 at 08:51
  • @fuz: since you'd need to load multiple `char` elements into a union with an array, probably best to just `memcpy` in the first place as an aliasing-safe unaligned load (pointing a `union*` at random memory isn't safe), to get you portability to MSVC and GCC/clang, and as a side benefit also be valid ISO C++ in case you care about using this code in a different language at any point. Union type-punning is safe in ISO C99, but I find it often makes the code kind of ugly and take more work to follow, since you have to look for the union definition and see what random names were chosen, etc. – Peter Cordes May 19 '22 at 14:16
  • @PeterCordes: to avoid type punning and alignment issues, I amended the code to use arrays of unions. – chqrlie May 19 '22 at 15:12
  • @fuz: to avoid type punning and alignment issues, I amended the code to use arrays of unions. – chqrlie May 19 '22 at 15:12
  • Taking an array of unions means it's only safe to call if that's what the caller actually has; I'm pretty sure casting from `char chunks[][16]` would be strict-aliasing UB. (Which could become a problem after inlining.) – Peter Cordes May 19 '22 at 15:17
  • @PeterCordes: of course. answer updated with more explicit language. – chqrlie May 19 '22 at 15:31
  • @Basj just create a custom 128-bit struct on MSVC. There's nothing difficult to compare them: https://godbolt.org/z/brqWs3nTr – phuclv May 19 '22 at 15:33
  • `const mytype *v1 = p1;` is not different from taking a `const mytype *function_arg`. It's just as safe if the compiler can't inline, and just as dangerous if it can. And is still UB in the abstract machine. Casting to `void*` and back doesn't "launder" type information and allow aliasing; the underlying type of the pointed-to object must match the pointer being dereferenced, or the pointer must be `char*` or `unsigned char*`. (Or in GNU C, a `__attribute__((may_alias,aligned(1)))` typedef.) – Peter Cordes May 19 '22 at 15:36
  • @PeterCordes: the `compare128()` function is intended for use by `qsort()` and `bsearch()`. The API is fixed. The cast to `const void *` and back to `const mytype *` should not be a problem as long as the array passed to `qsort()` and `bsearch()` has the proper type, `mytype[]`. – chqrlie May 19 '22 at 16:10