How to abstract simd code used in template function

Question

Imagine that I have a function add:

template<class type>
type* add(type* x, type* y, int len)
{
    type *result = new type[len];
    // simd_reg_size: number of elements of `type` that fit in one SIMD register
    for(int i = 0; i < len; i += simd_reg_size)
    {
      // do addition here
    }
    return result;
}

I thought of using template specializations, writing a specialized add function for each data type I'd like to support; however, the syntax I came up with seems to be invalid.

template<>
__m256 SIMD_add<float>(__m256 x, __m256 y){
    return _mm256_add_ps(x, y);
}

I'm not an expert in C++, but from what I understand I must use the template type in the arguments or parameters. I also thought of writing a macro, something like

#define simd_add(x, y, type)
 if type == int
     simd_add_integer(x, y)

However, from looking around, it seems there is no way to actually evaluate such conditional statements in macros. What should I do? It's important to me that the syntax be flexible enough to let me abstract over architectures such as ARM in addition to x86.

Do you need specializations for the *add* routine? Can you not just have a list of overloads for each type? Your templated `add()` function code will then simply call the relevant overload. – Galik, May 25 '18 at 21:44
The problem with your approach is that every simd_add, regardless of type, gets its input as an __mm register, so no overloading can be done. – user3553551, May 25 '18 at 22:01
I might be missing something here, but if you have two functions that do two different things with the same parameter types, why not just give them different names? – Paul Sanders, May 25 '18 at 22:23
Because I want the template add to simply use SIMD_add by passing its own template type to it, so I don't have to write the add function multiple times. – user3553551, May 26 '18 at 06:22

Adrien Leravat · Answer 1 · 2018-05-26T17:14:17.140

0

For a function template, the specialization type is given directly where you would use the template type parameter(s), without appending it to the function name like you did.

So your specialized function for __m256 would be

template<>
__m256 SIMD_add(__m256 x, __m256 y){
    return _mm256_add_ps(x, y);
}

Like you mentioned, if you want to add your "add" method to a template class, no matter the specialization, just declare it as a standard method.

You can have a look at "Explicit specializations of function templates"; there is also a similar question regarding addition of __m256 registers.
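To make the form above concrete, here is a minimal sketch of a primary function template plus explicit specializations whose template argument is deduced from the parameter types. The wrapper types `f32x8` and `i32x8` are hypothetical stand-ins (not part of any library) so that the sketch compiles without AVX; on an AVX target they would correspond to `__m256` and `__m256i`, and the loops would be replaced by the matching intrinsics.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the register types, used only so this
// sketch builds on any target. With AVX these would be __m256 / __m256i.
struct f32x8 { float lane[8]; };
struct i32x8 { std::int32_t lane[8]; };

// Primary template: declared but not defined, so using an unsupported
// register type fails at link time instead of silently compiling.
template <class Reg>
Reg SIMD_add(Reg x, Reg y);

// Explicit specialization: Reg is deduced as f32x8 from the signature.
// With AVX the body would be: return _mm256_add_ps(x, y);
template <>
f32x8 SIMD_add(f32x8 x, f32x8 y) {
    for (int i = 0; i < 8; ++i) x.lane[i] += y.lane[i];
    return x;
}

// Explicit specialization for the integer register stand-in.
template <>
i32x8 SIMD_add(i32x8 x, i32x8 y) {
    for (int i = 0; i < 8; ++i) x.lane[i] += y.lane[i];
    return x;
}
```

Note that this only distinguishes element types that map to different register types. As pointed out in the comments on the question, several element types can share one register type, in which case the parameter types alone cannot select the right addition.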

edited May 26 '18 at 17:14

answered May 25 '18 at 21:39

Adrien Leravat


So keeping the type of the input parameters as __m256 is impossible, right? – user3553551 May 25 '18 at 22:03
the problem is sometimes I use SIMD_add to accumulate values in a simd register. And the _mm256_add_ps returns a register containing 8 values so returning a float is not feasible. Thus I'd have to allocate a float array to store the computation's result and return that. However that would completely eliminate any performance improvements gained by simd operations. – user3553551 May 25 '18 at 22:06
Ok right I get it. I updated the answer to simply make use of "__m256" register type, sorry I missed that part. Thanks for the question, may help others! – Adrien Leravat May 26 '18 at 17:10

score 0 · Answer 2 · answered May 26 '18 at 06:23


It turns out that I can simply make a template class with all methods static, and the compiler will let me specialize the template without using the template type in the function signatures.
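A minimal sketch of that idea follows. The names `simd`, `reg`, `width`, `load`, `store` and `add_arrays` are assumptions chosen for illustration, not taken from the original code. The generic class is a scalar fallback, the `float` specialization is only compiled when the compiler targets AVX, and an ARM specialization could be added the same way behind its own preprocessor guard.

```cpp
#include <cstddef>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Generic fallback (assumed design): one element per "register".
template <class T>
struct simd {
    using reg = T;
    static constexpr std::size_t width = 1;
    static reg load(const T* p) { return *p; }
    static void store(T* p, reg v) { *p = v; }
    static reg add(reg a, reg b) { return a + b; }
};

#if defined(__AVX__)
// Specialization selected by the element type, not by the register type.
template <>
struct simd<float> {
    using reg = __m256;
    static constexpr std::size_t width = 8;
    static reg load(const float* p) { return _mm256_loadu_ps(p); }
    static void store(float* p, reg v) { _mm256_storeu_ps(p, v); }
    static reg add(reg a, reg b) { return _mm256_add_ps(a, b); }
};
#endif

// Written once; the element type T picks the implementation.
template <class T>
void add_arrays(const T* x, const T* y, T* out, std::size_t len) {
    using S = simd<T>;
    std::size_t i = 0;
    // Full registers first.
    for (; i + S::width <= len; i += S::width)
        S::store(out + i, S::add(S::load(x + i), S::load(y + i)));
    // Scalar tail for the remaining elements.
    for (; i < len; ++i) out[i] = x[i] + y[i];
}
```

Because the specialization is keyed on the element type, two element types that share the same register type can still get different `add` implementations, and the accumulator can stay in a register of type `simd<T>::reg` between calls.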

answered May 26 '18 at 06:23

user3553551
