
I am trying to load a vector into an SSE register. My code compiles without error, but when I run it, I get a segmentation fault. Here is my code:

inline int SSEJaccard::calcSSEJaccardDist(unsigned int id1, unsigned int id2) {
  int result;
  __m128i v, v1;
  std::vector<uint32_t> &fv1 = fvs[id1];
  std::vector<uint32_t> &fv2 = fvs[id2];
  v = _mm_load_si128((__m128i const*) (&fv1));
  v1 = _mm_load_si128((__m128i const*) (&fv2));
  v = _mm_and_si128(v,v1);
  result =_mm_extract_epi16(v, 0) + _mm_extract_epi16(v, 4);
  return result;
}

And fvs is a global variable, defined like this:

std::vector<std::vector<uint32_t> > fvs;

I am using the Intel C++ Compiler (ICC). Thank you.

plasmacel
  • A vector of vectors is bad for performance. Make `fvs` a single vector and simulate a 2D array by doing your own indexing. See [the answers on this question](http://stackoverflow.com/questions/33093860/using-nested-vectors-vs-a-flatten-vector-wrapper-strange-behaviour). Only use `vector<vector<T>>` if you need a "ragged" array where different rows can have different lengths and grow/shrink separately. (Or maybe if you need to change the number of columns on the fly.) – Peter Cordes Oct 01 '16 at 06:49
  • I know I wrote an answer to another question where the OP was using arrays of pointers instead of proper multidimensional arrays, but I can't find it. :/ That question was using pointers to arrays in nearly worst-case conditions (scattered instead of contiguous memory access with many very tiny allocations), and got more than a 10x speedup from fixing it, IIRC. – Peter Cordes Oct 01 '16 at 06:52
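A minimal sketch of the flattened layout suggested in the first comment, assuming every row has the same length. The class name `FlatMatrix` and its members are illustrative, not from the question; row `r` simply starts at offset `r * cols` in one contiguous buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat replacement for std::vector<std::vector<uint32_t>>:
// all rows live in one contiguous buffer, and row r starts at r * cols.
class FlatMatrix {
public:
    FlatMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // Element access with manual 2D -> 1D indexing.
    std::uint32_t& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);
        return data_[r * cols_ + c];
    }

    // Pointer to the first element of row r (cols() contiguous elements).
    const std::uint32_t* row(std::size_t r) const {
        assert(r < rows_);
        return data_.data() + r * cols_;
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<std::uint32_t> data_;  // single allocation for every row
};
```

With this layout, `row(id)` yields a pointer that can be handed to `_mm_loadu_si128`, and all rows share one allocation instead of one per row.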

2 Answers


Notice that you're passing a pointer to the std::vector object itself into the intrinsic.

Instead, you should pass a pointer to the data that the vector contains, e.g.

v = _mm_load_si128((__m128i const*) (&(fv1[0])));

or

v1 = _mm_load_si128((__m128i const*) fv2.data());

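The std::vector object itself holds only a pointer to a separately allocated buffer plus size/capacity bookkeeping. The elements are one extra level of indirection away, not at some offset from the start of the object, and that's not what SSE intrinsics expect at all. This also explains the segfault: _mm_load_si128 requires a 16-byte-aligned address, which the address of a std::vector object stored inside fvs is not guaranteed to have, and sizeof(std::vector) may very well be less than 16 bytes (in my case it returns 12), so the load reads past the end of the object.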
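A small check of the indirection point, assuming a non-empty vector: the address of the `std::vector` object and the address returned by `data()` are different places, and the elements do not sit inside the object's own bytes. The helper name `elementsOutsideObject` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: true if the elements of v live outside the bytes of
// the std::vector object itself, i.e. behind the pointer the object stores.
bool elementsOutsideObject(const std::vector<std::uint32_t>& v) {
    assert(!v.empty());  // an empty vector may have no buffer at all
    const auto obj   = reinterpret_cast<std::uintptr_t>(&v);
    const auto elems = reinterpret_cast<std::uintptr_t>(v.data());
    return elems < obj || elems >= obj + sizeof(v);
}
```

Loading from `&fv1` therefore reads the vector's bookkeeping, while loading from `fv1.data()` reads the actual `uint32_t` values.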

Alignment should always be a consideration with SSE, of course. It can be forced on std::vector with some clever allocator trickery; here is an SO question on that topic.
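One common form of that allocator trickery, sketched under the assumption of C++17 (aligned `operator new`/`operator delete` taking `std::align_val_t`). `AlignedAllocator` and `AlignedU32Vector` are illustrative names, not a standard or library API, and this sketch has not been verified on every compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical minimal allocator that gives std::vector storage aligned to
// Align bytes, using C++17 aligned operator new/delete.
template <typename T, std::size_t Align = 16>
struct AlignedAllocator {
    using value_type = T;

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Align>&) noexcept {}

    // Needed because Align is a non-type template parameter, so the
    // default allocator_traits rebind cannot be used.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Align>; };

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Align}));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Align});
    }
};

// Stateless allocators: any two instances can free each other's memory.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A>&, const AlignedAllocator<U, A>&) noexcept { return false; }

using AlignedU32Vector = std::vector<std::uint32_t, AlignedAllocator<std::uint32_t, 16>>;
```

Every reallocation goes through `allocate`, so the buffer stays 16-byte aligned after `resize` or `push_back` as well.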

Also make sure that each std::vector holds enough data, namely at least 4 uint32_t elements (16 bytes). It can hold more; only the first 4 are loaded.
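Putting the answer together, here is a sketch of the corrected function written as a free function (the original is a member of `SSEJaccard`, whose definition isn't shown). It takes `fvs` as a parameter instead of using the global, checks for at least 4 elements, and uses `_mm_loadu_si128` because the default allocator makes no 16-byte alignment promise. The reduction is kept exactly as in the question; note that `_mm_extract_epi16(v, 4)` reads 16-bit word 4, which is the low half of 32-bit lane 2.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_and_si128, _mm_extract_epi16

// Hypothetical free-function version of calcSSEJaccardDist that loads from
// the vectors' element buffers instead of from the std::vector objects.
int calcSSEJaccardDist(const std::vector<std::vector<std::uint32_t>>& fvs,
                       unsigned int id1, unsigned int id2) {
    const std::vector<std::uint32_t>& fv1 = fvs[id1];
    const std::vector<std::uint32_t>& fv2 = fvs[id2];
    // One __m128i holds 4 uint32_t; both buffers need at least that many.
    assert(fv1.size() >= 4 && fv2.size() >= 4);

    // Unaligned loads from data(), the pointer to the actual elements.
    __m128i v  = _mm_loadu_si128(reinterpret_cast<const __m128i*>(fv1.data()));
    __m128i v1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(fv2.data()));
    v = _mm_and_si128(v, v1);
    // Same reduction as the question: 16-bit words 0 and 4, i.e. the low
    // halves of 32-bit lanes 0 and 2 of the AND result.
    return _mm_extract_epi16(v, 0) + _mm_extract_epi16(v, 4);
}
```

Whether that reduction is the intended Jaccard computation is a separate question from the segfault.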

Ap31
    The `std::vector` object itself just holds a pointer and allocated / current size info. Getting at the actual data is an extra level of indirection, not just an offset from the start of the object. As far as alignment, it can be hard to get `std::vector` to use aligned allocations. I seem to recall reading that MSVC has a bug that makes it impossible even with a custom allocator. So you may need to just use `_mm_loadu_si128`, which is not ideal but not slow on modern CPUs. – Peter Cordes Oct 01 '16 at 07:00
  • +1, good point on the indirection; I'll fix the misleading sentence. I tried MSVC140 just now and it seems to work, though of course there's no guarantee of consistency. – Ap31 Oct 01 '16 at 08:14

You need to make sure your data structures are aligned before using aligned loads and stores such as _mm_load_si128. I don't think the default vector allocator guarantees the 16-byte alignment that SSE2 aligned loads require.
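A minimal sketch of checking alignment before an aligned load, assuming x86 with SSE2. `isAligned16` and `loadFirst4Aligned` are hypothetical helper names; the `assert` turns a misaligned address into a clear failure in debug builds instead of a segfault inside the intrinsic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>
#include <emmintrin.h>  // SSE2: _mm_load_si128

// True if p sits on a 16-byte boundary, as _mm_load_si128 requires.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: aligned load of the first 4 elements, with the
// preconditions (size and alignment) checked in debug builds.
inline __m128i loadFirst4Aligned(const std::vector<std::uint32_t>& fv) {
    assert(fv.size() >= 4);
    assert(isAligned16(fv.data()) && "_mm_load_si128 needs a 16-byte-aligned address");
    return _mm_load_si128(reinterpret_cast<const __m128i*>(fv.data()));
}
```

If the check fails with a plain std::vector, either switch to an aligned allocator or use `_mm_loadu_si128` instead.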