
So I've been trying to learn about SSE optimization on my own and I'm not quite getting it. I thought a simple function that just zeroes memory would be easy to implement, so I went ahead and tried to write it myself.

Here is the zero memory function that loops from the buffer start to buffer end and uses _mm_store_si128 to zero it out.

bool zeromem( byte * _dest, uint _sz )
{
    if ( _dest == nullptr )
        return false;
    __m128i zero = _mm_setzero_si128( );

    for ( auto i = rcast<__m128i*>( _dest ),
          end = rcast<__m128i*>( _dest + _sz );
          i < end; ++i )
    {
        _mm_store_si128( i, zero );
    }
    return true;
}

This throws an exception: Access Violation (0x00000), even though the pointer is not 0x00000.

The test I did was just allocating 1024 bytes of memory and then calling zeromem.

The exception is thrown on the first iteration.

1 Answer


_mm_store_si128 translates to MOVDQA, which requires its memory operand to be aligned on a 16-byte boundary; an unaligned address could cause the exception. IIRC, Windows, for example, doesn't report a separate alignment exception for this, so it shows up as an access violation. As for the memset implementation, you might be interested in this post comparing different approaches to filling a memory block with bytes.
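To make the alignment point concrete, here is a minimal sketch of the question's function rewritten around the unaligned store `_mm_storeu_si128`, which accepts any address. The name `zeromem_unaligned` and the use of `std::uint8_t`/`std::size_t` in place of the question's `byte`/`uint` typedefs are assumptions made for illustration. The sketch also hands the last 0 to 15 bytes to `std::memset`, because a loop that always stores 16 bytes would write past the end of the buffer whenever the size is not a multiple of 16.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <emmintrin.h> // SSE2: _mm_setzero_si128, _mm_storeu_si128

// Hypothetical variant of the question's zeromem (name and types are assumptions).
// Works for any pointer alignment and any size.
bool zeromem_unaligned(std::uint8_t* dest, std::size_t size)
{
    if (dest == nullptr)
        return false;

    const __m128i zero = _mm_setzero_si128();
    std::size_t offset = 0;

    // Full 16-byte blocks: the unaligned store has no alignment requirement.
    for (; offset + 16 <= size; offset += 16)
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dest + offset), zero);

    // Remaining 0..15 bytes, so nothing is written past dest + size.
    if (offset < size)
        std::memset(dest + offset, 0, size - offset);

    return true;
}
```

Whether this is any faster than a plain `memset` depends on the compiler and the target, so it is best treated as a learning exercise rather than a replacement.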

Community
zx485
  • Note that using `_mm_storeu_si128` instead of `_mm_store_si128` would solve the immediate problem. – Paul R Feb 05 '16 at 17:06
  • You can also keep the aligned version if the memory is allocated with [_aligned_malloc](https://msdn.microsoft.com/en-us/library/8z34s9c6.aspx) – UmNyobe Feb 05 '16 at 17:08
  • With 32-bit architectures (x86 and ARM) heap allocations are only 8-byte aligned, not 16-byte. The DirectXMath library can suffer from the same issues for the same reasons. See [MSDN](https://msdn.microsoft.com/en-us/library/windows/desktop/ee418725(v=vs.85).aspx#type_usage_guidelines_). – Chuck Walbourn Feb 06 '16 at 06:45
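The aligned-allocation suggestion in the comments can be sketched roughly as follows. This version uses `_mm_malloc`/`_mm_free` rather than `_aligned_malloc`, on the assumption that the intrinsic headers of the common x86 compilers provide them; that substitution, along with the helper names `is_aligned16`, `zeromem_aligned` and `zero_1024_bytes`, is not from the original posts. The aligned store is kept, but the function refuses pointers and sizes that would fault or overrun.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics; assumed to also expose _mm_malloc/_mm_free

// Hypothetical helper: true if p sits on a 16-byte boundary.
inline bool is_aligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Hypothetical aligned variant: rejects input the aligned store cannot handle.
bool zeromem_aligned(std::uint8_t* dest, std::size_t size)
{
    if (dest == nullptr || !is_aligned16(dest) || size % 16 != 0)
        return false;

    const __m128i zero = _mm_setzero_si128();
    for (std::size_t offset = 0; offset < size; offset += 16)
        _mm_store_si128(reinterpret_cast<__m128i*>(dest + offset), zero);
    return true;
}

// Usage sketch mirroring the question's test: allocate 1024 bytes, then zero them.
bool zero_1024_bytes()
{
    auto* buffer = static_cast<std::uint8_t*>(_mm_malloc(1024, 16)); // 16-byte aligned
    if (buffer == nullptr)
        return false;
    const bool ok = zeromem_aligned(buffer, 1024);
    _mm_free(buffer); // memory from _mm_malloc must be released with _mm_free
    return ok;
}
```

Checking the alignment up front turns the access violation into an ordinary `false` return, which is easier to diagnose than a crash on the first store.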