On 32-bit ARM NEON, the SIMD registers q0-q7 share storage with the single-precision float registers s0-s31, so the code below has a bug:
float_t fHorRatio = (float_t)srcWidth / dstWidth;
// The NEON assembly called here modifies q0-q7
MyNeonFunctionPtr1(pData, Stride, (int32_t)(fHorRatio * m_iHorScale));
// The next line reads a wrong value of fHorRatio,
// because MyNeonFunctionPtr1 overwrote the register holding it.
int32_t vertStepLuma = (int32_t)(fHorRatio * m_iVertScale);
On x86, emms can solve this. How do I do the same on NEON? My temporary workaround is to declare vertStepLuma as volatile. Is there a better way? Thanks!
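To make the aliasing concrete, here is a small portable C model of the low half of the VFP/NEON register file. It is only an illustration and does not touch real registers; the union type vfp_low_bank and the function neon_clobber_q are hypothetical names. Writing qN overwrites s(4N) through s(4N+3), so a float the compiler kept in, say, s5 is destroyed by any write to q1. One relevant ABI detail: under the ARM AAPCS, s16-s31 (d8-d15, i.e. q4-q7) are callee-saved, so the compiler is allowed to keep fHorRatio in one of them across the call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative model of the AArch32 VFP/NEON register bank (not real registers).
 * q0-q7 overlap d0-d15, which overlap s0-s31: qN covers d(2N) and d(2N+1),
 * i.e. s(4N) .. s(4N+3). q8-q15 (d16-d31) have no s-register aliases,
 * so they are not modelled here. */
typedef union {
    uint32_t q[8][4];  /* q0-q7, four 32-bit lanes each */
    uint64_t d[16];    /* d0-d15 */
    float    s[32];    /* s0-s31 */
} vfp_low_bank;

/* Hypothetical stand-in for NEON code that writes all lanes of qN,
 * e.g. an instruction such as "vmov.i32 qN, #0". */
static void neon_clobber_q(vfp_low_bank *bank, int n)
{
    assert(n >= 0 && n < 8);
    memset(bank->q[n], 0, sizeof bank->q[n]);
}
```

For example, setting bank.s[5] = 0.75f and then calling neon_clobber_q(&bank, 1) leaves bank.s[5] equal to 0.0f, while s3 and s8, which lie outside q1, keep their values.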