I'm trying to optimize a Python algorithm by implementing it in Cython. My question concerns a performance bottleneck in the following code:
import numpy as np
cimport numpy as np
cimport cython
from libc.math cimport atan2, sqrt, pow, fabs, M_PI as PI

@cython.boundscheck(False)  # turn off bounds-checking for the entire function
def anglesToRGB(np.ndarray[double, ndim=2] y, np.ndarray[double, ndim=2] x):
    cdef double angle
    cdef double Hp
    cdef double C
    cdef double X
    cdef Py_ssize_t i, j  # typed indices so the loops compile to plain C loops
    cdef np.ndarray[double, ndim=3] res = np.zeros([y.shape[0], y.shape[1], 3], dtype=np.float64)
    for i in xrange(y.shape[0]):
        for j in xrange(y.shape[1]):
            angle = atan2(y[i, j], x[i, j]) * 180.0 / PI + 180  # hue angle in [0, 360)
            C = sqrt(pow(y[i, j], 2) + pow(x[i, j], 2)) / 360.0  # chroma
            Hp = angle / 60.0  # hue sector in [0, 6)
            X = C * (1 - fabs(Hp % 2 - 1))
            C *= 255
            X *= 255
            if 0. <= Hp < 1.:
                res[i, j, :] = [C, X, 0]
            elif 1. <= Hp < 2.:
                res[i, j, :] = [X, C, 0]
            elif 2. <= Hp < 3.:
                res[i, j, :] = [0, C, X]
            elif 3. <= Hp < 4.:
                res[i, j, :] = [0, X, C]
            elif 4. <= Hp < 5.:
                res[i, j, :] = [X, 0, C]
            else:
                res[i, j, :] = [C, 0, X]
    return res
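For completeness, I'm compiling this with an ordinary distutils setup script; a minimal sketch, where angles_to_rgb.pyx is simply what I've named the file:

# setup.py -- minimal build script; the .pyx file name is my own choice
from distutils.core import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("angles_to_rgb.pyx"),
    include_dirs=[np.get_include()],
)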
I've identified the major bottleneck: it occurs when I assign a list of values to a slice of the res array, as in

res[i, j, :] = [C, X, 0]
However, if I change the assignment to

res[i, j, 0] = C
res[i, j, 1] = X
res[i, j, 2] = 0
then the code runs orders of magnitude faster. This seems strange to me, because surely the Cython compiler should be smart enough to make this transformation for me? Or do I need to give it some hints first? I should note that changing the slice from : to 0:3, and making the right-hand side a NumPy array instead of a list, doesn't improve the performance either.
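For reference, this is roughly how I've been timing the two variants; a sketch, with rgb_slice and rgb_unrolled standing in as placeholder names for the two compiled builds of the function above:

import timeit
import numpy as np

# Placeholder module names for the two compiled variants of anglesToRGB:
# one using the slice assignment, one using the three scalar stores.
from rgb_slice import anglesToRGB as rgb_slice
from rgb_unrolled import anglesToRGB as rgb_unrolled

y = np.random.uniform(-1.0, 1.0, (1000, 1000))
x = np.random.uniform(-1.0, 1.0, (1000, 1000))

print(timeit.timeit(lambda: rgb_slice(y, x), number=10))
print(timeit.timeit(lambda: rgb_unrolled(y, x), number=10))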
What I'd like to know is why this operation kills performance so badly, and whether there is any way to solve it without sacrificing the convenient list and slice notation.
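The closest compromise I've found is to hide the three scalar stores behind a small cdef inline helper; a sketch only, where set_rgb is just a name I made up and res is passed as a typed memoryview so the call stays cheap:

@cython.boundscheck(False)
cdef inline void set_rgb(double[:, :, ::1] res,
                         Py_ssize_t i, Py_ssize_t j,
                         double r, double g, double b):
    # Three plain C stores; no Python list or slice object is created,
    # unlike res[i, j, :] = [r, g, b].
    res[i, j, 0] = r
    res[i, j, 1] = g
    res[i, j, 2] = b

With this, after taking the view once with cdef double[:, :, ::1] v = res, each branch becomes e.g. set_rgb(v, i, j, C, X, 0), but it still gives up the slice syntax, which is what I'd like to keep.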
Best regards