I'm trying to optimize a Python algorithm by implementing it in Cython. My question concerns a performance bottleneck in the following code:
import numpy as np
cimport numpy as np
cimport cython
from libc.math cimport atan2, sqrt, pow, fabs, M_PI as PI

@cython.boundscheck(False)  # turn off bounds-checking for the entire function
def anglesToRGB(np.ndarray[double, ndim=2] y, np.ndarray[double, ndim=2] x):
    cdef double angle
    cdef double Hp
    cdef double C
    cdef double X
    cdef Py_ssize_t i, j  # typed indices so the loops compile to plain C loops
    cdef np.ndarray[double, ndim=3] res = np.zeros([y.shape[0], y.shape[1], 3], dtype=np.float64)
    for i in xrange(y.shape[0]):
        for j in xrange(y.shape[1]):
            angle = atan2(y[i, j], x[i, j]) * 180.0 / PI + 180  # hue angle in [0, 360)
            C = sqrt(pow(y[i, j], 2) + pow(x[i, j], 2)) / 360.0  # chroma
            Hp = angle / 60.0  # hue sector in [0, 6)
            X = C * (1 - fabs(Hp % 2 - 1))
            C *= 255
            X *= 255
            if 0. <= Hp < 1.:
                res[i, j, :] = [C, X, 0]
            elif 1. <= Hp < 2.:
                res[i, j, :] = [X, C, 0]
            elif 2. <= Hp < 3.:
                res[i, j, :] = [0, C, X]
            elif 3. <= Hp < 4.:
                res[i, j, :] = [0, X, C]
            elif 4. <= Hp < 5.:
                res[i, j, :] = [X, 0, C]
            else:
                res[i, j, :] = [C, 0, X]
    return res
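For completeness, I'm compiling this with an ordinary distutils setup script; a minimal sketch, where angles_to_rgb.pyx is simply what I've named the file:

# setup.py -- minimal build script; the .pyx file name is my own choice
from distutils.core import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("angles_to_rgb.pyx"),
    include_dirs=[np.get_include()],
)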
I've identified the major bottleneck: it occurs when I assign a list of values to a slice of the res array, as in

res[i, j, :] = [C, X, 0]
However, if I change the assignment to

res[i, j, 0] = C
res[i, j, 1] = X
res[i, j, 2] = 0
then the code runs orders of magnitude faster. This seems strange to me, because surely the Cython compiler should be smart enough to make this transformation for me? Or do I need to give it some hints first? I should note that changing the slice from : to 0:3, and making the right-hand side a NumPy array instead of a list, doesn't improve the performance either.
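For reference, this is roughly how I've been timing the two variants; a sketch, with rgb_slice and rgb_unrolled standing in as placeholder names for the two compiled builds of the function above:

import timeit
import numpy as np

# Placeholder module names for the two compiled variants of anglesToRGB:
# one using the slice assignment, one using the three scalar stores.
from rgb_slice import anglesToRGB as rgb_slice
from rgb_unrolled import anglesToRGB as rgb_unrolled

y = np.random.uniform(-1.0, 1.0, (1000, 1000))
x = np.random.uniform(-1.0, 1.0, (1000, 1000))

print(timeit.timeit(lambda: rgb_slice(y, x), number=10))
print(timeit.timeit(lambda: rgb_unrolled(y, x), number=10))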
What I'd like to know is why this operation kills performance so badly, and whether there is any way to solve it without sacrificing the convenient list and slice notation.
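The closest compromise I've found is to hide the three scalar stores behind a small cdef inline helper; a sketch only, where set_rgb is just a name I made up and res is passed as a typed memoryview so the call stays cheap:

@cython.boundscheck(False)
cdef inline void set_rgb(double[:, :, ::1] res,
                         Py_ssize_t i, Py_ssize_t j,
                         double r, double g, double b):
    # Three plain C stores; no Python list or slice object is created,
    # unlike res[i, j, :] = [r, g, b].
    res[i, j, 0] = r
    res[i, j, 1] = g
    res[i, j, 2] = b

With this, after taking the view once with cdef double[:, :, ::1] v = res, each branch becomes e.g. set_rgb(v, i, j, C, X, 0), but it still gives up the slice syntax, which is what I'd like to keep.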
Best regards