tutorials simplecdef
A cdef function is a Cython function that is callable only from Cython, not from Python. cdef functions can be used as a way to write small functions that run at C speed (because they are C functions).
After looking at the profiling example in the docs:
http://docs.cython.org/src/tutorial/profiling_tutorial.html
I tried to see what could be done to get as "pure" a C function as I could, i.e. I looked at the generated code and tried to get as little Python/Cython code in there as possible. First, the punchline:
You probably don't need to bother; Cython, and the C compiler, do a pretty good job off the bat. Some lessons:
- If you're looking at the generated code (which can be useful for experts), textual length is a poor indicator of actual runtime overhead for much of the auto-generated boilerplate.
- Don't worry about getting rid of every little bit of extra code in cdef functions.
- It's actually easier to get top performance (i.e. compiler inlining, etc) if you write small cdef functions rather than use external C ones.
This is the example from the profiling tutorial:

    cdef double recip_square1(double i):
        return 1./(i*i)

Simple and straightforward, but there is more Cython boilerplate generated than you'd expect (or than I expected), so I thought I'd try to clean that out.
Python and C have different rules for division in some cases: the result with negative operands, raising an exception on division by zero, etc. Cython injects some code into the C so that you'll get the same results from Cython as you do from Python. This adds some boilerplate to the generated code, and I thought maybe a performance hit, so I tried turning that off:

    ## second version: turn on cdivision
    @cython.cdivision(True)
    cdef inline double recip_square2(double i):
        return 1./(i*i)

Indeed, this results in very clean generated C code:

    static CYTHON_INLINE double __pyx_f_10calc_pi_cy_recip_square2(double __pyx_v_i) {
      double __pyx_r;
      __Pyx_RefNannyDeclarations
      __Pyx_RefNannySetupContext("recip_square2", 0);
      __pyx_r = (1. / (__pyx_v_i * __pyx_v_i));
      goto __pyx_L0;

      __pyx_r = 0;
      __pyx_L0:;
      __Pyx_RefNannyFinishContext();
      return __pyx_r;
    }

Note that the __Pyx_RefNanny calls are macros that essentially go away when compiled.
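For reference, the Python division rules that cdivision(True) switches off can be seen in plain Python. This is a small sketch, not part of the original test code; c_style_div and c_style_mod are hypothetical helpers written here only to mimic C's truncating behaviour for small integers.

```python
import math


def c_style_div(a, b):
    """Integer division truncating toward zero, as C does (illustrative helper)."""
    return math.trunc(a / b)


def c_style_mod(a, b):
    """Remainder carrying the sign of the dividend, as C's % does (illustrative helper)."""
    return int(math.fmod(a, b))


# Python floors integer division; C truncates toward zero.
print("Python -7 // 2:", -7 // 2, "  C-style:", c_style_div(-7, 2))
print("Python -7 % 2: ", -7 % 2, "  C-style:", c_style_mod(-7, 2))

# Python raises on division by zero, even for floats; C performs no such check.
try:
    1.0 / 0.0
except ZeroDivisionError as exc:
    print("Python raises ZeroDivisionError:", exc)
```

The extra code Cython generates by default is there to preserve the left-hand behaviour; with cdivision(True) you get the plain C operation instead.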
For comparison, I also wrote an external function in C, and called that from Cython (recip_square3.c):

    // pure C function for recip_square3
    double recip_square3(double i) {
        return 1. / (i*i);
    }

(recip_square3.h):

    // header for pure C function for recip_square3
    double recip_square3(double i);

and the declaration in Cython:

    ## third version: call an external C function
    cdef extern from "recip_square3.h":
        double recip_square3 (double i)

I also tried turning inlining on and off, and hand-inlining the code directly in the Cython. Here are all the versions I tried (calc_pi_cy.pyx):

    # File: calc_pi_cy.pyx
    #
    # Test of making a "pure C" cdef function
    #
    # Borrowed from examples given in:
    #
    # http://docs.cython.org/src/tutorial/profiling_tutorial.html
    #
    # Chris Barker: [email protected]
    # July 1, 2013

    cimport cython

    ## first version: simple inlined cdef
    cdef inline double recip_square1(double i):
        return 1./(i*i)

    def approx_pi1(int n):
        cdef int k
        cdef double val = 0.
        for k in range(1, n+1):
            val += recip_square1( k )
        return (6 * val)**0.5

    ## second version: turn on cdivision
    @cython.cdivision(True)
    cdef inline double recip_square2(double i):
        return 1./(i*i)

    def approx_pi2(int n):
        cdef int k
        cdef double val = 0.
        for k in range(1, n+1):
            val += recip_square2( k )
        return (6 * val)**0.5

    ## third version: call an external C function
    cdef extern from "recip_square3.h":
        double recip_square3 (double i)

    def approx_pi3(int n):
        cdef int k
        cdef double val = 0.
        for k in range(1, n+1):
            val += recip_square3( k )
        return (6 * val)**0.5

    ## fourth version: completely inline the function in cython
    cimport cython

    def approx_pi4(int n):
        cdef int k
        cdef double val = 0.
        for k in range(1, n+1):
            val += 1./(<double>k * <double>k)
        return (6 * val)**0.5

    ## fifth version: regular cdef, no inline, cdivision
    @cython.cdivision(True)
    cdef double recip_square5(double i):
        return 1./(i*i)

    def approx_pi5(int n):
        cdef int k
        cdef double val = 0.
        for k in range(1, n+1):
            val += recip_square5( k )
        return (6 * val)**0.5

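As a cross-check on what all five versions compute, here is a plain Python equivalent. It is not part of the original test code, and approx_pi_py is a name chosen here for illustration. The sum of 1/k^2 converges to pi^2/6, so the square root of 6 times the partial sum approaches pi as n grows.

```python
def recip_square(i):
    """Reciprocal of the square of i."""
    return 1.0 / (i * i)


def approx_pi_py(n):
    """Approximate pi from the first n terms of the sum of 1/k^2."""
    val = 0.0
    for k in range(1, n + 1):
        val += recip_square(float(k))
    return (6 * val) ** 0.5


if __name__ == "__main__":
    # Same n as the timing script uses for the Cython versions.
    print(approx_pi_py(100000))
```

With n = 100000 this should agree with the value the Cython versions print, to the digits shown in the timing run.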
And some timing code (written for Python 2, hence the print statements):

    #!/usr/bin/env python

    """
    timing script for calc_pi examples
    """

    import timeit

    N = 100000

    def timer(version):
        time_number = 1000
        print timeit.timeit("approx_pi%i(N)"%version,
                            number=time_number,
                            setup="from __main__ import approx_pi%i, N"%version),
        print "seconds"

    from calc_pi_cy import *

    for i in range(1, 6):
        print "cython version %i:"%i
        timer(i)
        # and test result:
        print eval("approx_pi%i(%i)"%(i,N))
        print

Here is a run of the timing code:

    $ ./time_calc_pi.py
    cython version 1:
    0.410186052322 seconds
    3.14158310433

    cython version 2:
    0.404766082764 seconds
    3.14158310433

    cython version 3:
    1.44341897964 seconds
    3.14158310433

    cython version 4:
    0.403806209564 seconds
    3.14158310433

    cython version 5:
    0.403886079788 seconds
    3.14158310433

So, they all take essentially the same amount of time to run, except version 3, which takes about 3.5 times as long. Version 3 is the one that calls an external C function. I haven't looked at the compiler output to see for sure, but I'm pretty sure what's happening is that the compiler can auto-inline this simple function when it's all in the same C module. When calling an external C function, it can't be inlined, and you pay the C function call overhead: small, but significant when the function itself is this simple.
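The cost of a non-inlined call to a tiny function can be illustrated, by loose analogy only, in pure Python, where calls are never inlined. This sketch is not from the original test: it measures interpreter call overhead rather than C call overhead, the function names are made up for illustration, and the timings will vary by machine.

```python
import timeit


def recip_square(i):
    """Tiny function body: the call itself is a large share of the cost."""
    return 1.0 / (i * i)


def sum_with_call(n):
    """Sum 1/k^2 using a function call per term."""
    val = 0.0
    for k in range(1, n + 1):
        val += recip_square(k)
    return val


def sum_inlined(n):
    """Same sum with the expression written out by hand in the loop."""
    val = 0.0
    for k in range(1, n + 1):
        val += 1.0 / (k * k)
    return val


if __name__ == "__main__":
    n = 100000
    for func in (sum_with_call, sum_inlined):
        seconds = timeit.timeit(lambda: func(n), number=10)
        print("%s: %.3f seconds" % (func.__name__, seconds))
```

Both functions return the same value; only the per-term call differs, which is the same kind of difference as between version 3 and the other versions.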
So: the moral of the story (see above) -- don't bother! Cython and the C compiler do a fine job as it is.
[Note: tested with Cython 0.19.1, Python 2.7 32-bit on OS X 10.7 (gcc 4.2).]
I've attached the Cython code, timing code, and a setup.py to build it all. I'd be interested to know if there is a difference with other platforms/compilers.
- Chris Barker ([email protected])
A setup.py to build it:

    #!/usr/bin/env python

    from distutils.core import setup
    from distutils.extension import Extension
    from Cython.Build import cythonize

    setup(
        ext_modules = cythonize(
            [Extension('calc_pi_cy', ['calc_pi_cy.pyx', 'recip_square3.c']),
             ]
        )
    )

(All code attached to this page in a zip file)
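For readers on Python 3, the timing script might be ported along these lines. This is a sketch, not the original script: time_versions and main are names introduced here, and it passes callables to timeit instead of building statement strings. It assumes the calc_pi_cy extension has already been built and is importable.

```python
#!/usr/bin/env python3
"""Python 3 sketch of the timing script for the calc_pi examples."""

import timeit

N = 100000


def time_versions(module, n=N, number=1000, versions=range(1, 6)):
    """Time module.approx_pi<v>(n) for each v; return {v: (seconds, result)}."""
    results = {}
    for version in versions:
        func = getattr(module, "approx_pi%i" % version)
        # The lambda adds one Python-level call per repetition, which is
        # negligible next to the n loop iterations inside each function.
        seconds = timeit.timeit(lambda: func(n), number=number)
        results[version] = (seconds, func(n))
    return results


def main():
    try:
        import calc_pi_cy  # compiled extension module, built with setup.py
    except ImportError:
        print("calc_pi_cy is not built; run the setup.py build first")
        return
    for version, (seconds, result) in time_versions(calc_pi_cy).items():
        print("cython version %i:" % version)
        print(seconds, "seconds")
        print(result)
        print()


if __name__ == "__main__":
    main()
```

Passing the module in as an argument keeps the timing loop independent of how the functions were built, so the same helper can time a pure Python stand-in for comparison.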