Some experiments with ctypes

By leonardo maffi
Version 1.0, 2007 06 02

ctypes (part of the standard library since Python 2.5) is useful as a bridge to external compiled code. This is a very simple example using C code that I have tried on Windows with MinGW. It is a recursive function from the Computer Shootout; the CPython version is much slower:

    C:\Temp>type alg.c
    int tak(int x, int y, int z) {
        if (y < x)
            return tak( tak(x-1, y, z), tak(y-1, z, x), tak(z-1, x, y) );
        return z;
    }

I compile it (I use MinGW, you can use what you have):

    C:\Temp>gcc -O3 -shared -o alg.so alg.c

I call it:

    C:\Temp>python
    ActivePython 2.5.1.1 ...
    >>> from ctypes import *
    >>> alg = cdll.LoadLibrary('alg.so')
    >>> alg.tak(3*10, 2*10, 10)
    11

This is so simple because I have used only integers here. With more complex data types things become a little more involved, but ctypes seems able to handle them too.

To increase safety you can also declare the argument and return types of the functions:

    >>> alg.tak(10) # this is wrong
    2227148
    >>> alg.tak.argtypes = [c_int] * 3
    >>> alg.tak.restype = c_int
    >>> alg.tak(10)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: this function takes at least 3 arguments (1 given)
    >>> alg.tak("a", "b", "c")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ctypes.ArgumentError: argument 1: <type 'exceptions.TypeError'>: wrong type

Some Python (and Psyco) code for a speed test of the tak() function, alg_call.py:

    from time import clock

    def tak(x, y, z):
        if y < x:
            return tak( tak(x-1,y,z), tak(y-1,z,x), tak(z-1,x,y) )
        return z

    n = 8
    print "Timings with n = %d:" % n

    # C version: library loading time and call time are measured separately
    t0 = clock()
    from ctypes import cdll, c_int
    alg = cdll.LoadLibrary('alg.so')
    t1 = clock()
    r1 = alg.tak(3*n, 2*n, n)
    t2 = clock()
    print "ctypes + C:", round(t1-t0,2), "+", round(t2-t1,2), "s"

    # Pure Python version
    t = clock()
    r2 = tak(3*n, 2*n, n)
    print "Python 2.5:", round(clock()-t, 2), "s"

    # Psyco version (the timing includes the import and the binding)
    t = clock()
    import psyco
    psyco.bind(tak)
    r3 = tak(3*n, 2*n, n)
    print "Psyco:", round(clock()-t, 2), "s"

    # All three versions must agree
    assert r1 == r2 == r3

The output (the ratios in parentheses are relative to the total ctypes + C time):

    Timings with n = 8:
    ctypes + C: 0.04 + 0.09 s
    Python 2.5: 4.54 s
    Psyco: 0.68 s

    Timings with n = 9:
    ctypes + C: 0.03 + 0.62 s
    Python 2.5: 28.8 s (44.3 X)
    Psyco: 1.44 s ( 2.2 X)

    Timings with n = 10:
    ctypes + C: 0.03 + 3.88 s
    Python 2.5:
    Psyco: 8.99 s (2.3 X)

-------------------------------------------

Another test, a bit less naive:

    C:\Temp>type freqs.c
    void charfreq(char *text, long len, long *freqs) {
        int i;
        for(i = 0; i < 256; i++)
            freqs[i] = 0;
        for(i = 0; i < len; i++)
            freqs[(unsigned char)text[i]]++; /* cast: plain char may be signed */
    }

    C:\Temp>gcc -O3 -shared -o freqs.so freqs.c

(I have only just started using ctypes, so I have not yet managed to return the array from the function. Instead I have used a void function that modifies freqs[] in place.)

The Python code, freqs_call.py (the variable names aren't good, this is just a quick test):

    from ctypes import *
    import psyco # not necessary

    freq = cdll.LoadLibrary('freqs.so')
    tyfreqs = c_long * 256
    freq.charfreq.argtypes = [c_char_p, c_long, tyfreqs]
    freq.charfreq.restype = None

    # The default freqs array is created once and reused by every call;
    # the C function zeroes it each time.
    def freqs1(txt, freqs=tyfreqs()):
        assert isinstance(txt, str) # necessary to be safe
        len_txt = len(txt)
        # My tests have shown:
        # c_char_p() just returns a pointer to the Python string
        # create_string_buffer() actually copies the string
        freq.charfreq(c_char_p(txt), len_txt, freqs)
        return dict((chr(i), fr) for i, fr in enumerate(freqs) if fr)

    from collections import defaultdict
    def freqs2(txt):
        freqs = defaultdict(int)
        for c in txt:
            freqs[c] += 1
        return freqs

    s = "This is a string"
    print freqs1(s)
    assert freqs1(s) == freqs2(s)

    s = "This is a \0string"
    print "c_char_p(s):", c_char_p(s) # it doesn't show, but this contains all s!
    assert freqs1(s) == freqs2(s)

    n = 10 ** 5
    print "n =", n
    s = "This is a string" * n

    from time import clock
    t = clock()
    freqs1(s)
    print "ctypes:", round(clock() - t, 2), "s"

    t = clock()
    freqs2(s)
    print "CPython 2.5:", round(clock() - t, 2), "s"

    psyco.bind(freqs2)
    t = clock()
    freqs2(s)
    print "Psyco:", round(clock() - t, 2), "s"

Output, C compiled with -O3 (the second run uses n = 10 ** 6):

    {'a': 1, ' ': 3, 'g': 1, 'i': 3, 'h': 1, 'n': 1, 's': 3, 'r': 1, 'T': 1, 't': 1}
    c_char_p(s): c_char_p('This is a ')
    n = 100000
    ctypes: 0.03 s
    CPython 2.5: 2.33 s
    Psyco: 0.74 s

    {'a': 1, ' ': 3, 'g': 1, 'i': 3, 'h': 1, 'n': 1, 's': 3, 'r': 1, 'T': 1, 't': 1}
    c_char_p(s): c_char_p('This is a ')
    n = 1000000
    ctypes: 0.24 s
    CPython 2.5: 23.42 s
    Psyco: 7.33 s

Depending on the kind of algorithm, the speedup can be very large: here the C version is roughly 80 to 100 times faster than CPython and about 25 to 30 times faster than Psyco. The problem is that the C function works with byte strings only, while the Python version works with unicode strings too. With ctypes you can use wchar_t, but I haven't tried that yet.

I have tested the memory behaviour of c_char_p() and create_string_buffer() with this little program, memory_test1.py. It uses my memory.py module, which tells how much memory a Python program is using:

    from ctypes import *
    from memory import memory

    print "A:", memory(), "kb"
    s = "This is a string" * 10**6
    print "B:", memory(), "kb"

    # returns a pointer to the Python string
    #cs = c_char_p(s)

    # copies the string
    cs = create_string_buffer(s, len(s))
    print "C:", memory(), "kb"

Its output with c_char_p():

    A: 2036 kb
    B: 17680 kb
    C: 17680 kb

Its output with create_string_buffer():

    A: 2032 kb
    B: 17676 kb
    C: 33320 kb

-------------------------------------------

This is a callback test that can probably work on both Windows and Linux (modified from the Python docs, 14.14.1.17 Callback functions), callback_test1.py:

    import sys
    from ctypes import cdll, c_int, CFUNCTYPE, POINTER, sizeof

    def py_cmp_func(a, b):
        print "py_cmp_func", a[0], b[0]
        return a[0] - b[0]

    if sys.platform.startswith('win'):
        libc = cdll.msvcrt
    else:
        libc = cdll.LoadLibrary('libc.so.6') # Linux, as in the Python docs example
    qsort = libc.qsort
    qsort.restype = None

    TyArray = c_int * 5
    a = TyArray(5, 1, 7, 33, 99)
    TyFunc = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int))
    qsort(a, len(a), sizeof(c_int), TyFunc(py_cmp_func))
    print "\n", list(a)

The output:

    py_cmp_func 1 5
    py_cmp_func 7 5
    py_cmp_func 33 7
    py_cmp_func 99 33
    py_cmp_func 1 5
    py_cmp_func 7 5
    py_cmp_func 33 7
    py_cmp_func 1 5
    py_cmp_func 7 5
    py_cmp_func 1 5

    [1, 5, 7, 33, 99]

A few tests show that this qsort on Windows is not very efficient: it calls the comparison function many more times than the (really good) Python sort(). This is callback_test2.py:

    import sys
    from random import shuffle
    from ctypes import *

    ncalls1 = 0
    def py_cmp(a, b):
        global ncalls1
        ncalls1 += 1
        return a - b

    ncalls2 = 0
    def ctypes_cmp(a, b):
        global ncalls2
        ncalls2 += 1
        return a[0] - b[0]

    if sys.platform.startswith('win'):
        libc = cdll.msvcrt
    else:
        libc = cdll.LoadLibrary('libc.so.6') # Linux, as in the Python docs example
    qsort = libc.qsort
    qsort.restype = None
    TyFunc = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int))

    # test 1: shuffled data
    print "test 1:"
    n = 3000
    ncalls1 = ncalls2 = 0
    data = range(n)
    shuffle(data)
    sorted(data, cmp=py_cmp)
    print "Calls to py_cmp by CPython sort():", ncalls1
    TyArray = c_int * n
    a = TyArray(*data)
    qsort(a, len(a), sizeof(c_int), TyFunc(ctypes_cmp))
    print "Calls to ctypes_cmp by qsort:", ncalls2
    print

    # test 2: already sorted data
    print "test 2:"
    ncalls1 = ncalls2 = 0
    data = range(n)
    sorted(data, cmp=py_cmp)
    print "Calls to py_cmp by CPython sort():", ncalls1
    TyArray = c_int * n
    a = TyArray(*data)
    qsort(a, len(a), sizeof(c_int), TyFunc(ctypes_cmp))
    print "Calls to ctypes_cmp by qsort:", ncalls2
    print

    # test 3: data sorted in reverse order
    print "test 3:"
    ncalls1 = ncalls2 = 0
    data = range(n)[::-1]
    sorted(data, cmp=py_cmp)
    print "Calls to py_cmp by CPython sort():", ncalls1
    TyArray = c_int * n
    a = TyArray(*data)
    qsort(a, len(a), sizeof(c_int), TyFunc(ctypes_cmp))
    print "Calls to ctypes_cmp by qsort:", ncalls2
    print

    # test 4: two sorted runs, every value duplicated
    print "test 4:"
    ncalls1 = ncalls2 = 0
    data = range(n//2)*2
    sorted(data, cmp=py_cmp)
    print "Calls to py_cmp by CPython sort():", ncalls1
    TyArray = c_int * n
    a = TyArray(*data)
    qsort(a, len(a), sizeof(c_int), TyFunc(ctypes_cmp))
    print "Calls to ctypes_cmp by qsort:", ncalls2
    print

    # test 5: a sorted run of 80% followed by a shuffled 20%
    print "test 5:"
    ncalls1 = ncalls2 = 0
    data = range(int(n * 0.2))
    shuffle(data)
    data = range(int(n * 0.8)) + data
    sorted(data, cmp=py_cmp)
    print "Calls to py_cmp by CPython sort():", ncalls1
    TyArray = c_int * n
    a = TyArray(*data)
    qsort(a, len(a), sizeof(c_int), TyFunc(ctypes_cmp))
    print "Calls to ctypes_cmp by qsort:", ncalls2
    print

The output:

    test 1:
    Calls to py_cmp by CPython sort(): 30675
    Calls to ctypes_cmp by qsort: 43210

    test 2:
    Calls to py_cmp by CPython sort(): 2999
    Calls to ctypes_cmp by qsort: 31845

    test 3:
    Calls to py_cmp by CPython sort(): 2999
    Calls to ctypes_cmp by qsort: 32016

    test 4:
    Calls to py_cmp by CPython sort(): 5998
    Calls to ctypes_cmp by qsort: 170232

    test 5:
    Calls to py_cmp by CPython sort(): 8485
    Calls to ctypes_cmp by qsort: 47165

In these runs qsort needs from about 1.4 times (test 1, shuffled data) to more than 28 times (test 4) as many comparisons as the Python sort.

-------------------------------------------

Then I tried to use ctypes with a small but useful C function, a K-means clustering routine, and it required some work. To get it working I did another memory test, about memory allocated by the C function. This is the C code, memory_test2.c:

    // gcc -s -shared -o memory_test2.so memory_test2.c
    #include <stdlib.h>

    int *test() {
        int *labels = calloc(250000, sizeof(int));
        return labels;
    }

This is the Python code, memory_test2.py (again it uses my memory module):

    import sys
    from memory import memory
    from ctypes import cdll, c_int, POINTER, addressof

    def main():
        print "memory 1:", memory(), "kb" # 2036
        mem_test = cdll.LoadLibrary('memory_test2.so')
        # How to unload a library?
        print "memory 2:", memory(), "kb" # 2104
        mem_test.test.argtypes = []
        TyArray = c_int * 250000
        print "memory 3:", memory(), "kb" # 2104
        mem_test.test.restype = POINTER(TyArray)
        print "memory 4:", memory(), "kb" # 2104
        pr = mem_test.test()
        print "memory 5:", memory(), "kb" # 3088
        r = list(pr.contents)
        print "memory 6:", memory(), "kb" # 4188
        print len(r), r[:10] # Out: 250000 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        print "memory 7:", memory(), "kb" # 4188
        if sys.platform.startswith('win'):
            libc = cdll.msvcrt
        else:
            libc = cdll.LoadLibrary('libc.so.6') # Linux, as in the Python docs example
        libc.free(addressof(pr.contents)) # try to comment out this line
        print "memory 8:", memory(), "kb" # 3208

    print "memory 0:", memory(), "kb" # 2036
    main()
    print "memory 9:", memory(), "kb" # 2104

Its output:

    memory 0: 2036 kb
    memory 1: 2036 kb
    memory 2: 2104 kb
    memory 3: 2104 kb
    memory 4: 2104 kb
    memory 5: 3088 kb
    memory 6: 4188 kb
    250000 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    memory 7: 4188 kb
    memory 8: 3208 kb
    memory 9: 2104 kb

As you can see, the tiny C function allocates about 1 MB of memory (250000 ints of 4 bytes each), and I have to call libc.free() to release it from Python. Otherwise memory 9 stays at about 3100 kb even after main() has returned, as you can see if you comment out the libc.free(...) line:

    ...
    memory 7: 4192 kb
    memory 8: 4192 kb
    memory 9: 3092 kb

At the moment I don't know whether it's possible to unload a library loaded by ctypes, to free those 60-70 kb when it's not needed anymore.

You can find the ctypes code for the bridge to the C k-means inside kmeans.zip, in the software section of my site.
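The allocate-in-C, free-from-Python pattern of memory_test2.py can also be sketched without a custom library, by calling calloc() and free() from the C runtime directly. This is only a sketch under some assumptions: it is written for a current Python 3 rather than the Python 2.5 used above, zeros_from_c is a hypothetical helper name, and ctypes.CDLL(None) is assumed to expose the C runtime on non-Windows systems. It declares pointer-sized argument and return types explicitly, because the default c_int return type can truncate an address on a 64-bit system.

```python
import sys
import ctypes
from ctypes import c_int, c_size_t, c_void_p

# Assumption: the C runtime is reachable this way on your platform.
if sys.platform.startswith('win'):
    libc = ctypes.cdll.msvcrt
else:
    libc = ctypes.CDLL(None)  # symbols already loaded in this process (POSIX)

# Pointer-sized types are declared explicitly; the default restype is c_int.
libc.calloc.argtypes = [c_size_t, c_size_t]
libc.calloc.restype = c_void_p
libc.free.argtypes = [c_void_p]
libc.free.restype = None

def zeros_from_c(n):
    """Allocate n zeroed ints in C, copy them into a Python list, free the block."""
    address = libc.calloc(n, ctypes.sizeof(c_int))
    if not address:
        raise MemoryError("calloc failed")
    try:
        # View the C memory as an array of n ints without copying it.
        array = (c_int * n).from_address(address)
        return list(array)  # copy out before the C memory is released
    finally:
        libc.free(address)  # always give the block back to the C allocator

values = zeros_from_c(1000)
print(len(values))
```

The try/finally keeps the free() call paired with the allocation even if the copy fails, which is the same responsibility that the libc.free(...) line carries in memory_test2.py.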