Accelerating Python code with Numba

1 minute read

Published:

Numba is an open-source numerical Python compiler that translates a subset of Python and NumPy code into machine code, executed on the CPU. It can be used to significantly speed up the execution of numerical computations, especially those that are performed in a loop, by compiling the code into machine code rather than interpreting it dynamically.

Numba provides a just-in-time (JIT) compiler, which means that it compiles the code on-the-fly, just before it is executed. This allows for the optimizations to be applied exactly where they are needed, rather than in advance. Numba supports a wide range of Python data types and functions, and it provides a simple, easy-to-use API for adding JIT compilation to existing code.

Numba allows you to write high-performance Python code by leveraging the power of JIT compilation. It can be used for a wide range of applications, from scientific computing and data analysis to machine learning.

Example

Here is a simple example of how Numba can be used to speed up a calculation:

Consider the following function:

def go_fast_nonumba(a):
trace = 0.0
for i in range(a.shape[0]):
trace += np.tanh(a[i, i])
return a + trace

This function can be slow when working with large inputs. To speed up the calculation, we can use Numba to compile the function:

@jit(nopython=True)
def go_fast(a): # Function is compiled and runs in machine code
trace = 0.0
for i in range(a.shape[0]):
trace += np.tanh(a[i, i])
return a + trace
view raw numba_def.py hosted with ❤ by GitHub

By adding the @jit decorator, Numba compiles the function into machine code, which can be executed much faster than the interpreted code. The compiled function can then be used just like any other Python function:

from numba import jit
import numpy as np
import time
x = np.arange(10000).reshape(100, 100)
@jit(nopython=True)
def go_fast(a): # Function is compiled and runs in machine code
trace = 0.0
for i in range(a.shape[0]):
trace += np.tanh(a[i, i])
return a + trace
def go_fast_nonumba(a):
trace = 0.0
for i in range(a.shape[0]):
trace += np.tanh(a[i, i])
return a + trace
# DO NOT REPORT THIS... COMPILATION TIME IS INCLUDED IN THE EXECUTION TIME!
start = time.time()
go_fast(x)
end = time.time()
print("Elapsed (with compilation) = %s" % (end - start))
n_rep = 100;
# NOW THE FUNCTION IS COMPILED, RE-TIME IT EXECUTING FROM CACHE
start = time.time()
[go_fast(x) for _ in range(n_rep)]
elp1 = time.time() - start
print("Elapsed (after compilation) = %s" % (elp1 / n_rep))
start = time.time()
[go_fast_nonumba(x) for _ in range(n_rep)]
elp2 = time.time() - start
print("Elapsed (No NUMBA) = %s" % (elp2 / n_rep))
print('Ratio: %s' % (elp1/elp2))
view raw numba_total.py hosted with ❤ by GitHub

The output of the above script is:

Elapsed (with compilation) = 0.1845691204071045
Elapsed (after compilation) = 1.4524459838867188e-05
Elapsed (No NUMBA) = 0.0002772068977355957
Ratio: 0.05239573747086498
view raw numba_out hosted with ❤ by GitHub

In this example, we see a significant speedup in the execution time of the function. The same technique can be used to speed up other types of calculations, such as matrix operations, Monte Carlo simulations, and more.

Reference

[1] A ~5 minute guide to Numba