Parallelism in Python.
In Python, the standard library gives us three options to add parallel processing to our applications:
- threading
- multiprocessing
- concurrent.futures
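The first option, threading, is useful for I/O-bound and networking tasks, where the threads spend most of their time waiting. In standard CPython, the global interpreter lock lets only one thread run Python bytecode at a time, so the work effectively stays on a single core.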
The second option, multiprocessing, is used for CPU-intensive tasks, and the processing is distributed across the cores of our machine. The disadvantage of multiprocessing is that the functions, arguments, and results passed between processes must be serializable. To check whether an object is serializable, you can try to pickle and unpickle it. As a rule of thumb, if the round trip completes without error, you can use the object with multiprocessing.
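The pickle round-trip check described above could look something like the following. This is a minimal sketch: the helper name is_picklable is made up for illustration, and the set of exceptions it catches is an assumption that covers the common failure cases rather than every possible one.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj survives a pickle/unpickle round trip.

    Hypothetical helper, not part of the standard library.
    """
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


print(is_picklable({'numbers': [1, 2, 3]}))  # plain data structures
print(is_picklable(lambda x: x))             # a lambda
print(is_picklable(threading.Lock()))        # a thread lock
```

Plain data such as dictionaries and lists passes the check, while lambdas and thread locks do not, so they cannot be sent to a worker process.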
The last option, concurrent.futures, offers an API to use threading and multiprocessing with the same interface. The interface, in my opinion, is cleaner, and you can start programming and decide later whether to use threading or multiprocessing as the backend.
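To illustrate the shared interface, here is a small sketch in which the backend is just a parameter. The functions shout and run_all are hypothetical names invented for this illustration; only ThreadPoolExecutor, ProcessPoolExecutor, submit, and result come from concurrent.futures.

```python
import concurrent.futures


def shout(text):
    """Hypothetical task used only for this illustration."""
    return text.upper()


def run_all(executor_class, words):
    """Submit shout() for every word using the given executor class."""
    with executor_class() as executor:
        futures = [executor.submit(shout, word) for word in words]
        return [future.result() for future in futures]


# Thread backend. Passing concurrent.futures.ProcessPoolExecutor instead
# switches to processes without touching run_all; in a script, that call
# should sit under an `if __name__ == '__main__':` guard.
print(run_all(concurrent.futures.ThreadPoolExecutor, ['hello', 'world']))
```

Because run_all only relies on submit and result, the choice between threads and processes can be postponed until you know whether the workload is I/O-bound or CPU-bound.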
In this example, I will use the threading backend.
Let’s start. First, we import the modules.
import concurrent.futures
import time
Now let’s write a function that waits 5 seconds and returns the text passed as the argument:
def echo(text):
    time.sleep(5)
    return text
Now, let’s try:
%%time
print(echo('Hello') + ' ' + echo('world'))
Hello world
Wall time: 10 s
As expected, the command took 10 s to complete, since the two calls run one after the other.
Now we are going to apply concurrent.futures to our application. The difference is that in this case we do not call our function directly; instead, we pass the function itself and its arguments to the submit method of a concurrent.futures.Executor. Another critical difference is that the call returns immediately with a Future object, and if we want the actual return value, we must call its result() method, which waits until the task has finished.
If you did not understand the last paragraph, do not despair; it is easier to see with an example:
# First we must initialize the executor
executor = concurrent.futures.ThreadPoolExecutor()
# Now we can submit the tasks
example_task = executor.submit(echo, 'Hello')
Now we can see that the task is not the function’s return value, but a Future object.
print(type(example_task))
<class 'concurrent.futures._base.Future'>
And if we want the return value, we use the result() method.
print(example_task.result())
Hello
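A Future can also be asked whether its task has finished, without blocking, through its done() method. The sketch below is only an illustration: slow_echo and its half-second delay are assumptions chosen to keep the run short, not part of the example above.

```python
import concurrent.futures
import time


def slow_echo(text, delay=0.5):
    """Hypothetical variant of echo with a shorter, configurable wait."""
    time.sleep(delay)
    return text


with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(slow_echo, 'Hello')
    finished_early = future.done()   # checked right after submitting
    value = future.result()          # blocks until the task completes
    finished_late = future.done()    # checked after result() returned

print(finished_early, value, finished_late)
```

The first check normally happens while the task is still sleeping, whereas the second one happens after result() has returned, so the Future is guaranteed to be done by then.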
Now, to print Hello world, we do:
%%time
task1 = executor.submit(echo, 'Hello')
task2 = executor.submit(echo, 'world')
print(task1.result() + ' ' + task2.result())
Hello world
Wall time: 5.01 s
We see that it took about 5 seconds instead of 10, because both calls to echo waited at the same time in separate threads.
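The %%time magic only works in Jupyter or IPython. If you want to reproduce the comparison in a plain script, something along these lines should work. It is a sketch under a few assumptions: the delay is shortened to half a second so that it runs quickly, the helpers timed, sequential, and threaded are names made up here, and the executor is used as a context manager so that its threads are shut down at the end.

```python
import concurrent.futures
import time

DELAY = 0.5  # assumption: shorter than the 5 seconds used in the article


def echo(text):
    time.sleep(DELAY)
    return text


def timed(func):
    """Run func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def sequential():
    # The two calls run one after the other.
    return echo('Hello') + ' ' + echo('world')


def threaded():
    # Both calls are submitted first, so they wait concurrently.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        task1 = executor.submit(echo, 'Hello')
        task2 = executor.submit(echo, 'world')
        return task1.result() + ' ' + task2.result()


seq_text, seq_time = timed(sequential)
thr_text, thr_time = timed(threaded)
print(f'sequential: {seq_text!r} in {seq_time:.2f} s')
print(f'threaded:   {thr_text!r} in {thr_time:.2f} s')
```

The sequential version should take roughly twice the delay and the threaded version roughly one delay, mirroring the 10 s versus 5 s measured above.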