A Guide to Python asyncio
This blog post was originally published at https://www.polidea.com/blog.
It is hard to imagine modern programming in Python without the asyncio library. The package that lets Python programmers write concurrent code is one of the largest and most ambitious libraries ever added to Python. In this asyncio tutorial, we will examine the biggest advantages of using it. I will begin with a few basic examples that describe asyncio’s fundamentals. After that, I will present a bigger example: a script, written with asyncio, that downloads files asynchronously. Before we dive deeper into the topic, let us first establish a good understanding of concurrency and parallelism.
Concurrency is like having multiple threads take turns on a single CPU core. There is nothing extraordinary about that. A modern workstation has four or eight CPU cores but, at the same time, runs more than 100 processes. Even though the CPU itself can’t handle more than four or eight jobs at once, the computer seamlessly deals with all of those processes.
Parallelism, on the other hand, is like running two threads simultaneously on different cores of a CPU. Note that parallelism implies concurrency, but not the other way around.
So, what’s the role of asyncio in the world of concurrent programming?
We all know that one of the main causes of slowdowns is I/O: for instance, reading files from a hard drive, executing database queries, or waiting for data to arrive over the network. It is crucial to handle those operations efficiently. One solution is to create a thread and wait for the I/O in that thread. However, threads don’t come cheap in Python. The threading library is based on OS threads, and it is the operating system that manages the threads and their call stacks. Each thread consumes some amount of memory, and switching between threads incurs context-switching costs. In a server application, it is probably not a good idea to create a separate thread for each open connection, because resources will run out very quickly.
That’s where single-threaded concurrency can help. Because all of asyncio’s concurrent operations run on a single thread and hand over control voluntarily, there are no OS-level context switches between them and no per-thread memory overhead.
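To make the overhead argument concrete, here is a small sketch (the names fake_connection and main are hypothetical) that starts 10,000 coroutines, each waiting one second to stand in for a slow network call. Because they all wait concurrently on a single thread, the batch should finish in roughly one second, without creating a single extra thread.

```python
import asyncio
import time


async def fake_connection(conn_id):
    # asyncio.sleep() stands in for waiting on network I/O; while this
    # coroutine waits, the event loop runs the other coroutines.
    await asyncio.sleep(1)
    return conn_id


async def main(n=10_000):
    start = time.perf_counter()
    # Run all n coroutines concurrently on the current thread.
    results = await asyncio.gather(*(fake_connection(i) for i in range(n)))
    return len(results), time.perf_counter() - start


if __name__ == "__main__":
    count, elapsed = asyncio.run(main())
    print(f"{count} coroutines finished in {elapsed:.2f} s")
```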
asyncio has been available since Python 3.4 and has gained new features in nearly every minor release of Python since. The following examples use some of asyncio’s newest features, so running all of them requires at least Python 3.7.
Coroutines and Tasks
Coroutines are a key element of the library. Like generators, coroutines can produce data, but they can also consume it. A coroutine can suspend its execution when no further progress can be made (for instance, because it is waiting for a network request to complete) and transfer control to another coroutine that can make better use of the CPU time. The point where the coroutine suspended its execution is saved, so once the network response arrives, execution can resume from that point. Historically, before Python 3.5, coroutines shared their syntax with generators. This changed with PEP 492, which introduced the new async/await syntax to Python. Now coroutines are declared with the async def syntax.
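A minimal sketch of the async def/await syntax; the coroutine add_later is a hypothetical example. Calling it only creates a coroutine object, and asyncio.run() is what actually drives it to completion.

```python
import asyncio
import inspect


async def add_later(a, b):
    # "await" suspends this coroutine; asyncio.sleep(0) just yields control
    # to the event loop once before the coroutine continues.
    await asyncio.sleep(0)
    return a + b


coro = add_later(2, 3)            # creates a coroutine object; nothing runs yet
print(inspect.iscoroutine(coro))  # True
print(asyncio.run(coro))          # 5
```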
The execution of coroutines is similar to that of generators: calling a coroutine does not schedule it for execution; it just returns a coroutine object. So, how should we execute a coroutine? One solution is to await it from inside another coroutine. Let’s start with a simple example that prints a string and waits for 1 second, then prints another string and waits for another 2 seconds.
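A minimal sketch of such a program, written to match the output shown below. The timestamp format of the message is inferred from that output, so the code behind it may differ in details.

```python
import asyncio
from datetime import datetime


async def say_after(what, delay):
    # Print a timestamped message as soon as the coroutine starts running,
    # then suspend for `delay` seconds without blocking the event loop.
    print(f'{datetime.now().time()} "{what}" scheduled for execution')
    await asyncio.sleep(delay)


async def main():
    # Each await suspends main() until say_after() has finished, so the
    # two calls run one after another (about 3 seconds in total).
    await say_after("Hello!", 1)
    await say_after("Hi!", 2)


if __name__ == "__main__":
    asyncio.run(main())
```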
The output is:
10:53:24.500115 "Hello!" scheduled for execution
10:53:25.504785 "Hi!" scheduled for execution
The most significant parts of the code are:
import asyncio: imports the asyncio library.
asyncio.run(main()): executes the coroutine main. It creates a new event loop, runs the coroutine until it completes, and then closes the loop. It is meant to be the main entry point for asyncio programs and should ideally be called only once.
await say_after('Hello!', 1): suspends main until say_after('Hello!', 1) has finished. The awaited coroutine starts running immediately, as part of the same task, and control returns to main only when it completes.
await asyncio.sleep(delay): stands in for a slow operation such as network I/O. Instead of blocking, this expression hands control back to the event loop, which resumes the coroutine once delay seconds have passed. Meanwhile, the event loop keeps running and may do something else. Do not use time.sleep(…) in asyncio programs unless you want to freeze the event loop, and with it the whole application!
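To see why time.sleep() is harmful, here is a sketch comparing two versions of the same coroutine, each run three times concurrently with asyncio.gather() (described in more detail later). The function names are hypothetical, and exact timings will vary slightly.

```python
import asyncio
import time


async def blocking_nap(delay):
    # time.sleep() blocks the whole thread, so the event loop cannot
    # switch to any other task while this coroutine sleeps.
    time.sleep(delay)


async def cooperative_nap(delay):
    # asyncio.sleep() suspends only this coroutine and returns control
    # to the event loop, which can run other tasks in the meantime.
    await asyncio.sleep(delay)


async def timed(nap):
    start = time.perf_counter()
    # Try to run three naps concurrently and measure the wall-clock time.
    await asyncio.gather(nap(0.5), nap(0.5), nap(0.5))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"time.sleep:    {asyncio.run(timed(blocking_nap)):.2f} s")     # roughly 1.5 s
    print(f"asyncio.sleep: {asyncio.run(timed(cooperative_nap)):.2f} s")  # roughly 0.5 s
```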
So far so good, but what if we want to run those two say_after coroutines concurrently? Let’s modify the example and create asyncio tasks.
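A minimal sketch of the concurrent version, assuming the same say_after coroutine as in the sequential sketch (repeated here so the block is self-contained):

```python
import asyncio
from datetime import datetime


async def say_after(what, delay):
    print(f'{datetime.now().time()} "{what}" scheduled for execution')
    await asyncio.sleep(delay)


async def main():
    # create_task() wraps each coroutine in a Task and schedules it on the
    # event loop; main() keeps running without waiting for them.
    task1 = asyncio.create_task(say_after("Hello!", 1))
    task2 = asyncio.create_task(say_after("Hi!", 2))
    # Wait for both tasks. Because they sleep concurrently, the total time
    # is about 2 seconds (the longest delay) instead of 3.
    await task1
    await task2


if __name__ == "__main__":
    asyncio.run(main())
```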
The output is:
13:19:52.941399 "Hello!" scheduled for execution
13:19:52.941477 "Hi!" scheduled for execution
The snippet ran 1 second faster than before! The timestamps show that both coroutines were started at practically the same time.
asyncio.create_task() wraps the coroutine object in a Task and schedules it on the event loop, but does not suspend the caller; the task starts running as soon as the caller yields control to the event loop, for example at its next await. This is the key difference between creating a Task with asyncio.create_task() and awaiting a coroutine directly with await.
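The difference in ordering can be observed directly. In this sketch (all names are hypothetical), each variant records the order in which the caller and the child coroutine run:

```python
import asyncio


async def child(log):
    log.append("child runs")


async def with_task():
    log = []
    task = asyncio.create_task(child(log))  # scheduled, but not started yet
    log.append("caller continues")          # the caller keeps running
    await task                              # now the event loop runs child()
    return log


async def with_await():
    log = []
    await child(log)                        # child() runs to completion first
    log.append("caller continues")
    return log


if __name__ == "__main__":
    print(asyncio.run(with_task()))   # ['caller continues', 'child runs']
    print(asyncio.run(with_await()))  # ['child runs', 'caller continues']
```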
Having covered these fundamentals, let’s move on to a more realistic example. If you’ve done any programming with threads, you know that there is no API to terminate a thread from the outside. Instead, a special object (such as a threading.Event) must be created, passed to the thread as an argument, and then checked continuously. asyncio tasks, by contrast, provide the Task.cancel() instance method. The following asyncio example shows how it works.
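A minimal sketch of a worker that is canceled from the outside, written to match the output shown below; the log helper and the one-second interval are inferred from that output.

```python
import asyncio
from datetime import datetime


def log(message):
    # Prefix every message with the current time of day.
    print(f"{datetime.now().time()} {message}")


async def worker():
    log("Starting worker...")
    while True:
        await asyncio.sleep(1)
        log("Message from worker")


async def main():
    task = asyncio.create_task(worker())
    await asyncio.sleep(3.5)  # let the worker run for a while
    task.cancel()             # request cancellation
    try:
        await task            # wait until the task has actually stopped
    except asyncio.CancelledError:
        log("The task has been canceled")
    return task.cancelled()


if __name__ == "__main__":
    asyncio.run(main())
```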
The output is:
11:36:30.754006 Starting worker...
11:36:31.755152 Message from worker
11:36:32.756569 Message from worker
11:36:33.757679 Message from worker
11:36:34.255729 The task has been canceled
The Task object is canceled after approx. 3.5 seconds. Canceling the task causes asyncio.CancelledError to be raised inside the worker coroutine at the await where it is currently suspended. The coroutine may catch the exception to execute some teardown code:
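A minimal sketch of such teardown; the events list is a hypothetical stand-in for real cleanup such as closing a connection. Re-raising the exception lets the task end up properly canceled.

```python
import asyncio


async def worker(events):
    try:
        while True:
            await asyncio.sleep(1)  # CancelledError is raised here
            events.append("tick")
    except asyncio.CancelledError:
        # Teardown code goes here, e.g. closing a connection or a file.
        events.append("cleaned up")
        raise  # re-raise so that the task is marked as canceled


async def main():
    events = []
    task = asyncio.create_task(worker(events))
    await asyncio.sleep(1.5)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled(), events


if __name__ == "__main__":
    print(asyncio.run(main()))  # (True, ['tick', 'cleaned up'])
```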
The coroutine may even suppress the cancellation by not re-raising the exception after catching it, although doing so is discouraged.
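A sketch of what suppression looks like (stubborn_worker is a hypothetical name): the task swallows the CancelledError, finishes normally with a result, and Task.cancelled() reports False.

```python
import asyncio


async def stubborn_worker():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # Not re-raising the exception suppresses the cancellation.
        return "finished anyway"
    return "finished normally"


async def main():
    task = asyncio.create_task(stubborn_worker())
    await asyncio.sleep(0.1)  # give the task a chance to start
    task.cancel()
    result = await task       # no CancelledError reaches the caller
    return result, task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(main()))  # ('finished anyway', False)
```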
Network I/O is a good example of how an asynchronous operation can handle things more efficiently. Instead of wasting CPU cycles waiting, it is better to do something else until a response comes back from the network. The following example downloads zip archives containing documentation for three versions of Python 3.8: 3.8.4, 3.8.5 and 3.8.6.
Python’s asyncio does not support HTTP directly. Popular HTTP clients, like urllib.request and requests, cannot be used directly either, because their calls block and would freeze the event loop. Fortunately, there is aiohttp, an asynchronous HTTP client for asyncio. aiohttp is not part of the standard library, so it must be installed:
pip install aiohttp
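A minimal sketch of such a download script, following the steps described in the walkthrough below. The archive URLs, the chunk size, and the exact structure of get_file are assumptions; verify the URLs before running it, as it downloads three files into the current directory.

```python
import asyncio

import aiohttp

# Assumed URL layout of the documentation archives on python.org.
URLS = [
    f"https://www.python.org/ftp/python/doc/{version}/python-{version}-docs-html.zip"
    for version in ("3.8.4", "3.8.5", "3.8.6")
]
CHUNK_SIZE = 64 * 1024  # bytes read from the network per iteration (assumed)


def write(filename):
    """Synchronous generator: opens the file, writes every chunk sent to it,
    and closes the file when it receives an empty chunk."""
    with open(filename, "wb") as f:
        while True:
            chunk = yield
            if not chunk:  # empty bytes mean the stream has ended
                break
            f.write(chunk)
    # Returning here makes the pending send() raise StopIteration.


async def get_file(session, url):
    async with session.get(url) as response:
        if response.status == 404:
            print(f"Not found: {url}")
            return None
        filename = url.rsplit("/", 1)[-1]
        writer = write(filename)
        next(writer)  # advance the generator to its first yield
        while True:
            # Read the body piece by piece instead of loading it into memory;
            # read() returns b"" once the body is exhausted.
            chunk = await response.content.read(CHUNK_SIZE)
            try:
                writer.send(chunk)
            except StopIteration:
                # The generator exited after an empty chunk and closed the file.
                break
        return filename


async def main():
    async with aiohttp.ClientSession() as session:
        # Start all downloads concurrently and wait until every one finishes.
        return await asyncio.gather(*(get_file(session, url) for url in URLS))


if __name__ == "__main__":
    print(asyncio.run(main()))
```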
Now let’s review the script.
• The process is started in
• async with is the syntax for using an asynchronous context manager, that is, a context manager that can suspend execution in its enter and exit methods (__aenter__ and __aexit__). aiohttp uses them to manage the lifecycle of its sessions and connections.
• asyncio.gather is a handy function that runs many awaitable objects (coroutines, tasks, and futures) concurrently and, once all of them have completed, returns a list of their results in the order in which the awaitables were passed.
• get_file is the coroutine actually responsible for querying a URL.
• If the response status code is not 404, a generator object writer is created. Note that the generator does not start immediately! It must first advance to its first yield before it can receive values, for example by calling next(writer) or writer.send(None).
• By reading the response body in chunks from its content attribute (a stream), we avoid loading the whole response into memory.
• StopIteration signals that the generator has exited and the file has been closed, so we can break out of the loop.
• The write generator function is responsible for opening the file, writing chunks of data, and closing the file. This generator is not asynchronous, because Python does not provide an asynchronous filesystem API. Once the generator returns, StopIteration is raised in the code that sent it the last value.
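The two asyncio features used in the walkthrough above, async with and asyncio.gather, can be tried without any network access. In this sketch the Connection class and the fetch coroutine are hypothetical stand-ins for aiohttp’s session and request handling.

```python
import asyncio


class Connection:
    """Hypothetical resource whose setup and teardown are asynchronous."""

    def __init__(self):
        self.log = []

    async def __aenter__(self):
        await asyncio.sleep(0.1)  # e.g. a handshake; the event loop stays free
        self.log.append("opened")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0.1)  # e.g. a graceful shutdown
        self.log.append("closed")
        return False  # do not suppress exceptions


async def fetch(name, delay):
    async with Connection() as conn:
        await asyncio.sleep(delay)  # stands in for reading a response
        conn.log.append("fetched")
    return f"{name}: {conn.log}"


async def main():
    # gather() runs the coroutines concurrently and returns their results
    # in the order they were passed, not in the order they finished.
    return await asyncio.gather(fetch("a", 0.3), fetch("b", 0.1), fetch("c", 0.2))


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```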
asyncio introduces a whole new way of writing concurrent code in Python. Many third-party libraries are introducing support for asyncio, and its popularity is growing fast. Meanwhile, because it is relatively new, asyncio still lacks coverage in books and online tutorials. This is especially true for the new async/await syntax. Let’s hope that despite all the obstacles, the spread of asyncio will continue and result in better and more efficient Python code.