Prerequisites

  • Experience with the specific topic: Novice
  • Professional experience: No industry experience required

Knowledge of general concurrent programming is not required, but the reader should be familiar with basic Python programming, as well as the basic theories behind web scraping. If you intend to follow this tutorial on your own system, please use this requirements.txt file to install the necessary libraries.

Introduction to Asynchronous Programming

This tutorial will provide an overview of asynchronous programming including its conceptual elements, the basics of Python's async APIs, and an example implementation of an asynchronous web scraper.

Synchronous programs are straightforward: start a task, wait for it to finish, and repeat until all tasks have been executed. However, waiting wastes valuable CPU cycles. Asynchronous programs remedy this by using context switching to interleave tasks, improving program responsiveness while minimizing processor idle time and total execution time.

Simultaneous Chess Analogy

[Image: a simultaneous chess exhibition]

To understand the benefit of asynchronous programming, consider the following scenario:

Ten opponents have challenged you to a game of chess. Since you are very good at chess, you take three minutes per move, whereas each of your opponents takes five minutes. Assume that each full game takes twenty moves per player, or forty moves in total.

The time it takes to play one full game is:

num_moves * my_move_time + num_moves * opponent_move_time
= 20 * 3 + 20 * 5
= 160 minutes

Playing through each of the 10 games, one after the other, would take about 27 hours.

Notice that when you play the games sequentially, you spend a lot of time waiting. Each time you make a move, you have to wait 5 minutes while your opponent responds. In fact, you wait for 100 minutes, or 62.5%, of each 160-minute game. That's about 16.7 hours of waiting across all 10 games.

Instead of waiting, suppose that as soon as you finish making your move in a given game, you immediately progress to the next opponent. Starting with opponent 1, you take 3 minutes to make a move. Then you progress onto opponent 2, 3, ..., 10. When you've finished your move with the 10th opponent, you return to the game with opponent 1. Let's call this process a round.

While you're progressing through opponents 2...10, opponent 1 has (num_opponents - 1) * my_move_time = (10 - 1) * 3 = 27 minutes > 5 minutes to think and make their next move. Therefore, when you come back to opponent 1 at the start of the next round, you can immediately make your move without waiting for them to finish up.

In general, as long as each opponent takes less than (num_opponents - 1) * my_move_time to make their move, you will spend no time waiting. Each round will take num_opponents * my_move_time = 10 * 3 = 30 minutes. Since we're assuming each game consists of 40 moves and we complete 2 moves per game in each round (your move and the opponent's move), it will take 20 rounds to complete all the games. So all 10 games will take 30 minutes * 20 rounds = 600 minutes = 10 hours, roughly a 2.7x speedup over the sequential approach's 1,600 minutes.
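The arithmetic above is easy to verify with a few lines of Python (a sketch; the variable names mirror the formulas in the text):

```python
# A worked check of the chess arithmetic above (times in minutes).
NUM_OPPONENTS = 10
NUM_MOVES = 20        # moves per player per game
MY_MOVE_TIME = 3
OPP_MOVE_TIME = 5

# Sequential: every move of every game happens back to back.
game_time = NUM_MOVES * MY_MOVE_TIME + NUM_MOVES * OPP_MOVE_TIME
sequential_total = NUM_OPPONENTS * game_time

# Interleaved: a round is one of your moves at each of the 10 boards,
# and each round completes one move pair per game.
round_time = NUM_OPPONENTS * MY_MOVE_TIME
interleaved_total = round_time * NUM_MOVES

print(game_time, sequential_total, interleaved_total)           # 160 1600 600
print(f'speedup: {sequential_total / interleaved_total:.2f}x')  # speedup: 2.67x
```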

Asynchronous Programming in Practice

The act of switching between opponents in simultaneous chess so that independent actions overlap is the idea behind asynchronous programming. Most computer applications execute a mixture of processing and waiting tasks: processing means running instructions that analyze and manipulate data, while waiting might be for a file to be read in, for a server to respond to a request, etc. By switching the execution flow from a task that is currently waiting to a new one, we can process the new task while the old task concurrently waits, provided the new task does not depend on the results of the old one.

Another advantage of asynchronous programming is improved responsiveness. Say the first task in an ordered list of independent tasks is relatively heavy, or long-processing. A sequential program would have to spend a significant amount of time finishing that first task before moving on to the faster, lighter tasks. In an asynchronous program, the lighter tasks can finish executing before the heavy task does, so even with a heavyweight task at the top of the queue, users receive the results of the lightweight tasks in a timely manner, even if those tasks sit far down the task list. We will analyze this improvement in responsiveness in later sections.

Asynchronous Programming in Python

In this section we will implement an asynchronous program in Python. First we will go over the general structure of an asynchronous program and its main elements. Then we will learn about specific APIs to facilitate asynchronous programming and, finally, we will try our hands with a starting problem of programming asynchronously.

Coroutines, Event Loops, and Futures

Event loops, coroutines, and futures are the essential elements of an asynchronous program.

  • The event loop handles the task-switching aspect, or execution flow, of the program. It keeps track of all the tasks that are to be run asynchronously and decides which of those should be executed at a given moment.
  • A coroutine is a special type of function that wraps around a specific task so that it can be executed asynchronously. A coroutine specifies where in the function the task switching event should take place — this is when the execution flow is returned from the function back to the event loop. Coroutines are typically created by the event loop and stored internally in a task queue.
  • A future is a placeholder for a result returned from a coroutine. As soon as the event loop initiates a coroutine, a corresponding future is created that stores the result of the coroutine, or an exception if one was thrown during the coroutine's execution.

An event loop, coroutines, and their corresponding futures are the core elements of an asynchronous program. First, the event loop is started and creates a task queue. A coroutine for the first task is then executed, and its corresponding future is created. When a task switching event takes place inside this coroutine (e.g., it begins waiting on input/output, sleeping, etc.), the coroutine is suspended and the execution flow is released back to the event loop, which then executes the next item in the task queue.

This process will repeat for all items of the task queue. During this process, as a task finishes, it will be eliminated from the task queue, its coroutine will be terminated, and the corresponding future will register the returned result from that coroutine. The process will go on until all tasks in the queue are completely executed.
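As a preview of the Python API covered below, a minimal sketch tying the three elements together might look like this (greet is a hypothetical coroutine; asyncio specifics follow in the next section):

```python
import asyncio

async def greet(name):
    # a coroutine: a function that can be suspended and resumed
    await asyncio.sleep(0)    # a task switching event
    return f'hi: {name}'

loop = asyncio.new_event_loop()          # the event loop
task = loop.create_task(greet('alice'))  # a Task: a future wrapping the coroutine
loop.run_until_complete(task)            # run until the future holds a result
print(task.result())                     # hi: alice
loop.close()
```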

The following diagram further illustrates the general structure of an asynchronous program described above:

[Diagram: general structure of an asynchronous program]

Python API

Python 3.5 (PEP 492) added two new keywords to support asynchronous programming: async and await. It also added a built-in module called asyncio.

The async keyword indicates to the interpreter that the function definition that follows should be flagged as a coroutine. For example:

async def my_coroutine(name):
    return f'hi: {name}'
 

As we discussed above, each coroutine has to specify when a task switching event will take place. The await expression, which is valid only inside a coroutine, marks where the execution flow will be released:

# await executes my_coroutine and suspends until the coroutine returns
result = await my_coroutine('hi')
print(result)

# this never actually executes the coroutine; it only creates a coroutine object
result = my_coroutine('hi')
print(result)
 

The asyncio module provides high-level routines for creating event loops (among other things):

import asyncio

# create an event loop (an instance of asyncio.AbstractEventLoop)
event_loop = asyncio.get_event_loop()

# populate the event loop's task queue
tasks = [event_loop.create_task(my_coroutine('foo')),
         event_loop.create_task(my_coroutine('bar'))]

# blocks until all futures are complete
complete, pending = event_loop.run_until_complete(asyncio.wait(tasks))
print(complete)
# should be an empty set
print(pending)
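Note that this get_event_loop()-based pattern reflects the API as it stood in Python 3.5/3.6; since Python 3.7, asyncio.run() together with asyncio.gather() expresses the same program more compactly. A sketch of the equivalent:

```python
import asyncio

async def my_coroutine(name):
    return f'hi: {name}'

async def main():
    # gather() schedules the coroutines concurrently and
    # returns their results in the order they were passed in
    return await asyncio.gather(my_coroutine('foo'), my_coroutine('bar'))

results = asyncio.run(main())
print(results)   # ['hi: foo', 'hi: bar']
```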

Examples in Python

In this section, we will explore the API that we discussed above using specific examples. Through these examples you will be familiarized with the procedure of implementing an asynchronous program in Python and observe the advantages that asynchronous programming provides.

Asynchronous Countdowns

This example illustrates the overlapping of processing and waiting time of independent tasks. To do this, we analyze the countdown function below:

import time

def seq_countdown(name, delay):
    # padding space for formatting
    indents = (ord(name) - ord('a')) * '\t'
    
    n = 3
    while n:
        time.sleep(delay)
        
        # elapsed time from the beginning
        duration = time.perf_counter() - start
        print('-' * 40)
        # message of the form: duration letter = countdown
        print(f'{duration:.4f} \t{indents}{name} = {n}')
        
        n -= 1
 

The seq_countdown() function takes in a letter string ('a', 'b', 'c', etc.) and a delay time. It will then count down from three to one in seconds while printing out the time elapsed from the beginning of the program and the input letter string (with the current countdown number). Let us now see this function in action:

start = time.perf_counter()

seq_countdown('a', 1)
seq_countdown('b', 0.8)
seq_countdown('c', 0.5)

print('-' * 40)
print('Finished.')
 
----------------------------------------
1.0019 	a = 3
----------------------------------------
2.0077 	a = 2
----------------------------------------
3.0131 	a = 1
----------------------------------------
3.8188 		b = 3
----------------------------------------
4.6244 		b = 2
----------------------------------------
5.4301 		b = 1
----------------------------------------
5.9306 			c = 3
----------------------------------------
6.4339 			c = 2
----------------------------------------
6.9387 			c = 1
----------------------------------------
Finished.
 

Here we call seq_countdown() with different string letters and different delay times. This is a purely sequential, synchronous program since there is no overlap between processing and waiting time anywhere. Additionally, it took approximately 6.9 seconds to run the program, which is the sum of the counting down time of all three letters.

1 second x 3 (for "a") + 0.8 seconds x 3 (for "b") + 0.5 seconds x 3 (for "c") = 6.9 seconds

It would be beneficial to make this program asynchronous. For example, during the first second of the program, while waiting for the letter "a," we can switch tasks to move to other letters. In fact, we will implement this setup for all the letters inside the countdown function, turning the countdown function into a coroutine.

When all countdown tasks are coroutines in an asynchronous program, we should achieve better execution time and responsiveness. Since the three tasks are processed independently, the countdown messages should be printed out of order (jumping between different letters), and the asynchronous program should take only about as long as its longest task (i.e., 3 seconds, for letter "a").

The first step is to make the countdown function into a coroutine and specify a point inside the function as a task switching event. We will add the keyword async in front of the function, and instead of the time.sleep() function, we will use asyncio.sleep() together with the await keyword.


import asyncio

async def async_countdown(name, delay):
    # padding space for formatting
    indents = (ord(name) - ord('a')) * '\t'
    
    n = 3
    while n:
        await asyncio.sleep(delay)
        
        # elapsed time from the beginning
        duration = time.perf_counter() - start
        print('-' * 40)
        # message of the form: duration letter = countdown
        print(f'{duration:.4f} \t{indents}{name} = {n}')
        
        n -= 1
 

We now need to initialize and manage an event loop. We'll create one with the asyncio.get_event_loop() method, add all three countdown tasks to the task queue with create_task(), and finally start the event loop with run_until_complete():

loop = asyncio.get_event_loop() # creating the event loop
# adding tasks to the task queue
tasks = [
    loop.create_task(async_countdown('a', 1)),
    loop.create_task(async_countdown('b', 0.8)),
    loop.create_task(async_countdown('c', 0.5))
]

start = time.perf_counter()
# run the event loop until all tasks are complete
loop.run_until_complete(asyncio.wait(tasks))

print('-' * 40)
print('Done.')
 
----------------------------------------
0.5056 			c = 3
----------------------------------------
0.8048 		b = 3
----------------------------------------
1.0036 	a = 3
----------------------------------------
1.0066 			c = 2
----------------------------------------
1.5106 			c = 1
----------------------------------------
1.6057 		b = 2
----------------------------------------
2.0046 	a = 2
----------------------------------------
2.4113 		b = 1
----------------------------------------
3.0062 	a = 1
----------------------------------------
Done.
 

As expected, at the beginning of the program, instead of waiting for the whole first second to print out the first message "a = 3", the program switches to the next task in the task queue. In this case, it is waiting for 0.8 seconds for the letter "b". This process continues until 0.5 seconds have passed, and "c = 3" is printed out, and 0.3 seconds later (at time 0.8 seconds), "b = 3" is printed out. This all happens before "a = 3" is printed out.

This task switching property makes the program more responsive. Instead of "hanging" for one second before the first message is printed, the program now takes only 0.5 seconds (the shortest waiting period) to print its first message. As for execution time, it takes only three seconds total to execute the program instead of 6.9 seconds.

Asynchronous Fibonacci

Let us try applying asynchronous programming to another problem. In this example, we will consider the task of calculating different numbers in the Fibonacci sequence. Our starting function will look like:

def seq_fib(n):
    # base cases: fib(0) = 0, fib(1) = 1
    if n in [0, 1]:
        print(f'{n}: {n}')
        print(f'Took {time.perf_counter() - start:.2f} seconds.')
        return
    
    # sequentially calculating fib(n)
    a, b = 1, 2
    i = 1
    while i < n:
        a, b = b, a + b
        i += 1
    
    # printing the last 20 digits if the result is too large
    print(f'{n}: {a % (10 ** 20)}')
    # printing the time elapsed from the beginning
    print(f'Took {time.perf_counter() - start:.2f} seconds.')
 

Now we will use this function to generate different numbers in the sequence:

start = time.perf_counter()

seq_fib(1000000)
seq_fib(1000)
seq_fib(20)
 
1000000: 42277359244926937501
Took 12.05 seconds.
1000: 91902245245323403501
Took 12.05 seconds.
20: 10946
Took 12.05 seconds.
 

This is a situation where the first task in the program is so large that execution appears to hang while processing it, while the other two tasks are lightweight enough that they take almost no time to execute once the first task finishes.

Now, we want to address this problem so that the last two tasks, seq_fib(1000) and seq_fib(20), finish before the first task, seq_fib(1000000). To do this, we need to convert the seq_fib() function into a coroutine with a specification about the task switching event inside the function.

Below we specify inside the while loop that if the iterator i is divisible by 50,000, the application should switch to the next task:

async def async_fib(n):
    # base cases: fib(0) = 0, fib(1) = 1
    if n in [0, 1]:
        print(f'{n}: {n}')
        print(f'Took {time.perf_counter() - start:.2f} seconds.')
        return
    
    # sequentially calculating fib(n)
    a, b = 1, 2
    i = 1
    while i < n:
        a, b = b, a + b
        
        if i % 50000 == 0:
            await asyncio.sleep(0) # switches task every 50,000 iterations
        
        i += 1
    
    # printing the last 20 digits if the result is too large
    print(f'{n}: {a % (10 ** 20)}')
    # printing the time elapsed from the beginning
    print(f'Took {time.perf_counter() - start:.2f} seconds.')
 

With the coroutine implemented, we now need to set up the event loop and add the tasks to the queue:

loop = asyncio.get_event_loop() # creating the event loop
# adding tasks to the task queue
tasks = [
    loop.create_task(async_fib(1000000)),
    loop.create_task(async_fib(1000)),
    loop.create_task(async_fib(20))
]

start = time.perf_counter()
# run the event loop until all tasks are complete
loop.run_until_complete(asyncio.wait(tasks))

print('Done.')
 
1000: 91902245245323403501
Took 0.06 seconds.
20: 10946
Took 0.06 seconds.
1000000: 42277359244926937501
Took 12.08 seconds.
Done.
 

We see that our responsiveness problem has been addressed: after running the program, the more lightweight tasks were executed first, and the longest-running task returned last.

CPU-bound and Blocking Tasks

Asynchronous programming can provide better overall execution time, as we saw in the countdown example, but when comparing the execution times of the two versions of our Fibonacci program, the sequential version may have been faster. (On our computers, it was.) So does that mean asynchronous programming cannot really provide better speed for our programs?

It still can, but more steps are needed. In the async_fib() coroutine, most instructions are CPU-bound: the program needs the CPU to execute them, so there is no waiting time to overlap with processing. While task switching events still take place and improve responsiveness, no additional speed can be gained. In fact, since there is considerable overhead in setting up and running the asynchronous machinery, the asynchronous program might take even longer to execute than the original synchronous one. The same goes for blocking tasks, which prevent multiple tasks from overlapping with each other.
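To see concretely why blocking calls defeat the overlap, consider this sketch in which a blocking time.sleep() runs inside a coroutine (the names and the 0.2-second delay are arbitrary choices for illustration):

```python
import asyncio
import time

async def blocking_task(name):
    time.sleep(0.2)   # blocking: the event loop cannot switch away
    return name

async def main():
    t0 = time.perf_counter()
    await asyncio.gather(blocking_task('a'), blocking_task('b'))
    return time.perf_counter() - t0

elapsed = asyncio.run(main())
# the two 0.2-second sleeps run back to back (~0.4 s), not overlapped;
# replacing time.sleep with await asyncio.sleep would halve the total
print(f'{elapsed:.2f}s')
```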

This does not mean that asynchronous programming is out of the question for programs with heavy CPU-bound or blocking tasks. Unless otherwise specified, all of an asynchronous program's instructions execute in the same process, where CPU-bound or blocking tasks can prevent instructions from overlapping. This is not the case, however, if the tasks are distributed to separate processes: multiprocessing can help asynchronous programs with blocking instructions achieve better execution time.

The concurrent.futures Module

The concurrent.futures module is a high-level interface for executing programs both asynchronously and in parallel. Specifically, concurrent.futures provides a multiprocessing executor class called ProcessPoolExecutor to facilitate multiprocessing.

Before we jump into the concurrent.futures API, let's discuss how the basics of asynchronous multiprocessing work with the framework that asyncio provides. As a reminder, we have three major elements in our asynchronous programming structure: the event loop, coroutines, and their corresponding futures. We still need the event loop in multiprocessing programs to coordinate tasks and handle their results.

The combination of asynchronous programming and multiprocessing involves avoiding blocking tasks in the coroutines by executing them in separate processes. That means the coroutines do not necessarily have to be interpreted as actual coroutines by Python anymore. Instead, they can simply be traditional Python functions.

One new element we will need is the executor that facilitates multiprocessing: an instance of the ProcessPoolExecutor class. Every time we add a task to the event loop's queue, we also need to reference this executor so that separate tasks will be executed in separate processes. This is done through the AbstractEventLoop.run_in_executor() method, which takes in an executor, a function (or coroutine), and the arguments for the function to be executed in a separate process.

Let us try applying this solution to our Fibonacci-finding program:

from concurrent.futures import ProcessPoolExecutor

async def main():
    tasks = [
        loop.run_in_executor(executor, seq_fib, 1000000),
        loop.run_in_executor(executor, seq_fib, 1000),
        loop.run_in_executor(executor, seq_fib, 20)
    ]
    
    await asyncio.gather(*tasks)

start = time.perf_counter()

executor = ProcessPoolExecutor(max_workers=3)
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
 
1000: 91902245245323403501
20: 10946
Took 0.04 seconds.
Took 0.04 seconds.
1000000: 42277359244926937501
Took 12.54 seconds.
 

By combining multiprocessing with asynchronous programming, we get the best of both worlds: the consistent responsiveness from asynchronous programming, and the improvement in speed from multiprocessing.

Asynchronous Web Scraping

One of the most common applications of asynchronous programming is data collection via web scraping: the process of automating HTTP requests to various websites and extracting information from their HTML source code. During this process, some servers will take longer to process our requests than others. With asynchronous programming, we can overlap the time spent waiting on slow servers with the time spent processing the responses that have already been returned.

The following diagram further illustrates the concept of asynchronous HTTP server-client communication processes:

This section will also introduce another module supporting asynchronous programming: aiohttp (which stands for Asynchronous I/O HTTP). This module provides high-level functionalities that streamline HTTP communication procedures, and it works well with the asyncio module to facilitate asynchronous programming.

Fetching a Website's HTML Code

Let us first look at how to make a request and obtain the HTML source code from a single website with aiohttp. The general structure of the program is as follows:

import aiohttp

async def get_html(session, url):
    async with session.get(url, ssl=False) as res:
        return await res.text()

async def main():
    async with aiohttp.ClientSession() as session:
        html = await get_html(session, 'http://datascience.com')
        print(html[:1000]) # first 1,000 characters

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
 
<!doctype html><!-- start coded_template: id:6097576654 path:generated_layouts/6097576648.html --><!--[if lt IE 7]> <html class="no-js lt-ie9 lt-ie8 lt-ie7" lang="en" > <![endif]--><!--[if IE 7]>    <html class="no-js lt-ie9 lt-ie8" lang="en" >        <![endif]--><!--[if IE 8]>    <html class="no-js lt-ie9" lang="en" >               <![endif]--><!--[if gt IE 8]><!--><html class="no-js" lang="en"><!--<![endif]--><head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
    <meta name="author" content="Oracle + DataScience.com">
    <meta name="description" content="Oracle's DataScience.com offers a powerful enterprise data science platform that enables data science teams to organize work, access data and computing resources, and build, train, deploy, and manage models in the Oracle Cloud. Learn more about Oracle's Data Science Cloud Service.">
    <meta name="generator" content="HubSpot">
    <title>DataScience.com | Enterprise Data Science Pla
 

In the main() coroutine, we initiate an instance from the aiohttp.ClientSession class within a context manager. Note that we are also placing the async keyword in front of this declaration, since the whole context block itself will also be treated as a coroutine. Inside this block, we call and wait for the get_html() coroutine to process and return.

The get_html() coroutine takes in a session object and the URL of the website whose HTML source code we want to extract. Inside this function, we use another asynchronous context manager to make a GET request and store the server's response in the variable res. Finally, we return the HTML source code stored in the response. Since the response is an object produced by the aiohttp.ClientSession request, its methods are asynchronous, so we need to specify the await keyword when we call the text() method.

Finally, we print out the first 1,000 characters of the HTML code returned from the website we specified. In this case, it is the homepage of DataScience.com.

Writing Files Asynchronously

Most of the time, we would like to collect data by making requests to multiple websites, and simply printing out the response HTML is impractical for many reasons. Instead, we will write the returned HTML code to output files. In essence, this process is asynchronous downloading, which is implemented in the underlying architecture of popular download managers. To do this, we will use the aiofiles module, which facilitates asynchronous file writing, in combination with aiohttp and asyncio.

import aiofiles
import os

async def download_html(session, url):
    async with session.get(url, ssl=False) as res:
        filename = f'output_{os.path.basename(url)}.html'
        
        async with aiofiles.open(filename, 'wb') as f:
            while True:
                chunk = await res.content.read(1024)
                if not chunk:
                    break
                await f.write(chunk)
            
        return await res.release()
 

To facilitate asynchronous file writing, we use the asynchronous open() function from aiofiles to open a file inside a context manager. We also read the returned HTML in chunks asynchronously, using the read() method of the response object's content attribute. This means that after reading 1,024 bytes of the current response, the execution flow is released back to the event loop, and a task switching event can take place.

Now we can implement the main part of our program:

async def main(url):
    async with aiohttp.ClientSession() as session:
        await download_html(session, url)
        
urls = [...] # a big list of websites to scrape

loop = asyncio.get_event_loop()
loop.run_until_complete(
    asyncio.gather(*(main(url) for url in urls))
)
 

The main() coroutine takes in a URL and passes it to the download_html() coroutine, together with an aiohttp.ClientSession instance. In our main program, we create an event loop and pass each item in a specified list of URLs to the main() coroutine.

Try this program with your favorite websites to see the improvements asynchronous programming can achieve. As a note on the legality of web scraping, it is important to know and understand the usage-of-data policies of the websites you want to scrape. Even though cooperative web scraping is generally acceptable, some websites do not allow automated engines to collect their data, and scraping them may violate their terms of service or even the law.

In this tutorial, we have learned the basic idea of asynchronous programming and the main elements of any asynchronous program. We have seen the process of implementing asynchronous programs in Python through various examples, designed a simple asynchronous web scraping engine, and explored some advantages of asynchronous programming compared to traditional sequential programming.

Further Readings

Python Documentation. Coroutines and Tasks. docs.python.org/3/library/asyncio-task

Python Documentation. asyncio — Asynchronous I/O. docs.python.org/3/library/asyncio

Yeray Diaz. AsyncIO for the Working Python Developer. hackernoon.com

 
Author
Quan Nguyen

Quan Nguyen is a data scientist and a machine learning enthusiast. He is the author of Mastering Concurrency in Python, a prime contributor for Python for Engineers and Scientists, as well as a content writer for the Python Software Foundation. Quan also enjoys incorporating mathematics and statistics into programming and technological automation.

Technical Reviewer
Kate Silverstein

Kate Silverstein is a research engineer at Oracle Labs in the Machine Learning Research Group. She specializes in natural language processing and applications of knowledge representation, including entity recognition, linking, and automatic knowledge base construction. She has a M.S. in Computer Science from University of Massachusetts, Amherst.