Python Functools Module
functools is part of the Python standard library. The documentation describes it as a module "for higher-order functions: functions that act on or return other functions." In this post, we'll go through a number of examples covering the most common use cases.
Reduce
The reduce function is one of the most commonly used functions in the functools module. The name "reduce" implies taking in an iterable (often a list) and "reducing" it to a single value. This is also thought of as an "accumulator".
In Python 2, the reduce function was available as a built-in, with no import statement necessary. Python 3, on the other hand, requires it to be imported from the functools module:
from functools import reduce
Numbers
The reduce function takes a minimum of 2 arguments: a function and an iterable. The simplest example is to add up a list of numbers:
from functools import reduce
lyst = [1, 2, 3, 4]
total = reduce(lambda x, y: x + y, lyst)
assert total == 10
As you can see, this is much more concise than creating a variable, setting an initial value, and using a for loop to keep track of the total. Notice the use of a lambda function; this is how anonymous functions are implemented in Python. Inside the lambda, we simply add each value to the running total. We can also use a named function:
# using a named function
def add_it(x, y):
return x + y
total = reduce(add_it, lyst)
assert total == 10
A named function is generally preferred for more complex functions.
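For simple operations like addition, the standard library's operator module also provides ready-made function versions of the arithmetic operators that can stand in for a lambda or a named function. A minimal sketch:
from functools import reduce
from operator import add

# operator.add(x, y) is equivalent to x + y
total = reduce(add, [1, 2, 3, 4])
assert total == 10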
For any list that is created dynamically, it's important to consider what happens when the list is empty: reduce will raise an error if no initial value is set:
# TypeError: reduce() of empty iterable with no initial value
total = reduce(lambda x, y: x + y, [])
This cannot work because, with an empty iterable and no initial value, reduce has nothing to start from and no way to tell what type of value should be returned.
We can eliminate this error by setting an initial value, which is an optional third argument of the reduce function:
total = reduce(lambda x, y: x + y, [], 0)
assert total == 0
Although Python has built-in min and max functions, we can perform the same task using reduce by choosing the return value with a conditional expression inside our lambda function.
max_value = reduce(lambda x, y: x if x > y else y, lyst)
assert max_value == max(lyst)
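The same pattern gives us the minimum by flipping the comparison. A quick sketch, reusing lyst from above:
# keep the smaller of each pair as reduce folds through the list
min_value = reduce(lambda x, y: x if x < y else y, lyst)
assert min_value == min(lyst)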
Strings
reduce works with strings too, both lists of string values and strings themselves. In the following example, a list of characters is concatenated into a single string value:
# iterate through a list of characters
lyst = ["a", "b", "c", "d", "e", "f"]
result = reduce(lambda x, y: x + y, lyst)
print(f"{result=}")
assert result == "abcdef"
We can also get the same result from a string itself. Since a string is an iterable in Python, we get the same value back by concatenating its characters:
# iterate through a single string
lyst = "abcdef"
result = reduce(lambda x, y: x + y, lyst)
print(f"{result=}")
assert result == "abcdef"
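As an aside, reduce works here, but the idiomatic (and faster) way to concatenate an iterable of strings in real code is str.join:
# equivalent result without reduce
result = "".join("abcdef")
assert result == "abcdef"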
Dictionaries and Other Complex Types
reduce can be used with pretty much anything that's an iterable. In the next example, we will use a list of dictionaries where each item is a product in a specific order, and we will attempt to add up the amounts of the order items. Similar to the empty-list example that caused an error, the same thing will happen here if we don't include an initial value:
order_items = [
{"item_id": 1, "name": "hiking boots", "amount": 150.00, "qty": 1},
{"item_id": 33, "name": "hiking poles", "amount": 95.00, "qty": 1},
{"item_id": 1, "name": "hydration pack", "amount": 52.00, "qty": 1},
{"item_id": 1, "name": "hiking shorts", "amount": 60.00, "qty": 2},
]
# missing initial value will cause an error
total_amount = reduce(lambda x, y: x + y["amount"], order_items)
# TypeError: unsupported operand type(s) for +: 'dict' and 'float'
Without an initial value, reduce uses the first dictionary in the list as the starting accumulator and then tries to add a float to it. We can fix this by setting an initial value of 0, since we know we're adding up numeric values:
total_amount = reduce(lambda x, y: x + y["amount"], order_items, 0)
print(f"{total_amount=}")
assert total_amount == 357.0
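Note that the qty field is ignored above. If we wanted the total to account for quantities, a small variation (an extension of the example, not part of the original) accumulates amount * qty instead:
# accumulate line totals: amount * qty
total_amount = reduce(lambda acc, item: acc + item["amount"] * item["qty"], order_items, 0)
print(f"{total_amount=}")
assert total_amount == 417.0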
Cache Decorators
functools provides a few decorators for caching. The lru_cache decorator uses the Least Recently Used (LRU) algorithm: the cache generally has a size limit and, once filled, the least recently used items are discarded first.
The most commonly cited example for lru_cache is a Fibonacci function. A naive recursive implementation repeats many identical calls, so maintaining a cache improves performance dramatically. It also becomes obvious very quickly that there's a performance drag once any sizeable number is passed in.
Our syntax for using the decorator will be @lru_cache(maxsize=128, typed=False). Both the maxsize of 128 and the typed of False are default values which we can leave out. If we pass in None as the maxsize, the cache can grow without bound. It's also worth noting that the typed parameter, if set to True, will check the datatype of the argument in addition to its value. The documentation states that 3.0 (a float) and 3 (an integer) will be treated as distinct calls with distinct results, thus expanding the cache.
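To make the typed behavior concrete, here is a minimal sketch (using a hypothetical square function) showing that 3 and 3.0 occupy separate cache entries when typed=True:
from functools import lru_cache

@lru_cache(maxsize=128, typed=True)
def square(n):
    return n * n

square(3)    # cached under the int key
square(3.0)  # cached separately under the float key
print(square.cache_info())  # CacheInfo(hits=0, misses=2, maxsize=128, currsize=2)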
In the code snippet below, we import lru_cache and create our function. Adding the decorator is straightforward: we simply place it just above the function definition.
from functools import lru_cache
@lru_cache(maxsize=128)
def fibonacci(n):
if n < 2:
return n
return fibonacci(n - 1) + fibonacci(n - 2)
if __name__ == "__main__":
res = fibonacci(40)
print(res) # 102334155
Executing the code snippet above returns the result very quickly, because the result of each call is saved and reused whenever the same argument is passed in again. If the decorator is omitted or commented out, the performance hit is noticeable even with a seemingly modest value of 40; even a fast machine will take several seconds to get the result.
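If you want to measure the difference on your own machine, a rough timing sketch using time.perf_counter (the exact numbers will vary with hardware):
import time

start = time.perf_counter()
fibonacci(40)
print(f"elapsed: {time.perf_counter() - start:.4f}s")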
The wrapper created by the decorator also gives us a couple of methods related to caching: cache_info provides us with caching statistics and cache_clear resets the cache. The following example uses both methods:
...
if __name__ == "__main__":
res = fibonacci(40)
print(res) # 102334155
print(
fibonacci.cache_info()
) # CacheInfo(hits=38, misses=41, maxsize=128, currsize=41)
fibonacci.cache_clear() # clears cache
The CacheInfo object that's returned shows that the cache was used 38 times. The 41 "misses" is how many times our function actually ran. Since 41 is below our limit of 128, the current size matches the number of misses.
Finally, the cache_clear method resets the cache and zeroes out all the data in the CacheInfo object.
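For example, checking the stats immediately after clearing shows everything reset:
fibonacci.cache_clear()
print(fibonacci.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)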
Shorthand for LRU Cache
functools provides a cache decorator (added in Python 3.9) that's shorthand for lru_cache. It is equivalent to @lru_cache(maxsize=None), which means the size of the cache will be unbounded. Using our Fibonacci example, the implementation will be the following:
from functools import cache
@cache # same as @lru_cache(maxsize=None)
def fibonacci(n):
if n < 2:
return n
return fibonacci(n - 1) + fibonacci(n - 2)
if __name__ == "__main__":
res = fibonacci(40)
print(res) # 102334155
print(
fibonacci.cache_info()
) # CacheInfo(hits=38, misses=41, maxsize=None, currsize=41)
Caching Class Properties
The final caching decorator provided by functools is cached_property, which caches the value of a class property. Consider the following example:
class Customer:
def __init__(self, orders) -> None:
self._orders = orders
@property
def recent_orders(self):
orders = sorted(self._orders, key=lambda x: x["order_id"], reverse=True)
return orders[:3]
if __name__ == "__main__":
orders = [
{"order_id": 1, "name": "hiking boots", "amount": 150.00, "qty": 1},
{"order_id": 2, "name": "hiking poles", "amount": 95.00, "qty": 1},
{"order_id": 3, "name": "hydration pack", "amount": 52.00, "qty": 1},
{"order_id": 4, "name": "hiking shorts", "amount": 60.00, "qty": 2},
{"order_id": 5, "name": "hiking shorts", "amount": 60.00, "qty": 2},
{"order_id": 6, "name": "hiking shorts", "amount": 60.00, "qty": 2},
{"order_id": 7, "name": "hiking shorts", "amount": 60.00, "qty": 2},
]
customer = Customer(orders)
print(customer.recent_orders)
print(customer.recent_orders)
This Customer class takes in a list of orders when initialized. Getting the "recent orders" returns the last 3 orders. This simplified example assumes the orders with the highest order_id are the most recent. Using the sorted function, we sort the orders by order_id in descending order, then return the first 3 records by slicing.
At the bottom, we call the recent_orders property twice. In a realistic application, the recent_orders property could be used in multiple locations, and a SQL query might be involved in getting the order data. In that case, we would want to cache the result instead of running unnecessary queries.
We can convert recent_orders to a cached property by importing cached_property from functools and replacing our property decorator:
from functools import cached_property
class Customer:
def __init__(self, orders) -> None:
self._orders = orders
@cached_property
def recent_orders(self):
orders = sorted(self._orders, key=lambda x: x["order_id"], reverse=True)
return orders[:3]
if __name__ == "__main__":
orders = [
{"order_id": 1, "name": "hiking boots", "amount": 150.00, "qty": 1},
{"order_id": 2, "name": "hiking poles", "amount": 95.00, "qty": 1},
{"order_id": 3, "name": "hydration pack", "amount": 52.00, "qty": 1},
{"order_id": 4, "name": "hiking shorts", "amount": 60.00, "qty": 2},
{"order_id": 5, "name": "hiking shorts", "amount": 60.00, "qty": 2},
{"order_id": 6, "name": "hiking shorts", "amount": 60.00, "qty": 2},
{"order_id": 7, "name": "hiking shorts", "amount": 60.00, "qty": 2},
]
customer = Customer(orders)
print(customer.recent_orders)
print(customer.recent_orders) # from cache
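One detail worth knowing: cached_property stores the computed value in the instance's __dict__, so the cache can be invalidated by deleting the attribute. A short sketch, continuing from the example above:
# deleting the attribute drops the cached value;
# the next access recomputes it
del customer.recent_orders
print(customer.recent_orders)  # recomputed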
Wrapped Functions
There are a number of uses for wrapping functions. A wrapper lets code execute automatically both before and after a certain function is called (pre- and post-processing). Use cases range from handling retries to isolating your code from third-party APIs, where only the wrapper requires updating, leaving the rest of the code untouched.
The wraps decorator offers an important benefit: it maintains the original function's signature, name, and other attributes. Without wraps, a wrapped function takes on the signature, name, and docstring of the wrapper, not of the function itself. We'll start with a simple example, then move on to a real-world example.
First, we import wraps from functools and create the wrapper:
from functools import wraps
def my_decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        print("before executing function")
        result = func(*args, **kwargs)  # pass through the return value
        print("after executing function")
        return result
    return wrapper
In the snippet above, we use print statements in place of any pre- and post-processing code. my_decorator takes a function as an argument, which is then passed into the wraps decorator. The wrapper function itself accepts whatever positional and keyword arguments are defined on the target function, which we'll add in the next step:
from functools import wraps
def my_decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        print("before executing function")
        result = func(*args, **kwargs)  # pass through the return value
        print("after executing function")
        return result
    return wrapper
@my_decorator
def say_hello(name):
"""Say Hello
Parameters:
name (string)
"""
print(f"hello, {name}")
We can now call our function by passing in a name. We can also print the dunder attributes of the target function:
...
# prints:
# before executing function
# hello, Joe Blow
# after executing function
say_hello("Joe Blow")
print(say_hello.__name__) # say_hello
# prints:
# Say Hello
# Parameters:
# name (string)
print(say_hello.__doc__)
As you can see, this implementation maintains the same metadata as the original function. If we were, however, to comment out the line that reads @wraps(func), print(say_hello.__name__) would return wrapper, the name of the wrapper function, and print(say_hello.__doc__) would print None because the wrapper function doesn't have its own docstring.
A More Realistic Example
The following example uses the requests library to handle retries for failed API calls. External API requests can fail for a variety of reasons, ranging from network issues to APIs being out of service. If you do not have requests installed, you can install it using pip: pip install requests.
To simulate our requests, we will use a website called httpbin. It can simulate any request method and return any response code (e.g. 200, 400, etc.). The response code will be used to determine whether or not to retry a request. We will start with a simple request using the requests library, then make the necessary adjustments using a wrapper.
import requests
httpbin_url = "https://httpbin.org/get"
response = requests.get(httpbin_url, params={"test": "true"})
result = response.json()
print("result", result)
This is a typical HTTP request implementation. By running this, you should see a JSON response that includes the params that were originally sent plus some other metadata.
Now, let's say this request fails periodically and we want to implement retries. We can create a retry_requests function that wraps any requests method (GET, POST, etc.). We can also include default arguments for the maximum number of retries and the delay in seconds. We will use the sleep function from the time module to implement the delay.
We will create a stub function along with an example of how our fetch_data function will be used. The retry_requests function will not know which requests method will be used, so it must be flexible in that regard. We also have 2 sets of parameters:
max_retries and delay
url and params
import requests
import time
from functools import wraps
def retry_requests(max_retries=3, delay=3):
    ...  # TODO: implemented in the next steps
@retry_requests(max_retries=2, delay=3)
def fetch_data(url, params=None):
return requests.get(url, params)
httpbin_url = "https://httpbin.org/get"
response = fetch_data(httpbin_url, params={"test": "true"})
if response:
result = response.json()
print("result", result)
else:
print("No reponse received")
Next, we will set up the needed nested functions:
...
def retry_requests(max_retries=3, delay=3):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            ...  # retry logic added in the next step
        return wrapper
    return decorator
...
In the snippet above, we have 2 nested functions, which lets us keep both the retry parameters and the parameters for the request itself in scope. If we didn't need this flexibility, we might get by with a single nested function, but this structure lets us pass in both sets of parameters.
Next, we will implement a for loop to keep track of our attempts. We will check the status code returned by the API call via the requests module to determine whether the call succeeded or failed. For logging purposes, we will print this information.
...
def retry_requests(max_retries=3, delay=3):
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(max_retries):
try:
response = func(*args, **kwargs)
print(
f"Request to {response.url} returned status code {response.status_code}"
)
if response.status_code == 200:
return response
elif response.status_code == 429:
print("Rate limited. Waiting and retrying.")
time.sleep(delay)
elif 500 <= response.status_code < 600:
print("Server error. Retrying after delay.")
time.sleep(delay)
else:
response.raise_for_status()
except requests.RequestException as e:
if attempt < max_retries - 1:
print(
f"Request failed with error: {str(e)}. Retrying in {delay} seconds."
)
time.sleep(delay)
else:
print("Max retries reached. Request failed.")
raise
return None
return wrapper
return decorator
...
Notice that the line that reads response = func(*args, **kwargs) executes the target function. Once we have the response, we check the status code. If a 200 is received, we assume the request succeeded and simply return the response. For a 429 (too many requests) or any 500-level status code, we apply the delay and retry. Any response code that doesn't meet one of our conditions (since we're not explicitly anticipating it) triggers raise_for_status and is handled in the exception block. If the maximum number of retries is reached, None will be returned.
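A common refinement, not part of the original example, is exponential backoff: doubling the wait on each attempt rather than sleeping for a fixed delay. A hypothetical helper that could replace the time.sleep(delay) calls inside the loop:
import time

def backoff_sleep(attempt, delay=3):
    # attempt 0 waits 3s, attempt 1 waits 6s, attempt 2 waits 12s, ...
    time.sleep(delay * 2 ** attempt)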
To test this out, we will add a second httpbin URL that returns a 429 error. This should run through every attempt and return an empty result:
...
httpbin_url = "https://httpbin.org/get"
httpbin_error_url = "https://httpbin.org/status/429"
# should return a successful result
response = fetch_data(httpbin_url, params={"test": "true"})
if response:
result = response.json()
print("result", result)
else:
print("No reponse received")
response = fetch_data(httpbin_error_url, params={"test": "true"})
if response:
result = response.json()
print("result", result)
else:
print("No reponse received")
# should print:
# Request to https://httpbin.org/status/429?test=true returned status code 429
# Rate limited. Waiting and retrying.
# Request to https://httpbin.org/status/429?test=true returned status code 429
# Rate limited. Waiting and retrying.
# No response received
Finally, we can add a docstring to the fetch_data function, and the metadata should be that of the original function:
@retry_requests(max_retries=2, delay=3)
def fetch_data(url, params=None):
"""Say Hello
Parameters:
url (string)
params (dict|None)
"""
return requests.get(url, params)
print(fetch_data.__name__) # fetch_data
print(fetch_data.__doc__)
# prints:
# Fetch Data
# Parameters:
# url (string)
# params (dict|None)
Partial Function
The wraps decorator we just used is actually a convenience function built on top of the partial function.
In the following snippet from the functools source, note the use of the WRAPPER_ASSIGNMENTS and WRAPPER_UPDATES constants, which contain dunder names including __name__ and __doc__. This is what enables wraps to maintain the original function's metadata as described above.
def wraps(wrapped,
assigned = WRAPPER_ASSIGNMENTS,
updated = WRAPPER_UPDATES):
"""Decorator factory to apply update_wrapper() to a wrapper function
Returns a decorator that invokes update_wrapper() with the decorated
function as the wrapper argument and the arguments to wraps() as the
remaining arguments. Default arguments are as for update_wrapper().
This is a convenience function to simplify applying partial() to
update_wrapper().
"""
return partial(update_wrapper, wrapped=wrapped,
assigned=assigned, updated=updated)
The main objective of partial is to create a new function from another while freezing one or more of the original function's arguments. Consider the following example:
from functools import partial
def multiply(x, y):
return x * y
double_it = partial(multiply, 2)
print(double_it) # functools.partial(<function multiply at 0x10619cfe0>, 2)
print(double_it(5)) # 10
Above, we create a multiply function that takes in 2 parameters. Let's say we want a doubling function that takes in any number and multiplies it by 2. We do this with a partial, where 2 is always frozen as the x argument. This gives us a simpler doubling function with fewer arguments to pass.
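partial can freeze keyword arguments as well. A classic example, adapted from the functools documentation, fixes the base keyword of int to build a binary-string parser:
from functools import partial

# freeze the base keyword argument of int()
parse_binary = partial(int, base=2)
print(parse_binary("1010"))  # 10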
Single Dispatch
The final function from the functools module that we'll be covering is singledispatch. It allows running different function implementations based on the data type of the first argument. To better illustrate this, we will implement a JSON converter. Let's first set up our initial function:
import json
def to_json(data):
return json.dumps(data)
item_1 = {"test": True}
print(to_json(item_1)) # {"test": true}
The code above works fine as-is for the given example. However, let's try converting a set to JSON:
import json
def to_json(data):
return json.dumps(data)
item_1 = {"test": True}
print(to_json(item_1)) # {"test": true}
item_2 = {"one", "two", "three"}
print(to_json(item_2)) # TypeError: Object of type set is not JSON serializable
Since there are no sets in JSON, we have to do something like convert the set to a list before serializing it. We can use singledispatch to handle this. We add singledispatch as a decorator to the original function, which becomes the default implementation. We can then register subsequent functions using a decorator of the form @to_json.register(type). Each of these can use an underscore in place of the function name, because the function is only ever reached through the dispatcher. Any call whose first argument's type doesn't match one passed to the register method falls back to the default function.
import json
from functools import singledispatch
@singledispatch
def to_json(data):
return json.dumps(data)
@to_json.register(set)
def _(data):
return json.dumps(list(data))
In the set implementation, we simply convert the set to a list before serializing it. Although the same result could have been achieved with an if condition inside a single function, this approach is often used in libraries where the differences between each implementation can become very extensive.
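Since Python 3.7, register can also infer the type from an annotation on the first parameter, making the explicit type argument optional. A brief sketch adding bytes support (an extension, not part of the original example):
# the bytes annotation tells singledispatch which type this handles
@to_json.register
def _(data: bytes):
    return json.dumps(data.decode("utf-8"))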
Single Dispatch for Class Methods
This approach is also available for class methods via singledispatchmethod. Similar to singledispatch, each subsequent method is registered with a register decorator that takes a data type as an argument. Consider the following class:
from functools import singledispatchmethod
class Math:
def __init__(self, value) -> None:
self.value = value
@singledispatchmethod
def add(self, item):
return self.value + item
@add.register(str)
def _(self, item):
return f"{self.value}, {item}"
@add.register(list)
def _(self, item):
return [self.value] + item
We set an initial value with a new instance of the Math class. Then, the matching add method runs depending on the data type of the argument. In the default method, we use the + operator to add to the initial value. However, if the argument is a str or a list, we either use string interpolation to combine the values or return a list with the initial value prepended to it.
...
math = Math(7)
print(math.add(3)) # 10
print(math.add("lucky")) # 7, lucky
print(math.add([9, 11])) # [7, 9, 11]