Python Functools Module
functools is part of the Python standard library. The documentation describes it as a module "for higher-order functions: functions that act on or return other functions." In this post, we'll go through a number of examples covering the most common use cases.
Reduce
The reduce function is one of the most commonly used functions in the functools module. The name "reduce" implies taking in an iterable (often a list) and "reducing" it to a single value. This is also thought of as an "accumulator".
In Python 2, the reduce function was available as a built-in, with no import statement necessary. Python 3, on the other hand, requires it to be imported from the functools module:
from functools import reduce
Numbers
The reduce function takes a minimum of 2 arguments: a function and an iterable. The simplest example is to add up a list of numbers:
from functools import reduce
lyst = [1, 2, 3, 4]
total = reduce(lambda x, y: x + y, lyst)
assert total == 10
As you can see, this is much more concise than creating a variable, setting an initial value, and using a for loop to keep track of the total. Notice the use of a lambda function; this is how anonymous functions are implemented in Python. Inside the lambda, we simply add each value to the running total. We can also use a named function:
# using a named function
def add_it(x, y):
return x + y
total = reduce(add_it, lyst)
assert total == 10
A named function is generally preferred for more complex functions.
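For simple operations like addition, the standard library's operator module also provides ready-made function versions of the arithmetic operators that can stand in for a lambda or a named function. A minimal sketch:
from functools import reduce
from operator import add

# operator.add(x, y) is equivalent to x + y
total = reduce(add, [1, 2, 3, 4])
assert total == 10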
For any list that is created dynamically, it's important to consider what happens when the list is empty: reduce will raise an error if no initial value is set:
# TypeError: reduce() of empty iterable with no initial value
total = reduce(lambda x, y: x + y, [])
This cannot work because, with an empty iterable and no initial value, reduce has nothing to start from and no way to tell what type of value should be returned.
We can eliminate this error by setting an initial value, which is an optional third argument of the reduce function:
total = reduce(lambda x, y: x + y, [], 0)
assert total == 0
Although Python has built-in min and max functions, we can perform the same task using reduce by choosing the return value with a conditional expression inside our lambda function.
max_value = reduce(lambda x, y: x if x > y else y, lyst)
assert max_value == max(lyst)
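The same pattern gives us the minimum by flipping the comparison. A quick sketch, reusing lyst from above:
# keep the smaller of each pair as reduce folds through the list
min_value = reduce(lambda x, y: x if x < y else y, lyst)
assert min_value == min(lyst)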
Strings
reduce works with strings too, both lists of string values and strings themselves. In the following example, a list of characters is concatenated into a single string value:
# iterate through a list of characters
lyst = ["a", "b", "c", "d", "e", "f"]
result = reduce(lambda x, y: x + y, lyst)
print(f"{result=}")
assert result == "abcdef"
We can also get the same result from a string itself. Since a string is an iterable in Python, we get the same value back by concatenating its characters:
# iterate through a single string
lyst = "abcdef"
result = reduce(lambda x, y: x + y, lyst)
print(f"{result=}")
assert result == "abcdef"
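As an aside, reduce works here, but the idiomatic (and faster) way to concatenate an iterable of strings in real code is str.join:
# equivalent result without reduce
result = "".join("abcdef")
assert result == "abcdef"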
Dictionaries and Other Complex Types
reduce can be used with pretty much anything that's an iterable. In the next example, we will use a list of dictionaries where each item is a product in a specific order, and we will attempt to add up the amounts of the order items. Similar to the empty-list example that caused an error, the same thing will happen here if we don't include an initial value:
order_items = [
{"item_id": 1, "name": "hiking boots", "amount": 150.00, "qty": 1},
{"item_id": 33, "name": "hiking poles", "amount": 95.00, "qty": 1},
{"item_id": 1, "name": "hydration pack", "amount": 52.00, "qty": 1},
{"item_id": 1, "name": "hiking shorts", "amount": 60.00, "qty": 2},
]
# missing initial value will cause an error
total_amount = reduce(lambda x, y: x + y["amount"], order_items)
# TypeError: unsupported operand type(s) for +: 'dict' and 'float'
Without an initial value, reduce uses the first dictionary in the list as the starting accumulator and then tries to add a float to it. We can fix this by setting an initial value of 0, since we know we're adding up numeric values:
total_amount = reduce(lambda x, y: x + y["amount"], order_items, 0)
print(f"{total_amount=}")
assert total_amount == 357.0
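Note that the qty field is ignored above. If we wanted the total to account for quantities, a small variation (an extension of the example, not part of the original) accumulates amount * qty instead:
# accumulate line totals: amount * qty
total_amount = reduce(lambda acc, item: acc + item["amount"] * item["qty"], order_items, 0)
print(f"{total_amount=}")
assert total_amount == 417.0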
Cache Decorators
functools provides a few decorators for caching. The lru_cache decorator uses the Least Recently Used (LRU) algorithm: the cache generally has a size limit and, once filled, the least recently used items are discarded first.
The most commonly cited example for lru_cache is a Fibonacci function. A naive recursive implementation repeats many identical calls, so maintaining a cache improves performance dramatically. It also becomes obvious very quickly that there's a performance drag once any sizeable number is passed in.
Our syntax for using the decorator will be @lru_cache(maxsize=128, typed=False). Both the maxsize of 128 and the typed of False are default values which we can leave out. If we pass in None as the maxsize, the cache can grow without bound. It's also worth noting that the typed parameter, if set to True, will check the datatype of the argument in addition to its value. The documentation states that 3.0 (a float) and 3 (an integer) will be treated as distinct calls with distinct results, thus expanding the cache.
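To make the typed behavior concrete, here is a minimal sketch (using a hypothetical square function) showing that 3 and 3.0 occupy separate cache entries when typed=True:
from functools import lru_cache

@lru_cache(maxsize=128, typed=True)
def square(n):
    return n * n

square(3)    # cached under the int key
square(3.0)  # cached separately under the float key
print(square.cache_info())  # CacheInfo(hits=0, misses=2, maxsize=128, currsize=2)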
In the code snippet below, we import lru_cache and create our function. Adding the decorator is straightforward: we simply place it just above the function definition.
from functools import lru_cache
@lru_cache(maxsize=128)
def fibonacci(n):
if n < 2:
return n
return fibonacci(n - 1) + fibonacci(n - 2)
if __name__ == "__main__":
res = fibonacci(40)
print(res) # 102334155
Executing the code snippet above returns the result very quickly, because the result of each call is saved and reused whenever the same argument is passed in again. If the decorator is omitted or commented out, the performance hit is noticeable even with a seemingly modest value of 40; even a fast machine will take several seconds to get the result.
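If you want to measure the difference on your own machine, a rough timing sketch using time.perf_counter (the exact numbers will vary with hardware):
import time

start = time.perf_counter()
fibonacci(40)
print(f"elapsed: {time.perf_counter() - start:.4f}s")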
The wrapper created by the decorator also gives us a couple of methods related to caching: cache_info provides us with caching statistics and cache_clear resets the cache. The following example uses both methods:
...
if __name__ == "__main__":
res = fibonacci(40)
print(res) # 102334155
print(
fibonacci.cache_info()
) # CacheInfo(hits=38, misses=41, maxsize=128, currsize=41)
fibonacci.cache_clear() # clears cache
The CacheInfo object that's returned shows that the cache was used 38 times. The 41 "misses" is how many times our function actually ran. Since 41 is below our limit of 128, the current size matches the number of misses.
Finally, the cache_clear method resets the cache and zeroes out all the data in the CacheInfo object.
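For example, checking the stats immediately after clearing shows everything reset:
fibonacci.cache_clear()
print(fibonacci.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)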
Shorthand for LRU Cache
functools provides a cache decorator (added in Python 3.9) that's shorthand for lru_cache. It is equivalent to @lru_cache(maxsize=None), which means the size of the cache will be unbounded. Using our Fibonacci example, the implementation will be the following:
from functools import cache
@cache # same as @lru_cache(maxsize=None)
def fibonacci(n):
if n < 2:
return n
return fibonacci(n - 1) + fibonacci(n - 2)
if __name__ == "__main__":
res = fibonacci(40)
print(res) # 102334155
print(
fibonacci.cache_info()
) # CacheInfo(hits=38, misses=41, maxsize=None, currsize=41)
Caching Class Properties
The final caching decorator provided by functools is cached_property, which caches the value of a class property. Consider the following example:
class Customer:
def __init__(self, orders) -> None:
self._orders = orders
@property
def recent_orders(self):
orders = sorted(self._orders, key=lambda x: x["order_id"], reverse=True)
return orders[:3]
if __name__ == "__main__":
orders = [
{"order_id": 1, "name": "hiking boots", "amount": 150.00, "qty": 1},
{"order_id": 2, "name": "hiking poles", "amount": 95.00, "qty": 1},
{"order_id": 3, "name": "hydration pack", "amount": 52.00, "qty": 1},
{"order_id": 4, "name": "hiking shorts", "amount": 60.00, "qty": 2},
{"order_id": 5, "name": "hiking shorts", "amount": 60.00, "qty": 2},
{"order_id": 6, "name": "hiking shorts", "amount": 60.00, "qty": 2},
{"order_id": 7, "name": "hiking shorts", "amount": 60.00, "qty": 2},
]
customer = Customer(orders)
print(customer.recent_orders)
print(customer.recent_orders)
This Customer class takes in a list of orders when initialized. Getting the "recent orders" returns the last 3 orders. This simplified example assumes the orders with the highest order_id are the most recent. Using the sorted function, we sort the orders by order_id in descending order, then return the first 3 records by slicing.
At the bottom, we call the recent_orders property twice. In a realistic application, the recent_orders property could be used in multiple locations, and a SQL query might be involved in getting the order data. In that case, we would want to cache the result instead of running unnecessary queries.
We can convert recent_orders to a cached property by importing cached_property from functools and replacing our property decorator:
from functools import cached_property
class Customer:
def __init__(self, orders) -> None:
self._orders = orders
@cached_property
def recent_orders(self):
orders = sorted(self._orders, key=lambda x: x["order_id"], reverse=True)
return orders[:3]
if __name__ == "__main__":
orders = [
{"order_id": 1, "name": "hiking boots", "amount": 150.00, "qty": 1},
{"order_id": 2, "name": "hiking poles", "amount": 95.00, "qty": 1},
{"order_id": 3, "name": "hydration pack", "amount": 52.00, "qty": 1},
{"order_id": 4, "name": "hiking shorts", "amount": 60.00, "qty": 2},
{"order_id": 5, "name": "hiking shorts", "amount": 60.00, "qty": 2},
{"order_id": 6, "name": "hiking shorts", "amount": 60.00, "qty": 2},
{"order_id": 7, "name": "hiking shorts", "amount": 60.00, "qty": 2},
]
customer = Customer(orders)
print(customer.recent_orders)
print(customer.recent_orders) # from cache
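One detail worth knowing: cached_property stores the computed value in the instance's __dict__, so the cache can be invalidated by deleting the attribute. A short sketch, continuing from the example above:
# deleting the attribute drops the cached value;
# the next access recomputes it
del customer.recent_orders
print(customer.recent_orders)  # recomputed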
Wrapped Functions
There are a number of uses for wrapping functions. A wrapper lets code execute automatically both before and after a certain function is called (pre- and post-processing). Use cases range from handling retries to isolating your code from third-party APIs, where only the wrapper requires updating, leaving the rest of the code untouched.
The wraps decorator offers an important benefit: it maintains the original function's signature, name, and other attributes. Without wraps, a wrapped function takes on the signature, name, and docstring of the wrapper, not of the function itself. We'll start with a simple example, then move on to a real-world example.
First, we import wraps from functools and create the wrapper:
from functools import wraps
def my_decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        print("before executing function")
        result = func(*args, **kwargs)  # pass through the return value
        print("after executing function")
        return result
    return wrapper
In the snippet above, we use print statements in place of any pre- and post-processing code. my_decorator takes a function as an argument, which is then passed into the wraps decorator. The wrapper function itself accepts whatever positional and keyword arguments are defined on the target function, which we'll add in the next step:
from functools import wraps
def my_decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        print("before executing function")
        result = func(*args, **kwargs)  # pass through the return value
        print("after executing function")
        return result
    return wrapper
@my_decorator
def say_hello(name):
"""Say Hello
Parameters:
name (string)
"""
print(f"hello, {name}")
We can now call our function by passing in a name. We can also print the dunder attributes of the target function:
...
# prints:
# before executing function
# hello, Joe Blow
# after executing function
say_hello("Joe Blow")
print(say_hello.__name__) # say_hello
# prints:
# Say Hello
# Parameters:
# name (string)
print(say_hello.__doc__)
As you can see, this implementation maintains the same metadata as the original function. If we were, however, to comment out the line that reads @wraps(func), print(say_hello.__name__) would return wrapper, the name of the wrapper function, and print(say_hello.__doc__) would print None because the wrapper function doesn't have its own docstring.
A More Realistic Example
The following example uses the requests library to handle retries for failed API calls. External API requests can fail for a variety of reasons, ranging from network issues to APIs being out of service. If you do not have requests installed, you can install it using pip: pip install requests.
To simulate our requests, we will use a website called httpbin. It can simulate any request method and return any response code (e.g. 200, 400, etc.). The response code will be used to determine whether or not to retry a request. We will start with a simple request using the requests library, then make the necessary adjustments using a wrapper.
import requests
httpbin_url = "https://httpbin.org/get"
response = requests.get(httpbin_url, params={"test": "true"})
result = response.json()
print("result", result)
This is a typical HTTP request implementation. By running this, you should see a JSON response that includes the params that were originally sent plus some other metadata.
Now, let's say this request fails periodically and we want to implement retries. We can create a retry_requests function that wraps any requests method (GET, POST, etc.). We can also include default arguments for the maximum number of retries and the delay in seconds. We will use the sleep function from the time module to implement the delay.
We will create a stub function along with an example of how our fetch_data function will be used. The retry_requests function will not know which requests method will be used, so it must be flexible in that regard. We also have 2 sets of parameters:
max_retries and delay
url and params
import requests
import time
from functools import wraps
def retry_requests(max_retries=3, delay=3):
    ...  # TODO: implemented in the next steps
@retry_requests(max_retries=2, delay=3)
def fetch_data(url, params=None):
return requests.get(url, params)
httpbin_url = "https://httpbin.org/get"
response = fetch_data(httpbin_url, params={"test": "true"})
if response:
result = response.json()
print("result", result)
else:
print("No reponse received")
Next, we will set up the needed nested functions:
...
def retry_requests(max_retries=3, delay=3):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            ...  # retry logic added in the next step
        return wrapper
    return decorator
...
In the snippet above, we have 2 nested functions, which lets us keep both the retry parameters and the parameters for the request itself in scope. If we didn't need this flexibility, we might get by with a single nested function, but this structure lets us pass in both sets of parameters.
Next, we will implement a for loop to keep track of our attempts. We will check the status code returned by the API call via the requests module to determine whether the call succeeded or failed. For logging purposes, we will print this information.
...
def retry_requests(max_retries=3, delay=3):
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(max_retries):
try:
response = func(*args, **kwargs)
print(
f"Request to {response.url} returned status code {response.status_code}"
)
if response.status_code == 200:
return response
elif response.status_code == 429:
print("Rate limited. Waiting and retrying.")
time.sleep(delay)
elif 500 <= response.status_code < 600:
print("Server error. Retrying after delay.")
time.sleep(delay)
else:
response.raise_for_status()
except requests.RequestException as e:
if attempt < max_retries - 1:
print(
f"Request failed with error: {str(e)}. Retrying in {delay} seconds."
)
time.sleep(delay)
else:
print("Max retries reached. Request failed.")
raise
return None
return wrapper
return decorator
...
Notice that the line that reads response = func(*args, **kwargs) executes the target function. Once we have the response, we check the status code. If a 200 is received, we assume the request succeeded and simply return the response. For a 429 (too many requests) or any 500-level status code, we apply the delay and retry. Any response code that doesn't meet one of our conditions (since we're not explicitly anticipating it) triggers raise_for_status and is handled in the exception block. If the maximum number of retries is reached, None will be returned.
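A common refinement, not part of the original example, is exponential backoff: doubling the wait on each attempt rather than sleeping for a fixed delay. A hypothetical helper that could replace the time.sleep(delay) calls inside the loop:
import time

def backoff_sleep(attempt, delay=3):
    # attempt 0 waits 3s, attempt 1 waits 6s, attempt 2 waits 12s, ...
    time.sleep(delay * 2 ** attempt)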
To test this out, we will add a second httpbin URL that returns a 429 error. This should run through every attempt and return an empty result:
...
httpbin_url = "https://httpbin.org/get"
httpbin_error_url = "https://httpbin.org/status/429"
# should return a successful result
response = fetch_data(httpbin_url, params={"test": "true"})
if response:
result = response.json()
print("result", result)
else:
print("No reponse received")
response = fetch_data(httpbin_error_url, params={"test": "true"})
if response:
result = response.json()
print("result", result)
else:
print("No reponse received")
# should print:
# Request to https://httpbin.org/status/429?test=true returned status code 429
# Rate limited. Waiting and retrying.
# Request to https://httpbin.org/status/429?test=true returned status code 429
# Rate limited. Waiting and retrying.
# No response received
Finally, we can add a docstring to the fetch_data function, and the metadata should be that of the original function:
@retry_requests(max_retries=2, delay=3)
def fetch_data(url, params=None):
"""Say Hello
Parameters:
url (string)
params (dict|None)
"""
return requests.get(url, params)
print(fetch_data.__name__) # fetch_data
print(fetch_data.__doc__)
# prints:
# Fetch Data
# Parameters:
# url (string)
# params (dict|None)
Partial Function
The wraps decorator we just used is actually a convenience function built on top of the partial function.
In the following snippet from the functools source, note the use of the WRAPPER_ASSIGNMENTS and WRAPPER_UPDATES constants, which contain dunder names including __name__ and __doc__. This is what enables wraps to maintain the original function's metadata as described above.
def wraps(wrapped,
assigned = WRAPPER_ASSIGNMENTS,
updated = WRAPPER_UPDATES):
"""Decorator factory to apply update_wrapper() to a wrapper function
Returns a decorator that invokes update_wrapper() with the decorated
function as the wrapper argument and the arguments to wraps() as the
remaining arguments. Default arguments are as for update_wrapper().
This is a convenience function to simplify applying partial() to
update_wrapper().
"""
return partial(update_wrapper, wrapped=wrapped,
assigned=assigned, updated=updated)
The main objective of partial is to create a new function from another while freezing one or more of the original function's arguments. Consider the following example:
from functools import partial
def multiply(x, y):
return x * y
double_it = partial(multiply, 2)
print(double_it) # functools.partial(<function multiply at 0x10619cfe0>, 2)
print(double_it(5)) # 10
Above, we create a multiply function that takes in 2 parameters. Let's say we want a doubling function that takes in any number and multiplies it by 2. We do this with a partial, where 2 is always frozen as the x argument. This gives us a simpler doubling function with fewer arguments to pass.
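partial can freeze keyword arguments as well. A classic example, adapted from the functools documentation, fixes the base keyword of int to build a binary-string parser:
from functools import partial

# freeze the base keyword argument of int()
parse_binary = partial(int, base=2)
print(parse_binary("1010"))  # 10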
Single Dispatch
The final function from the functools module that we'll be covering is singledispatch. It allows running different function implementations based on the data type of the first argument. To better illustrate this, we will implement a JSON converter. Let's first set up our initial function:
import json
def to_json(data):
return json.dumps(data)
item_1 = {"test": True}
print(to_json(item_1)) # {"test": true}
The code above works fine as-is for the given example. However, let's try converting a set to JSON:
import json
def to_json(data):
return json.dumps(data)
item_1 = {"test": True}
print(to_json(item_1)) # {"test": true}
item_2 = {"one", "two", "three"}
print(to_json(item_2)) # TypeError: Object of type set is not JSON serializable
Since there are no sets in JSON, we have to do something like convert the set to a list before serializing it. We can use singledispatch to handle this. We add singledispatch as a decorator to the original function, which becomes the default implementation. We can then register subsequent functions using a decorator of the form @to_json.register(type). Each of these can use an underscore in place of the function name, because the function is only ever reached through the dispatcher. Any call whose first argument's type doesn't match one passed to the register method falls back to the default function.
import json
from functools import singledispatch
@singledispatch
def to_json(data):
return json.dumps(data)
@to_json.register(set)
def _(data):
return json.dumps(list(data))
In the set implementation, we simply convert the set to a list before serializing it. Although the same result could have been achieved with an if condition inside a single function, this approach is often used in libraries where the differences between each implementation can become very extensive.
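Since Python 3.7, register can also infer the type from an annotation on the first parameter, making the explicit type argument optional. A brief sketch adding bytes support (an extension, not part of the original example):
# the bytes annotation tells singledispatch which type this handles
@to_json.register
def _(data: bytes):
    return json.dumps(data.decode("utf-8"))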
Single Dispatch for Class Methods
This approach is also available for class methods via singledispatchmethod. Similar to singledispatch, each subsequent method is registered with a register decorator that takes a data type as an argument. Consider the following class:
from functools import singledispatchmethod
class Math:
def __init__(self, value) -> None:
self.value = value
@singledispatchmethod
def add(self, item):
return self.value + item
@add.register(str)
def _(self, item):
return f"{self.value}, {item}"
@add.register(list)
def _(self, item):
return [self.value] + item
We set an initial value with a new instance of the Math class. Then, the matching add method runs depending on the data type of the argument. In the default method, we use the + operator to add to the initial value. However, if the argument is a str or a list, we either use string interpolation to combine the values or return a list with the initial value prepended to it.
...
math = Math(7)
print(math.add(3)) # 10
print(math.add("lucky")) # 7, lucky
print(math.add([9, 11])) # [7, 9, 11]