A hook is an abstraction of a specific API that allows Airflow to interact with an external system. Hooks are built into many operators, but they can also be used directly in DAG code.
In this guide, you’ll learn about using hooks in Airflow and when you should use them directly in DAG code. You’ll also implement two different hooks in a DAG.
Over 200 hooks are available in the Airflow Registry. If a hook isn’t available for your use case, you can write your own and share it with the community.
To get the most out of this guide, you should have an understanding of:
Hooks wrap around APIs and provide methods to interact with different external systems. Hooks standardize how Airflow interacts with external systems, and using them makes your DAG code cleaner, easier to read, and less prone to errors.
To use a hook, you typically only need a connection ID to connect with an external system. For more information about setting up connections, see Manage your connections in Apache Airflow.
All hooks inherit from the BaseHook class, which contains the logic to set up an external connection with a connection ID. On top of making the connection to an external system, individual hooks can contain additional methods to perform various actions within the external system. These methods might rely on different Python libraries for these interactions. For example, the S3Hook relies on the boto3 library to manage its Amazon S3 connection.
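As a minimal sketch of this pattern, the helper below builds an S3Hook from a connection ID and calls one of its methods. The function names `make_s3_hook` and `bucket_exists`, the connection ID `aws_default`, and the bucket name in the usage note are illustrative assumptions, not part of the original example. The provider import is deferred so the module still loads where the Amazon provider package is not installed.

```python
def make_s3_hook(aws_conn_id: str = "aws_default"):
    """Build an S3Hook from a connection ID (assumed to be defined in Airflow)."""
    # Imported lazily: this requires the Amazon provider package for Airflow.
    from airflow.providers.amazon.aws.hooks.s3 import S3Hook

    return S3Hook(aws_conn_id=aws_conn_id)


def bucket_exists(hook, bucket_name: str) -> bool:
    """Return True if the bucket exists, using the hook's check_for_bucket method.

    `hook` can be any object exposing check_for_bucket, such as an S3Hook.
    """
    return bool(hook.check_for_bucket(bucket_name=bucket_name))
```

Inside a task, this could be called as `bucket_exists(make_s3_hook(), "my-example-bucket")`. Passing the hook in as an argument keeps the helper easy to test with a stand-in object.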
The S3Hook contains over 20 methods to interact with Amazon S3 buckets. The following are some of the methods that are included with S3Hook:
- check_for_bucket: Checks if a bucket with a specific name exists.
- list_prefixes: Lists prefixes in a bucket according to specified parameters.
- list_keys: Lists keys in a bucket according to specified parameters.
- load_file: Loads a local file to Amazon S3.
- download_file: Downloads a file from the Amazon S3 location to the local file system.
Since hooks are the building blocks of operators, their use in Airflow is often abstracted away from the DAG author. However, there are some cases when you should use hooks directly in a Python function in your DAG. The following are some general guidelines for using hooks in Airflow:
The following example shows how you can use two hooks, S3Hook and SlackHook, to retrieve values from files in an Amazon S3 bucket, run a check on them, post the result of the check on Slack, and then log the response of the Slack API.
For this use case, you’ll use hooks directly in your Python functions because none of the existing Amazon S3 operators can read data from multiple files within an Amazon S3 bucket. Also, none of the existing Slack operators can return the response of a Slack API call, which you might want to log for monitoring purposes.
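Reading several files with a hook might look like the sketch below. It relies on the `read_key` method of S3Hook, which returns the content of a key as a string. The function name `read_int_values` is hypothetical, and the sketch assumes each file contains a single integer.

```python
def read_int_values(hook, bucket_name: str, keys: list[str]) -> dict[str, int]:
    """Read each key from the bucket and convert its content to an integer.

    `hook` is expected to behave like an S3Hook, that is, to expose
    read_key(key, bucket_name) returning the file content as a string.
    Assumption: every file holds exactly one integer.
    """
    values = {}
    for key in keys:
        content = hook.read_key(key=key, bucket_name=bucket_name)
        # Strip surrounding whitespace or newlines before converting.
        values[key] = int(content.strip())
    return values
```

In a DAG, the `hook` argument would be an S3Hook created with your connection ID, and the returned dictionary could be passed to a downstream task through XCom.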
The source code for the hooks used in this example can be found in the following locations:
Before running the example DAG, make sure you have the necessary Airflow providers installed. If you are using the Astro CLI, add the following packages to your requirements.txt file:
api.slack.com/apps.
The following example DAG uses Airflow decorators to define tasks and XCom to pass information between the tasks that interact with Amazon S3 and Slack. The name of the Amazon S3 bucket and the names of the files that the first task reads are stored as environment variables for security purposes.
The example DAG completes the following steps:
1. S3Hook reads three specific keys from Amazon S3 with the read_key method and then returns a dictionary with the file contents converted to integers.
2. A sum check is run on the returned values.
3. The SlackHook call method posts the sum check results to a Slack channel and returns the response from the Slack API.
4. The response from the Slack API is logged.
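The check and the Slack steps can be sketched as plain Python functions. This is an illustration under stated assumptions, not the guide's DAG: the function names, the channel argument, the message text, and the exact form of the sum check (the first two numbers add up to the third) are hypothetical. The sketch assumes the hook's `call` method forwards its arguments to the Slack API and returns a response object with a `data` attribute, as SlackHook does.

```python
import logging

log = logging.getLogger(__name__)


def sum_check(num1: int, num2: int, num3: int) -> bool:
    """Assumed form of the check: the first two values add up to the third."""
    return num1 + num2 == num3


def post_check_result(hook, channel: str, passed: bool) -> dict:
    """Post the check result to Slack, log the API response, and return it.

    `hook` is expected to behave like a SlackHook: call(api_method, **kwargs)
    returns a response whose `data` attribute holds the parsed payload.
    """
    text = "Sum check passed." if passed else "Sum check failed."
    response = hook.call("chat.postMessage", json={"channel": channel, "text": text})
    # Log the response for monitoring purposes.
    log.info("Slack API response: %s", response.data)
    return response.data
```

Returning the response data from the task makes it available to downstream tasks through XCom, which is the behavior the existing Slack operators do not offer.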