Airflow XComs allow you to pass data between tasks. By default, Airflow uses the metadata database to store XComs, which works well for local development but has limited performance. If you configure a custom XCom backend, you can define where and how Airflow stores XComs, as well as customize serialization and deserialization methods.
In this guide you’ll learn:
Warning
While a custom XCom backend allows you to store virtually unlimited amounts of data as XComs, you will need to scale other Airflow components to pass large amounts of data between tasks. For help running Airflow at scale, reach out to Astronomer.
To get the most out of this guide, you should have an understanding of:
Common reasons to use a custom XCom backend include:
You can also use custom XCom backends to define custom serialization and deserialization methods for XComs if you need to add a serialization method to a class, or if registering a custom serializer is not feasible. See Custom serialization and deserialization for more information.
There are two main ways to set up a custom XCom backend:
Additionally, some provider packages offer custom XCom backends that you can use out of the box. For example, the Snowpark provider contains a custom XCom backend for Snowflake.
You can create a custom XCom backend using object storage. The Object Storage XCom Backend is part of the Common IO provider and is configured with the following environment variables:
AIRFLOW__CORE__XCOM_BACKEND: The XCom backend to use. Set this to airflow.providers.common.io.xcom.backend.XComObjectStorageBackend to use the Object Storage XCom Backend.
AIRFLOW__COMMON_IO__XCOM_OBJECTSTORAGE_PATH: The path in object storage where XComs are stored, in the format <your-scheme>://<your-connection-id>@<your-bucket>/xcom, for example s3://my-s3-connection@my-bucket/xcom. The most common schemes are s3, gs, and abfs for Amazon S3, Google Cloud Storage, and Azure Blob Storage, respectively.
AIRFLOW__COMMON_IO__XCOM_OBJECTSTORAGE_THRESHOLD: The size threshold, in bytes, that decides where an XCom is stored. XComs smaller than or equal to this threshold are stored in the metadata database, and XComs larger than it are stored in object storage. The default value is -1, which means that all XComs are stored in the metadata database.
AIRFLOW__COMMON_IO__XCOM_OBJECTSTORAGE_COMPRESSION: Optional. The compression algorithm to use when storing XComs in object storage, for example zip. The default value is None.
For a step-by-step tutorial on setting up the Object Storage XCom Backend for Amazon S3, Google Cloud Storage, and Azure Blob Storage, see Set up a custom XCom backend using object storage.
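To make the path format and the threshold rule concrete, here is a small, dependency-free Python sketch. It does not use the Common IO provider: parse_xcom_objectstorage_path() and stored_in_object_storage() are hypothetical helpers that only restate the rules described above, which can be handy for sanity-checking values before you add them to your environment. Treating every negative threshold like the documented default of -1 is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XComStoragePath:
    """Parts of a value such as s3://my-s3-connection@my-bucket/xcom."""
    scheme: str
    conn_id: str
    bucket: str
    prefix: str


def parse_xcom_objectstorage_path(path: str) -> XComStoragePath:
    """Hypothetical helper: split <scheme>://<conn_id>@<bucket>/<prefix>.

    It requires the connection ID, as in the documented format.
    """
    scheme, sep, rest = path.partition("://")
    if not sep:
        raise ValueError(f"missing '://' in {path!r}")
    authority, _, prefix = rest.partition("/")
    # Split on the last "@" so that the bucket is whatever follows it.
    conn_id, at, bucket = authority.rpartition("@")
    if not at or not conn_id or not bucket:
        raise ValueError(f"expected <conn_id>@<bucket> in {path!r}")
    return XComStoragePath(scheme, conn_id, bucket, prefix)


def stored_in_object_storage(serialized_size: int, threshold: int = -1) -> bool:
    """Hypothetical helper: apply the documented threshold rule.

    The default of -1 keeps every XCom in the metadata database; this
    sketch assumes any negative threshold behaves the same way.
    """
    if threshold < 0:
        return False
    # Sizes <= threshold stay in the database; larger ones are offloaded.
    return serialized_size > threshold


if __name__ == "__main__":
    print(parse_xcom_objectstorage_path("s3://my-s3-connection@my-bucket/xcom"))
    for size in (512, 1024, 2048):
        target = "object storage" if stored_in_object_storage(size, 1024) else "metadata database"
        print(f"{size} bytes -> {target}")
```

With a threshold of 1024, a 1024-byte XCom stays in the metadata database and only the 2048-byte one goes to object storage, because the comparison is strictly "larger than".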
To create a custom XCom backend, you need to define an XCom backend class that inherits from the BaseXCom class.
For example, a MyCustomXComBackend class could accept only JSON-serializable XComs and use a custom serialize_value() method to store them in both Amazon S3 and Google Cloud Storage. Its deserialize_value() method then retrieves the XCom from the Amazon S3 bucket and returns the value.
In this setup, the Airflow metadata database stores only a reference string to the XCom, which is displayed in the XCom tab of the Airflow UI. The reference string is prefixed with s3_and_gs:// to indicate that the XCom is stored in both Amazon S3 and Google Cloud Storage. You can add any serialization and deserialization logic you need to the serialize_value() and deserialize_value() methods. See Custom serialization and deserialization for more information.
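Because a real backend of this kind depends on the Amazon and Google provider hooks and on live buckets, the following is a dependency-free Python sketch of the same reference-string pattern rather than a drop-in backend. FAKE_S3, FAKE_GCS, StoredXCom, and ReferenceStringXComSketch are hypothetical names: the two dicts stand in for the buckets, and StoredXCom stands in for the XCom row whose value attribute Airflow passes to deserialize_value(). In an actual backend, the two static methods would live on a subclass of airflow.models.xcom.BaseXCom and use provider hooks (for example S3Hook and GCSHook) to upload and download the payload.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Any

# Hypothetical in-memory stand-ins for the S3 and GCS buckets.
FAKE_S3: dict[str, bytes] = {}
FAKE_GCS: dict[str, bytes] = {}


@dataclass
class StoredXCom:
    """Stand-in for the XCom row that Airflow passes to deserialize_value()."""
    value: bytes


class ReferenceStringXComSketch:
    PREFIX = "s3_and_gs://"

    @staticmethod
    def serialize_value(value: Any, *, key=None, task_id=None, dag_id=None,
                        run_id=None, map_index=None, **kwargs) -> bytes:
        # json.dumps raises TypeError for values that are not
        # JSON-serializable, so only JSON-serializable XComs are accepted.
        payload = json.dumps(value).encode("utf-8")
        object_key = f"{dag_id}/{run_id}/{task_id}/{uuid.uuid4()}.json"
        # Write the same payload to both "buckets".
        FAKE_S3[object_key] = payload
        FAKE_GCS[object_key] = payload
        # Only the prefixed reference string is returned, so only it would
        # end up in the metadata database.
        reference = ReferenceStringXComSketch.PREFIX + object_key
        return json.dumps(reference).encode("utf-8")

    @staticmethod
    def deserialize_value(result: StoredXCom) -> Any:
        # Decode the reference string, then read the payload from the S3 stand-in.
        reference = json.loads(result.value.decode("utf-8"))
        object_key = reference.removeprefix(ReferenceStringXComSketch.PREFIX)
        return json.loads(FAKE_S3[object_key].decode("utf-8"))
```

For example, serializing {"rows": 3} with dag_id="example_dag", run_id="manual__1", and task_id="extract" returns bytes that decode to a string starting with s3_and_gs://example_dag/manual__1/extract/, and passing those bytes back through deserialize_value() in a StoredXCom returns {"rows": 3}. Putting the DAG ID, run ID, and task ID into the object key is a design choice of this sketch that keeps objects from different runs separate.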
To use a custom XCom backend class, save it in a Python file in the include directory of your Airflow project. Then set the AIRFLOW__CORE__XCOM_BACKEND environment variable in your Airflow instance to the dotted import path of the class, for example include.<your-file-name>.MyCustomXComBackend. If you run Airflow locally with the Astro CLI, you can set the environment variable in the .env file of your Astro project. On Astro, you can set the environment variable in the Astro UI.
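Airflow imports the backend class from the dotted path in AIRFLOW__CORE__XCOM_BACKEND, so a typo in the module or class name prevents the backend from loading. As a quick check, you could run a sketch like the one below in the same environment Airflow runs in. resolve_dotted_path() is a hypothetical helper built on importlib; it follows the same idea as Airflow's own loader but is not Airflow's implementation.

```python
import importlib
from typing import Any


def resolve_dotted_path(dotted_path: str) -> Any:
    """Hypothetical helper: import the object named by a dotted path such as
    "include.<your-file-name>.MyCustomXComBackend"."""
    module_path, _, attr_name = dotted_path.rpartition(".")
    if not module_path:
        raise ImportError(f"{dotted_path!r} is not a dotted module path")
    # Import the module part, for example "include.<your-file-name>".
    module = importlib.import_module(module_path)
    try:
        # Look up the class name inside the imported module.
        return getattr(module, attr_name)
    except AttributeError as err:
        raise ImportError(f"{module_path!r} has no attribute {attr_name!r}") from err
```

For instance, resolve_dotted_path("collections.OrderedDict") returns the collections.OrderedDict class, while a misspelled class name raises ImportError instead of failing later.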
If you want to customize your XCom backend further, you can override additional methods of the BaseXCom class, which is defined in the airflow.models.xcom module.
By default, Airflow can serialize common object types, such as JSON-serializable values, pandas DataFrames, and NumPy objects.
If you need to pass objects through XCom whose types Airflow does not serialize by default, you have several options:
Add serialize() and deserialize() methods to the class of the object you want to pass through XCom. See Serialization.
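A minimal sketch of this option, modeled on the pattern in the Airflow Serialization documentation: the class exposes a serialize() method that returns a dict, a static deserialize(data, version) method, and a __version__ class attribute that Airflow records alongside the data. SensorReading and its fields are hypothetical. The round trip below runs without Airflow; inside Airflow, its serializer is what calls these methods. Depending on your Airflow version and configuration, you may also need to allow the class for deserialization, for example through the [core] allowed_deserialization_classes setting.

```python
from typing import Any, ClassVar


class SensorReading:
    """Hypothetical class that a task could return as an XCom."""

    # Version number that is passed back to deserialize().
    __version__: ClassVar[int] = 1

    def __init__(self, sensor_id: str, celsius: float) -> None:
        self.sensor_id = sensor_id
        self.celsius = celsius

    def serialize(self) -> dict[str, Any]:
        # Return only JSON-friendly values.
        return {"sensor_id": self.sensor_id, "celsius": self.celsius}

    @staticmethod
    def deserialize(data: dict[str, Any], version: int) -> "SensorReading":
        # Refuse data written by a newer, unknown version of the class.
        if version > SensorReading.__version__:
            raise TypeError(f"version {version} > {SensorReading.__version__}")
        return SensorReading(data["sensor_id"], data["celsius"])
```

Bumping __version__ whenever the shape of serialize()'s output changes lets deserialize() handle older payloads explicitly instead of failing on a missing key.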