Remote Execution Agents
Airflow 3
This feature is only available for Airflow 3.x Deployments.
Overview
Remote Execution relies on Agents in your environment, or the execution plane, to communicate with the API server in the Astro orchestration plane. Each Agent heartbeats with its capabilities and the queue it is listening on, and the API server responds by assigning work accordingly. Worker Agents run synchronous tasks, Triggerer Agents run asynchronous tasks and deferrable operators, and the Dag processor Agent processes dags and sends up their serialized representations.
You can register multiple remote queues for your Remote Execution Agents, for reasons similar to those for registering multiple worker queues in Hosted execution mode.
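As a rough mental model of the heartbeat exchange described above, the following sketch shows an Agent reporting its type and queue, and a server-side function returning the matching work. All names here (Heartbeat, PendingTask, assign_work) are hypothetical illustrations and are not part of any Astro or Airflow API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    """What an Agent reports on each heartbeat (hypothetical shape)."""
    agent_id: str
    agent_type: str  # "worker", "triggerer", or "dag_processor"
    queue: str       # the remote queue the Agent listens on


@dataclass(frozen=True)
class PendingTask:
    """A unit of work waiting for an Agent (hypothetical shape)."""
    task_id: str
    required_type: str  # which kind of Agent can run it
    queue: str


def assign_work(heartbeat: Heartbeat, pending: list[PendingTask]) -> list[PendingTask]:
    """Return the pending tasks whose Agent type and queue match the heartbeat."""
    return [
        task
        for task in pending
        if task.required_type == heartbeat.agent_type and task.queue == heartbeat.queue
    ]


pending = [
    PendingTask("extract", "worker", "default"),
    PendingTask("train_model", "worker", "gpu"),
    PendingTask("wait_for_file", "triggerer", "default"),
]
beat = Heartbeat(agent_id="agent-1", agent_type="worker", queue="gpu")
print([task.task_id for task in assign_work(beat, pending)])
```

In this model, a worker Agent listening on a dedicated queue only receives tasks routed to that queue, which is the usual motivation for registering more than one queue.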
Add https://<clusterId>.external.astronomer.run/ to your organization’s network allowlist so that the Remote Execution Agents in your environment can heartbeat to the API server in the Astro orchestration plane.
Register Remote Execution Agent with a Deployment
Prerequisites
- Kubernetes 1.30+
- Helm 3+
- A valid Astro account with permissions to create an Astro Agent
- An Agent Token from your Astro Deployment
- A Deployment API token with Deployment Admin permissions to pull the base Astro Remote Execution Agent image
Step 1: Create Agent Token
- In the Astro UI, select a Workspace, click Deployments, and then select the Remote Execution Deployment that you want to register the Remote Execution Agent with.
- Select the Remote Agents tab.
- Use the toggle to switch to the Tokens view.
- Click +Agent Token.
- Define a Name, Expiration and optionally a Description for the Agent Token.
- Copy the Agent Token. You will not be shown the Agent Token value again.
Step 2: Install the Helm chart
You need either the Org Owner role or a Deployment API token with the Deployment Admin role to authenticate with Astronomer’s private image registry.
- Use the toggle to switch back to the Agents view, and click Register a Remote Agent.
- Click the Download button on the modal to download the values.yaml file. Use this file to configure the Remote Execution Agent’s Helm chart.
- Most config options in values.yaml do not need to be updated to register and activate the Remote Agent. The following values need to be updated: resourceNamePrefix, namespace, secretBackend, xcomBackend, imagePullSecretName, and agentToken or agentTokenSecretName. A description of each value is provided in the Helm chart itself.
- If self-hosting the image, log in to the image registry with your token:
After you log in, you can pull the Remote Execution Agent image directly. To find the latest version and image path, refer to the Remote Execution Agent release notes, which include all currently hosted image tags and their full URLs.
- Install the Helm chart. The following code also adds the Helm repo and applies any updates:
Step 3: Set allowed IP address ranges
Setting allowed IP address ranges for a Remote Execution Deployment is optional, but it lets you restrict the Deployment’s incoming traffic to the Remote Agents in your environment.
- In the Set Allowed IP Address Ranges step on the modal, click Edit this Deployment.
- Click +Add IP.
- Add the IP address range, then click Add.
- Repeat to allowlist more IP address ranges.
Step 4: Check for Agent heartbeat
Close the modal to view the Agent list page and check for a healthy agent.
You should see at least one agent with a health status of “Healthy” and a value for “last heartbeat” no more than one minute prior. If your Remote Agent is healthy, configure dagBundleConfigList in values.yaml and do a Helm upgrade. You can now run Dags on this Agent.
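If you want to script the same freshness check that you perform by eye on the Agent list page, a minimal sketch could look like the following. The function name and the idea of obtaining the last heartbeat timestamp programmatically are assumptions for illustration; only the one-minute threshold comes from the text above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Threshold taken from the text: last heartbeat no more than one minute ago.
MAX_HEARTBEAT_AGE = timedelta(minutes=1)


def is_heartbeat_fresh(last_heartbeat: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the last heartbeat is at most one minute old."""
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - last_heartbeat
    # A heartbeat timestamp in the future is treated as not fresh.
    return timedelta(0) <= age <= MAX_HEARTBEAT_AGE


now = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(is_heartbeat_fresh(now - timedelta(seconds=20), now))
print(is_heartbeat_fresh(now - timedelta(minutes=5), now))
```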
Remote Execution Agents Helm chart configuration reference
The following Helm chart config values are required to run dags with your Remote Execution Agent. To see all available configuration options with descriptions, see the values.yaml file downloaded in Step 2.
agentToken or agentTokenSecretName
You must specify either agentToken or agentTokenSecretName to inject the Agent token generated in Step 1 into the Helm chart.
- For agentToken, pass the Agent token value copied in Step 1 into the Helm chart as a string, and Helm will create a secret with the name agent-token-secret in the namespace.
- For agentTokenSecretName, pass the name of the secret containing the Agent token to connect the Agent to the Astro Deployment. Use this field when the secret has already been created in the namespace through your own external mechanism. The secret must contain a key named token with the value set to the Agent Token created in Step 1.
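For the agentTokenSecretName option, one way to produce a secret with the required token key is to generate a standard Kubernetes Secret manifest. The sketch below does this with the Python standard library only. The secret name and namespace are hypothetical placeholders, and the token value is left as a placeholder on purpose.

```python
import base64
import json


def agent_token_secret(name: str, namespace: str, agent_token: str) -> dict:
    """Build a Kubernetes Secret manifest that stores the Agent token under the key `token`."""
    # Values under `data` in a Secret manifest must be base64-encoded.
    encoded = base64.b64encode(agent_token.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {"token": encoded},
    }


manifest = agent_token_secret(
    name="my-agent-token",      # hypothetical; pass this name as agentTokenSecretName
    namespace="astro-agent",    # hypothetical namespace
    agent_token="<agent-token-from-step-1>",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, which accepts JSON manifests as well as YAML.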
imagePullSecretName or imagePullSecretData
You must specify either imagePullSecretName or imagePullSecretData to allow the Remote Execution Agent to pull container images from the registry. You can use the Deployment API token listed in the prerequisites for this.
- For imagePullSecretName, provide the name of an existing Kubernetes secret in the namespace. This secret must contain a key named .dockerconfigjson with your image pull credentials. Use this option if the secret has already been created. Example Kubernetes command to create the secret:
- For imagePullSecretData, provide the Docker config JSON as a string. The Helm chart will use this value to create a secret named image-pull-secret in the namespace. The value must follow the standard Docker config format:
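To make the two image pull options concrete, the sketch below builds a Docker config JSON string in the standard auths layout and wraps it in a Secret manifest of type kubernetes.io/dockerconfigjson. The registry host, username, password, secret name, and namespace are all placeholders or hypothetical values; substitute the registry and credentials that apply to your Deployment.

```python
import base64
import json


def docker_config_json(registry: str, username: str, password: str) -> str:
    """Return a Docker config JSON string with credentials for one registry."""
    # The `auth` field is base64("username:password").
    auth = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return json.dumps({"auths": {registry: {"auth": auth}}})


def image_pull_secret(name: str, namespace: str, config_json: str) -> dict:
    """Build a Secret manifest that stores the config under the key `.dockerconfigjson`."""
    encoded = base64.b64encode(config_json.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": encoded},
    }


config = docker_config_json("<registry-host>", "<username>", "<deployment-api-token>")
print(config)  # candidate string value for imagePullSecretData

secret = image_pull_secret("my-image-pull-secret", "astro-agent", config)
print(json.dumps(secret, indent=2))  # candidate manifest for imagePullSecretName
```

The first output corresponds to the imagePullSecretData option, where the chart creates the secret for you. The second corresponds to the imagePullSecretName option, where you create the secret yourself and pass only its name.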
namespace
Specifies the Kubernetes namespace where the Astro Remote Agent will be deployed.
If createNamespace is set to true, the Helm chart will create the namespace with the name provided in this field.
If createNamespace is set to false, the namespace must be created manually before deploying the chart, and this field should reference the existing namespace.
See Install Remote Execution Agents in a restricted Kubernetes namespace for steps to configure an agent in a Kubernetes namespace.
If you use agentTokenSecretName and imagePullSecretName, createNamespace must be set to false and the namespace must be created manually with those secrets already present.
resourceNamePrefix
Specifies a name prefix for all the Kubernetes resources that this chart deploys for Remote Execution Agent components, such as Deployments, ConfigMaps, and Secrets.
secretBackend
The Airflow secret backend class to use for the Agent. See Secrets backend integration for Remote Execution for external secrets providers configuration instructions.
It is not recommended for production use cases, but you can also use Airflow’s local filesystem backend as a simpler alternative to an external secrets provider by setting:
xcomBackend
The Airflow XCom backend class to use for the Agent.
See Configure XCOM backend for a Remote Execution Agent for how to set up the XCom Backend in the Remote Execution Agent components.
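The requirements in this reference can be sanity-checked before you run a Helm install or upgrade. The sketch below checks a parsed values mapping against the rules stated above. It assumes the keys sit at the top level of values.yaml, which may not match the real chart layout, and check_values is a hypothetical helper rather than anything shipped with the chart.

```python
# Assumption: these keys are top-level entries in values.yaml.
REQUIRED_KEYS = ("namespace", "resourceNamePrefix", "secretBackend", "xcomBackend")


def check_values(values: dict) -> list[str]:
    """Return a list of problems found in a parsed values.yaml mapping."""
    problems = [f"{key} is not set" for key in REQUIRED_KEYS if not values.get(key)]

    if not (values.get("agentToken") or values.get("agentTokenSecretName")):
        problems.append("set agentToken or agentTokenSecretName")
    if not (values.get("imagePullSecretName") or values.get("imagePullSecretData")):
        problems.append("set imagePullSecretName or imagePullSecretData")

    # Pre-created secrets can only live in a namespace that already exists,
    # so the chart must not be asked to create the namespace.
    uses_existing_secrets = bool(
        values.get("agentTokenSecretName") and values.get("imagePullSecretName")
    )
    if uses_existing_secrets and values.get("createNamespace"):
        problems.append("createNamespace must be false when both secrets are pre-created")

    return problems


values = {
    "namespace": "astro-agent",              # hypothetical values throughout
    "createNamespace": True,
    "resourceNamePrefix": "astro-agent",
    "secretBackend": "<secrets-backend-class>",
    "xcomBackend": "<xcom-backend-class>",
    "agentTokenSecretName": "my-agent-token",
    "imagePullSecretName": "my-image-pull-secret",
}
for problem in check_values(values):
    print(problem)
```

With the example mapping, the only reported problem is the createNamespace conflict, because both secrets are referenced by name while the chart is also asked to create the namespace.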
Failures
When the heartbeat between the API server and a Remote Execution Agent is disrupted, the Astro executor prevents task duplication by marking queued tasks from that Agent as failed. This makes tasks eligible for reassignment to healthy Agents. To ensure safe task execution, an Agent must receive explicit confirmation from the API server before starting any task. If an Agent loses connectivity with the API server, the Agent continues executing any tasks that the API server already confirmed and marked as running, but the Agent will not start new tasks until heartbeat communication is restored.
Agent failure
The API Server marks an Agent as failed if the API Server misses three consecutive heartbeat intervals. When that happens, the API Server checks whether the Agent has any “queued” tasks, or tasks the Agent already picked up and started running, but has not yet reported as complete. If a worker Agent fails, the API Server marks those tasks as failed and makes them available for reassignment. If a triggerer Agent fails, the API Server immediately reassigns the tasks, since triggerer tasks are short-lived and idempotent.
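The failure rule above can be modeled as a small counter on the server side. This is only a sketch of the described behavior, not the Astro executor's implementation: AgentRecord, on_heartbeat, on_missed_interval, and the action labels are hypothetical names.

```python
from dataclasses import dataclass, field

# From the text: an Agent is marked failed after three consecutive missed intervals.
MISSED_HEARTBEATS_BEFORE_FAILURE = 3


@dataclass
class AgentRecord:
    """Server-side view of one Agent (hypothetical shape)."""
    agent_type: str  # "worker" or "triggerer"
    missed_heartbeats: int = 0
    in_flight_tasks: list[str] = field(default_factory=list)


def on_heartbeat(agent: AgentRecord) -> None:
    """A received heartbeat resets the consecutive-miss counter."""
    agent.missed_heartbeats = 0


def on_missed_interval(agent: AgentRecord) -> dict[str, str]:
    """Record one missed interval and return the action taken for each in-flight task."""
    agent.missed_heartbeats += 1
    if agent.missed_heartbeats < MISSED_HEARTBEATS_BEFORE_FAILURE:
        return {}
    if agent.agent_type == "worker":
        # Worker tasks are marked failed first, then become eligible for reassignment.
        action = "mark_failed_then_reassign"
    else:
        # Triggerer tasks are reassigned immediately.
        action = "reassign_immediately"
    actions = {task_id: action for task_id in agent.in_flight_tasks}
    agent.in_flight_tasks.clear()
    return actions


worker = AgentRecord("worker", in_flight_tasks=["load_table"])
for interval in range(1, 4):
    print(interval, on_missed_interval(worker))
```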
API Server failure
If an Agent’s heartbeats can’t reach the API Server, the Agent assumes that the API Server and other Agents remain healthy. In this case:
- A worker continues running any tasks that the API Server already marked as running, but the worker doesn’t start new tasks until it reconnects with the API Server. This prevents two Agents from running the same task.
- A triggerer stops processing tasks entirely until it restores connectivity. Since triggerer workloads are designed to be reassigned immediately when disconnected, trigger execution stops during the partition.
This behavior preserves task safety and prevents duplication for both workers and triggerers, even during partial failures or network partitions.
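Seen from the Agent's side, the rules above reduce to a small decision function. The sketch below encodes them as described in this section; AgentType and keeps_executing are hypothetical names, and the real Agents may track more state than this.

```python
from enum import Enum


class AgentType(Enum):
    WORKER = "worker"
    TRIGGERER = "triggerer"


def keeps_executing(agent_type: AgentType, connected: bool, confirmed_running: bool) -> bool:
    """Decide whether an Agent works on a given task right now.

    confirmed_running means the API server already confirmed the task
    and marked it as running.
    """
    if not confirmed_running:
        # Nothing starts without explicit confirmation from the API server,
        # and confirmation cannot arrive while the Agent is disconnected.
        return False
    if connected:
        return True
    # Disconnected: workers finish confirmed tasks, triggerers stop entirely.
    return agent_type is AgentType.WORKER


for agent_type in AgentType:
    for confirmed in (True, False):
        result = keeps_executing(agent_type, connected=False, confirmed_running=confirmed)
        print(agent_type.value, "confirmed" if confirmed else "unconfirmed", result)
```

During a partition, the only combination that keeps executing in this model is a worker with a task that was already confirmed, which matches the duplication guarantee described above.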
Manage Remote Execution Agents
On the Remote Agents tab of the Deployment, click More options for a Remote Execution Agent on the list to access an action menu for that Agent.
You can take the following actions on your registered Remote Execution Agents:
- Cordon: Cordoning a Remote Execution Agent marks it as unavailable for scheduling new tasks, while allowing it to continue running and complete any tasks already in progress.
This allows you to gracefully remove the Agent from service without interrupting current workloads. For example, you can cordon an Agent before you delete it or perform maintenance, such as an upgrade, on the Agent or its underlying infrastructure.
A cordoned Agent will not receive new work, but it remains active until all running tasks have finished. When you are ready to reintroduce the Agent to the task pool, uncordon it to resume normal operation.
- Uncordon: Uncordoning a Remote Execution Agent re-enables it to receive new tasks and resume normal scheduling.
- Delete: Deletes the Remote Execution Agent from the Deployment.
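The cordon, uncordon, and delete actions can be pictured as a simple state model: a cordoned Agent refuses new tasks but drains the ones it already has. The class below is an illustrative sketch with hypothetical names (ManagedAgent, offer_task, safe_to_remove), not the behavior of any Astro API.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedAgent:
    """Toy model of an Agent's scheduling state (hypothetical)."""
    cordoned: bool = False
    running_tasks: set[str] = field(default_factory=set)

    def cordon(self) -> None:
        self.cordoned = True

    def uncordon(self) -> None:
        self.cordoned = False

    def offer_task(self, task_id: str) -> bool:
        """Accept a new task unless the Agent is cordoned."""
        if self.cordoned:
            return False
        self.running_tasks.add(task_id)
        return True

    def finish_task(self, task_id: str) -> None:
        self.running_tasks.discard(task_id)

    def safe_to_remove(self) -> bool:
        """True once the Agent is cordoned and has drained all running tasks."""
        return self.cordoned and not self.running_tasks


agent = ManagedAgent()
agent.offer_task("task_a")
agent.cordon()
print(agent.offer_task("task_b"), agent.safe_to_remove())
agent.finish_task("task_a")
print(agent.safe_to_remove())
```

In this model, waiting for safe_to_remove before deleting or upgrading an Agent corresponds to the graceful removal that cordoning is meant to provide.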
Remote Execution Agent maintenance policy
Each Remote Execution Agent minor version is maintained for 6 months from the release month.
See Maintenance policy for more details about versioning, support, and upgrade recommendations.