Remote Execution Agents | Astronomer Docs

This feature is only available if you are on the Enterprise tier or above. See Astro Plans and Pricing.

Airflow 3

This feature is only available for Airflow 3.x Deployments.

Overview

Remote Execution relies on Agents in your environment, the execution plane, to communicate with the API server in the Astro orchestration plane. Each Agent sends heartbeats that report its capabilities and the queue it is listening on, and the API server responds by assigning work accordingly. Worker Agents run synchronous tasks, Triggerer Agents run asynchronous tasks (deferrable operators), and the Dag processor Agent parses dags and sends their serialized representations to the orchestration plane.

You can register multiple remote queues for your Remote Execution Agents, for reasons similar to those for registering multiple worker queues in Hosted execution mode.

Add https://<clusterId>.external.astronomer.run/ to your organization’s network allowlist so the Remote Execution Agents in your environment are able to heartbeat to the API server in the Astro orchestration plane.
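One way to confirm that the allowlist entry works is to probe the URL from a machine inside the execution plane. The following is a minimal sketch that uses only the Python standard library. The function names and the example cluster ID are hypothetical, and the sketch assumes that any HTTP response, even an error status, shows that the network path is open; it does not reflect documented behavior of the endpoint.

```python
import urllib.error
import urllib.request


def agent_endpoint(cluster_id: str) -> str:
    """Build the URL that must be allowlisted for a given cluster ID."""
    return f"https://{cluster_id}.external.astronomer.run/"


def can_reach(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP exchange with `url` completes at all.

    Assumption: an HTTP error status still proves the network path is open,
    so it counts as reachable. DNS failures, refused connections, and
    timeouts count as unreachable.
    """
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        # The server answered, so the firewall let the request through.
        return True
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    # "my-cluster-id" is a placeholder; substitute your real cluster ID.
    url = agent_endpoint("my-cluster-id")
    print(url, "reachable" if can_reach(url) else "NOT reachable")
```

Run the script from the same network segment as the Agents, because a probe from a workstation can succeed while the cluster's egress rules still block the traffic.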

Register Remote Execution Agent with a Deployment

Prerequisites

Kubernetes 1.30+
Helm 3+
A valid Astro account with permissions to create an Astro Agent
An Agent Token from your Astro Deployment
A Deployment API Token with Deployment Admin scoped permissions to pull the base Astro Remote Execution Agent Image

Step 1: Create Agent Token

Astro UI

In the Astro UI, select a Workspace, click Deployments, and then select the Remote Execution Deployment that you want to register the Remote Execution Agent with.
Select the Remote Agents tab.
Use the toggle to switch to the Tokens view.
Click +Agent Token.
Define a Name, Expiration and optionally a Description for the Agent Token.
Copy the Agent Token and save it in a secure location as a file named token.txt. You will not be shown the Agent Token value again.

There is a default limit of 50 Agent Tokens per Deployment. Contact Astronomer support if you need a higher limit.

Step 2: Install the Helm chart

Astronomer recommends pulling both the Remote Execution Agent image and the Sentinel image and storing them in your private registry. Sentinel provides advanced monitoring and reporting for Remote Execution Agents, starting from version 1.2.0. The Agent base images are minimal, so you might need to add packages for your pipelines to function properly. Use either an Organization API token with the Org Owner role or a Deployment API token with the Deployment Admin role to authenticate.

Use the toggle to switch back to the Agents view, and click Register a Remote Agent.
Click the Download button on the modal to download the values.yaml file. Use this file to configure the Remote Execution Agent’s Helm chart.
Most config options in the values.yaml do not need to be updated to register and activate the Remote Agent. The Helm chart and the Remote Execution Agent Config Reference contain descriptions of each value. The following values in the Remote Agent’s Helm chart need to be updated:

resourceNamePrefix
namespace
secretBackend
xcomBackend
imagePullSecretName
agentToken, agentTokenSecretName, or agentTokenFile

If self-hosting the image, log in to the image registry with your Deployment API token:

docker login images.astronomer.cloud -u cli -p <your-token>

Sentinel image available with 1.2.0 and later

Starting with Remote Execution Agent 1.2.0, a Sentinel image is published alongside the agent images to provide enhanced monitoring for Remote Execution Agents. The Sentinel image must be pulled separately and is not deployed by default. To use Sentinel, explicitly enable and configure the service in your values.yaml file.

After you log in, you can pull the Remote Execution Agent and optionally the Sentinel image directly. To find the latest version and image path, refer to the Remote Execution Agent release notes for all currently hosted images and Remote Execution Agent image reference for their full URLs. For example:

$ docker pull images.astronomer.cloud/baseimages/astro-remote-execution-agent:3.1-3-python-3.12-astro-agent-1.2.0

$ docker pull images.astronomer.cloud/baseimages/astro-remote-execution-sentinel:1.2.0

Pull the Remote Execution Agent image, apply customizations that your dags require, and push it to your private registry. Then update the values.yaml file to reference your customized image.
Install the Helm chart. The following code also adds the Helm repo and applies any updates:

$ helm repo add astronomer https://helm.astronomer.io
$ helm repo update
$ helm install astro-agent astronomer/astro-remote-execution-agent -f values.yaml

Step 3: Set allowed IP address ranges

You can optionally set allowed IP address ranges so that your Astro Deployment accepts only traffic that originates from allowed IPs. Besides improving security, this can help with goals such as network isolation between development and production environments.

In the Set Allowed IP Address Ranges step on the modal, click Edit this Deployment.
Click +Add IP.
Add the IP address range, then click Add.
Repeat to allowlist more IP address ranges.
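Before entering ranges in the UI, it can help to check that they are well formed and that they cover the addresses your Agents actually egress from. The sketch below assumes the ranges are written in CIDR notation, which this page does not state explicitly. The function names and the sample addresses, taken from the reserved documentation ranges, are illustrative only.

```python
import ipaddress


def parse_ranges(ranges):
    """Parse CIDR strings into network objects.

    strict=True rejects entries such as "203.0.113.7/24", where host bits
    are set, by raising ValueError.
    """
    return [ipaddress.ip_network(r, strict=True) for r in ranges]


def is_allowed(source_ip, ranges):
    """Return True if `source_ip` falls inside at least one range."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in network for network in parse_ranges(ranges))


if __name__ == "__main__":
    allowed = ["203.0.113.0/24", "198.51.100.16/28"]
    for candidate in ("203.0.113.7", "198.51.100.40"):
        print(candidate, "allowed" if is_allowed(candidate, allowed) else "blocked")
```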

Step 4: Check for Agent heartbeat

Close the modal to view the Agent list page and check for a healthy agent.

You should see at least one Agent with a health status of “Healthy” and a “last heartbeat” value no more than one minute old. If your Remote Agent is healthy, configure dagBundleConfigList in values.yaml and run a Helm upgrade. You can then run dags on this Agent.
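The freshness rule above can be expressed as a small check. This is only an illustration of the one-minute criterion: the function name is hypothetical, and the sketch assumes you already have the last-heartbeat timestamp as a timezone-aware datetime. It does not call any Astro API.

```python
from datetime import datetime, timedelta, timezone

# One minute, taken from the health criterion described above.
MAX_HEARTBEAT_AGE = timedelta(minutes=1)


def heartbeat_is_fresh(last_heartbeat, now=None, max_age=MAX_HEARTBEAT_AGE):
    """Return True if the heartbeat is at most `max_age` old.

    A heartbeat timestamped in the future is treated as not fresh, because
    that usually points to clock skew rather than a healthy Agent.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - last_heartbeat
    return timedelta(0) <= age <= max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(heartbeat_is_fresh(now - timedelta(seconds=20), now))  # within a minute
    print(heartbeat_is_fresh(now - timedelta(minutes=5), now))   # stale
```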

Remote Execution Agents Helm chart configuration reference

The following Helm chart config values are required to run dags with your Remote Execution Agent. To see all available configuration options with descriptions, see the values.yaml file downloaded in step 2.

agentToken, agentTokenSecretName, or agentTokenFile

You must specify either agentToken, agentTokenSecretName, or agentTokenFile to inject the Agent token generated in Step 1 into the Helm chart.

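For agentTokenSecretName, create a Kubernetes secret that holds the Agent token, then set agentTokenSecretName to the name of that secret (agent-token in this example). Run the following command from the directory containing the token.txt file you created in Step 1.

kubectl -n <remote agent namespace> create secret generic agent-token --from-file=token=token.txt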

For agentTokenFile, pass the path to the file that contains the Agent token used to connect the Agent to the Astro Deployment. This option improves security because the token value is not exposed directly on the Kubernetes resource. The Agent client reads the token from the file at runtime.

imagePullSecretName or imagePullSecretData

You must specify either imagePullSecretName or imagePullSecretData to allow the Remote Execution Agent to pull container images from the registry. You can use the Deployment API Token listed in the prerequisites for this.

For imagePullSecretName, provide the name of an existing Kubernetes secret in the namespace. This secret must contain a key named .dockerconfigjson with your image pull credentials. Use this option if the secret has already been created. Example Kubernetes command to create the secret:

kubectl create secret docker-registry -n <namespace> <secretName> \
  --docker-server=images.astronomer.cloud \
  --docker-username=cli \
  --docker-password=<astroToken>

For imagePullSecretData, provide the Docker config JSON as a string. The Helm chart will use this value to create a secret named image-pull-secret in the namespace. The value must follow the standard Docker config format:

{
  "auths": {
    "<registry.example.com>": {
      "auth": "<auth-token>",
      "email": "<email-address>"
    }
  }
}
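In the standard Docker config format, the auth value is the Base64 encoding of username:password. The sketch below builds such a document for use as imagePullSecretData. The function name is hypothetical, and the registry host and the cli username are the ones used elsewhere on this page. The token shown is a placeholder, and the email field is omitted unless you pass one.

```python
import base64
import json


def docker_config_json(registry, username, password, email=None):
    """Return a Docker config JSON string for a single registry."""
    # Docker expects base64("username:password") in the "auth" field.
    credentials = f"{username}:{password}".encode("utf-8")
    entry = {"auth": base64.b64encode(credentials).decode("ascii")}
    if email:
        entry["email"] = email
    return json.dumps({"auths": {registry: entry}}, indent=2)


if __name__ == "__main__":
    # "example-token" is a placeholder for your Deployment API token.
    print(docker_config_json("images.astronomer.cloud", "cli", "example-token"))
```

Treat the output as a secret. Base64 is an encoding, not encryption, so anyone who can read the string can recover the token.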

namespace

Specifies the Kubernetes namespace where the Astro Remote Agent will be deployed.

If createNamespace is set to true, the Helm chart will create the namespace with the name provided in this field.

If createNamespace is set to false, the namespace must be created manually before deploying the chart, and this field should reference the existing namespace.

See Install Remote Execution Agents in a restricted kubernetes namespace for steps to configure an Agent in a restricted Kubernetes namespace.

If you are passing agentTokenSecretName and imagePullSecretName, createNamespace must be set to false and the namespace must be created manually with those secrets already present.
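The rules for these values can be checked before you run helm install. The following is a rough pre-flight sketch and is not part of the chart. It assumes the values have already been loaded into a flat Python dict whose keys match the names on this page, which may not mirror the real nesting in values.yaml. It also treats either pre-created secret name as requiring createNamespace to be false, on the reasoning that a secret cannot exist in a namespace that has not been created yet.

```python
TOKEN_KEYS = ("agentToken", "agentTokenSecretName", "agentTokenFile")
PULL_SECRET_KEYS = ("imagePullSecretName", "imagePullSecretData")


def check_values(values: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if not any(values.get(key) for key in TOKEN_KEYS):
        problems.append("set one of: " + ", ".join(TOKEN_KEYS))
    if not any(values.get(key) for key in PULL_SECRET_KEYS):
        problems.append("set one of: " + ", ".join(PULL_SECRET_KEYS))
    # Assumption: referencing any pre-created secret implies the namespace
    # already exists, so the chart must not try to create it.
    uses_existing_secret = bool(
        values.get("agentTokenSecretName") or values.get("imagePullSecretName")
    )
    if uses_existing_secret and values.get("createNamespace"):
        problems.append("createNamespace must be false when secret names are used")
    return problems


if __name__ == "__main__":
    example = {
        "namespace": "astro-agents",  # hypothetical namespace name
        "createNamespace": True,
        "agentTokenSecretName": "agent-token",
        "imagePullSecretName": "image-pull-secret",
    }
    for problem in check_values(example):
        print("problem:", problem)
```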

resourceNamePrefix

Specifies a name prefix for all Kubernetes resources that this chart deploys for Remote Execution Agent components, such as Deployments, ConfigMaps, and Secrets.

secretBackend

The Airflow secret backend class to use for the Agent. See Secrets backend integration for Remote Execution for external secrets providers configuration instructions.

It is not recommended for production use cases, but you can also use Airflow’s local filesystem backend as a simpler alternative to an external secrets provider by setting:

secretBackend: "airflow.secrets.local_filesystem.LocalFilesystemBackend"

and setting the following environment variable in the Remote Execution Agent’s Helm chart:

AIRFLOW__SECRETS__BACKEND_KWARGS: '{"variables_file_path": "/files/var.json", "connections_file_path": "/files/conn.json"}'

The rest of the configuration should be passed as environment variables to the Agent components.

xcomBackend

The Airflow XCom backend class to use for the Agent.

See Configure XCom backend for a Remote Execution Agent for how to set up the XCom Backend in the Remote Execution Agent components.

Failures

When the heartbeat between the API server and a Remote Execution Agent is disrupted, the Astro executor prevents task duplication by marking queued tasks from that Agent as failed. This makes tasks eligible for reassignment to healthy Agents. To ensure safe task execution, an Agent must receive explicit confirmation from the API server before starting any task. If an Agent loses connectivity with the API server, the Agent continues executing any tasks that the API server already confirmed and marked as running, but the Agent will not start new tasks until heartbeat communication is restored.

Agent failure

The API Server marks an Agent as failed if the API Server misses three consecutive heartbeat intervals. When that happens, the API Server checks whether the Agent has any “queued” tasks, or tasks the Agent already picked up and started running, but has not yet reported as complete. If a worker Agent fails, the API Server marks those tasks as failed and makes them available for reassignment. If a triggerer Agent fails, the API Server immediately reassigns the tasks, since triggerer tasks are short-lived and idempotent.
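The detection rule and the two recovery paths described above can be modeled in a few lines. This is an illustrative model only, not Astro's implementation. The heartbeat interval length is not stated on this page, so it is a parameter here, and the function names and action labels are hypothetical.

```python
from datetime import datetime, timedelta

# Three consecutive missed intervals, as described above.
MISSED_INTERVALS_BEFORE_FAILURE = 3


def missed_intervals(last_heartbeat, now, interval):
    """Count the whole heartbeat intervals elapsed since the last heartbeat."""
    return max(0, (now - last_heartbeat) // interval)


def is_failed(last_heartbeat, now, interval):
    """Return True once three or more intervals have passed without a heartbeat."""
    return missed_intervals(last_heartbeat, now, interval) >= MISSED_INTERVALS_BEFORE_FAILURE


def plan_recovery(agent_kind, task_ids):
    """Map each affected task to the recovery action for that Agent kind."""
    if agent_kind == "worker":
        action = "mark_failed_then_requeue"
    elif agent_kind == "triggerer":
        action = "reassign_immediately"
    else:
        raise ValueError(f"unknown agent kind: {agent_kind!r}")
    return {task_id: action for task_id in task_ids}


if __name__ == "__main__":
    last = datetime(2025, 1, 1, 12, 0, 0)
    interval = timedelta(seconds=10)  # assumed value, for illustration only
    print(is_failed(last, last + timedelta(seconds=35), interval))
    print(plan_recovery("worker", ["task_a", "task_b"]))
```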

Dag scheduling and retention during Agent disconnection

The Airflow scheduler retains all dags that were most recently parsed and sent by the dag processor Agent. If the dag processor Agent or any Remote Execution Agent disconnects or fails, the scheduler continues to use these previously parsed dags. The scheduler will keep creating dag runs on schedule or in response to events, such as dataset updates, for all retained dags.

New or updated dags are not detected until a healthy dag processor Agent reconnects and provides an updated set of dags.
All tasks and dag runs remain pending until a healthy Remote Execution Agent, worker or triggerer, is available for execution.

If no healthy Remote Agents are connected, the scheduler continues to create dag runs for known dags, but their tasks remain in the queued state and do not execute until an Agent becomes available. A task that stays queued for longer than the timeout is marked as failed. The timeout defaults to 600 seconds, and you can change it by setting the AIRFLOW__SCHEDULER__TASK_QUEUED_TIMEOUT environment variable on your Astro Deployment.
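The timeout rule can be sketched as follows: read the override from the environment, fall back to the 600-second default, and compare it with the time a task has spent queued. The helper names are hypothetical and the logic only mirrors the description above. The scheduler's real implementation lives in Airflow.

```python
import os

ENV_VAR = "AIRFLOW__SCHEDULER__TASK_QUEUED_TIMEOUT"
DEFAULT_TASK_QUEUED_TIMEOUT = 600.0  # seconds, the default stated above


def task_queued_timeout(env=None):
    """Return the effective queued timeout in seconds."""
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR)
    return float(raw) if raw else DEFAULT_TASK_QUEUED_TIMEOUT


def queued_too_long(seconds_in_queue, env=None):
    """Return True if a task has been queued for more than the timeout."""
    return seconds_in_queue > task_queued_timeout(env)


if __name__ == "__main__":
    print(task_queued_timeout({}))                      # default
    print(task_queued_timeout({ENV_VAR: "1800"}))       # override
    print(queued_too_long(900, {ENV_VAR: "1800"}))      # still within the limit
```

If Agent outages in your environment can last longer than ten minutes, a higher value keeps queued tasks from being failed before an Agent returns.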

API Server failure

If an Agent’s heartbeats can’t reach the API Server, the Agent assumes that the API Server and other Agents remain healthy. In this case:

A worker continues running any tasks that the API Server already marked as running, but the worker doesn’t start new tasks until it reconnects with the API Server. This prevents two Agents from running the same task.
A triggerer stops processing tasks entirely until it restores connectivity. Since triggerer workloads are designed to be reassigned immediately when disconnected, trigger execution stops during the partition.

This behavior preserves task safety and prevents duplication for both workers and triggerers, even during partial failures or network partitions.
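The two rules for a disconnected Agent can be condensed into a small decision function. The code below is a simplified model of the behavior described in this section, with hypothetical names. It is not taken from the Agent's source code.

```python
def may_start_new_work(connected: bool) -> bool:
    """No Agent starts new work without confirmation from the API server."""
    return connected


def may_continue_running(agent_kind: str, connected: bool) -> bool:
    """Decide whether already-confirmed work keeps running.

    Workers finish tasks the API server already marked as running.
    Triggerers stop processing entirely while disconnected.
    """
    if agent_kind == "worker":
        return True
    if agent_kind == "triggerer":
        return connected
    raise ValueError(f"unknown agent kind: {agent_kind!r}")


if __name__ == "__main__":
    for kind in ("worker", "triggerer"):
        print(
            kind,
            "new work:", may_start_new_work(False),
            "keep running:", may_continue_running(kind, False),
        )
```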

Manage Remote Execution Agents

On the Remote Agents tab of the Deployment, click More options for a Remote Execution Agent on the list to access an action menu for that Agent.

You can take the following actions on your registered Remote Execution Agents:

Cordon: Cordoning a Remote Execution Agent marks it as unavailable for scheduling new tasks, while allowing it to continue running and complete any tasks already in progress.

This allows you to gracefully remove the Agent from service without interrupting current workloads. For example, you can cordon an Agent before deleting it or before performing maintenance, such as an upgrade, on the Agent or its underlying infrastructure.

A cordoned Agent does not receive new work, but it remains active until all running tasks have finished. When you are ready to return the Agent to the task pool, uncordon it to resume normal operation.

Uncordon: Uncordoning a Remote Execution Agent re-enables it to receive new tasks and resume normal scheduling.
Delete: Deletes the Remote Execution Agent from the Deployment.
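The cordon workflow amounts to two checks: whether an Agent can be given new tasks, and whether it has drained and can be removed without interrupting work. The sketch below models that with a hypothetical Agent record. The field and function names are illustrative and do not correspond to an Astro API.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    cordoned: bool = False
    running_tasks: int = 0


def can_receive_new_tasks(agent: Agent) -> bool:
    """A cordoned Agent is skipped when new tasks are assigned."""
    return not agent.cordoned


def is_drained(agent: Agent) -> bool:
    """A cordoned Agent with no running tasks can be removed gracefully."""
    return agent.cordoned and agent.running_tasks == 0


if __name__ == "__main__":
    agent = Agent(name="worker-1", running_tasks=2)
    agent.cordoned = True      # cordon: stop new assignments
    print(can_receive_new_tasks(agent), is_drained(agent))
    agent.running_tasks = 0    # in-flight tasks have finished
    print(is_drained(agent))
```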

Remote Execution Agent maintenance policy

Each Remote Execution Agent minor version is maintained for 6 months from the release month.

See Maintenance policy for more details about versioning, support, and upgrade recommendations.