Remote Execution Agents

This feature is only available on the Enterprise tier or above. See Astro Plans and Pricing.
Airflow 3
This feature is only available for Airflow 3.x Deployments.

Overview

Remote Execution relies on Agents in your environment (the execution plane) to communicate with the API server in the Astro orchestration plane. Each Agent sends heartbeats that report its capabilities and the queue it is listening on, and the API server responds by assigning work accordingly. Worker Agents run synchronous tasks, Triggerer Agents run asynchronous tasks (deferrable operators), and the Dag processor Agent processes dags and sends their serialized representations to the orchestration plane.

You can register multiple remote queues for your Remote Execution Agents, for the same kinds of reasons that you would register multiple worker queues in Hosted execution mode.

Add https://<clusterId>.external.astronomer.run/ to your organization’s network allowlist so the Remote Execution Agents in your environment are able to heartbeat to the API server in the Astro orchestration plane.
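One way to confirm that the allowlist entry works is to attempt an HTTPS connection from a machine or pod inside your environment. The following is a minimal sketch, not an official check: the function names agent_api_url and check_agent_egress are hypothetical, and the cluster ID is whatever value replaces <clusterId> for your Deployment. Any HTTP response counts as reachable here, because the goal is only to test network egress, not authentication.

```shell
# Hypothetical helper: builds the allowlist URL from a cluster ID.
agent_api_url() {
  printf 'https://%s.external.astronomer.run/\n' "$1"
}

# Hypothetical helper: reports whether an HTTPS connection can be established.
# Without --fail, curl exits 0 for any HTTP status, so only connection-level
# problems (DNS, firewall, TLS, timeout) are reported as failures.
check_agent_egress() {
  if curl --silent --output /dev/null --max-time 10 "$(agent_api_url "$1")"; then
    echo "reachable"
  else
    echo "blocked or unreachable"
    return 1
  fi
}
```

For example, check_agent_egress "$CLUSTER_ID" run from the cluster that will host the Agents prints "reachable" when outbound traffic to the API server is allowed.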

Register Remote Execution Agent with a Deployment

Prerequisites

  • Kubernetes 1.30+
  • Helm 3+
  • A valid Astro account with permissions to create an Astro Agent
  • An Agent Token from your Astro Deployment
  • A Deployment API token with Deployment Admin permissions to pull the base Astro Remote Execution Agent image

Step 1: Create Agent Token

  1. In the Astro UI, select a Workspace, click Deployments, and then select the Remote Execution Deployment that you want to register the Remote Execution Agent with.
  2. Select the Remote Agents tab.
  3. Use the toggle to switch to the Tokens view.
  4. Click +Agent Token.
  5. Define a Name, Expiration and optionally a Description for the Agent Token.
  6. Copy the Agent Token. You will not be shown the Agent Token value again.
There is a default limit of 50 Agent Tokens per Deployment. Contact Astronomer support if you need a higher limit.

Step 2: Install the Helm chart

If you want to self-host the Remote Execution Agent image, you can use either an Organization API token with the Org Owner role or a Deployment API token with the Deployment Admin role to authenticate with Astronomer’s private image registry.
  1. Use the toggle to switch back to the Agents view, and click Register a Remote Agent.
  2. Click the Download button on the modal to download the values.yaml file. Use this Helm chart to configure the Remote Execution Agent.
  3. Most config options in values.yaml do not need to be updated to register and activate the Remote Agent. The following values in the Remote Agent’s Helm chart must be updated: resourceNamePrefix, namespace, secretBackend, xcomBackend, imagePullSecretName, and agentToken or agentTokenSecretName. A description of each value is provided in the Helm chart itself.
  4. If self-hosting the image, log in to the image registry with your token:
docker login images.astronomer.cloud -u cli -p <your-token>

After you log in, you can pull the Remote Execution Agent image directly. To find the latest version and image path, refer to the Remote Execution Agent release notes, which include all currently hosted image tags and their full URLs.

  5. Install the Helm chart. The following code also adds the Helm repo and applies any updates:
helm repo add astronomer https://helm.astronomer.io
helm repo update
helm install astro-agent astronomer/astro-remote-execution-agent -f values.yaml

Step 3: Set allowed IP address ranges

Setting allowed IP address ranges for a Remote Execution Deployment is not necessary but allows you to limit the Deployment’s incoming traffic to the Remote Agents in your environment.

  1. In the Set Allowed IP Address Ranges step on the modal, click Edit this Deployment.
  2. Click +Add IP.
  3. Add the IP address range, then click Add.
  4. Repeat to allowlist more IP address ranges.

Step 4: Check for Agent heartbeat

Close the modal to view the Agent list page and check for a healthy agent.

You should see at least one Agent with a health status of “Healthy” and a “last heartbeat” value no more than one minute old. If your Remote Agent is healthy, configure dagBundleConfigList in values.yaml and run a Helm upgrade. You can then run Dags on this Agent.
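The Helm upgrade mentioned above can be sketched as follows, assuming the release name astro-agent and the chart reference used in Step 2. The function names upgrade_agent and list_agent_pods are hypothetical wrappers added for illustration, and the namespace argument is whatever you set in values.yaml.

```shell
# Re-applies values.yaml (for example, after editing dagBundleConfigList)
# to the release installed in Step 2.
upgrade_agent() {
  helm upgrade astro-agent astronomer/astro-remote-execution-agent -f "${1:-values.yaml}"
}

# Lists the pods in the Agent namespace so you can confirm they restarted.
list_agent_pods() {
  kubectl get pods --namespace "$1"
}
```

Running upgrade_agent followed by list_agent_pods <namespace> should show the Agent pods running again, after which the Agent list page in the Astro UI is the place to confirm the heartbeat.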

Remote Execution Agents Helm chart configuration reference

The following Helm chart config values are required to run dags with your Remote Execution Agent. To see all available configuration options with descriptions, see the values.yaml file downloaded in step 2.
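As a rough illustration of how these values fit together, the sketch below writes a small override file. It assumes that the keys sit at the top level of values.yaml, which you should verify against the file you downloaded. Every value is a placeholder, and the function name write_agent_values and the file name values-override.yaml are hypothetical.

```shell
# Hypothetical helper: writes an illustrative override file.
# Assumption: these keys are top-level in the downloaded values.yaml.
write_agent_values() {
  cat > "${1:-values-override.yaml}" <<'EOF'
# All values below are placeholders; replace them with your own.
resourceNamePrefix: "astro-agent"
namespace: "astro-agents"
createNamespace: false
agentTokenSecretName: "my-agent-token"
imagePullSecretName: "my-image-pull-secret"
secretBackend: "airflow.secrets.local_filesystem.LocalFilesystemBackend"
xcomBackend: "<xcom-backend-class>"
EOF
}
```

Helm accepts several -f flags and gives precedence to the last one, so an override file like this could be layered on top of the downloaded file with -f values.yaml -f values-override.yaml. Editing values.yaml directly works equally well.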

agentToken or agentTokenSecretName

You must specify either agentToken or agentTokenSecretName to inject the Agent token generated in Step 1 into the Helm chart.

  • For agentToken, pass the Agent token value copied in Step 1 into the Helm chart as a string, and Helm will create a secret with the name agent-token-secret in the namespace.

  • For agentTokenSecretName, pass the name of the secret containing the Agent token to connect the Agent to the Astro Deployment. This field should be used when the secret has already been created in the namespace through your own external mechanism. The secret must contain a key named token with the value set to the Agent Token created in Step 1.
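If you take the agentTokenSecretName route, the secret could be created with kubectl along these lines. This is a minimal sketch: the function name create_agent_token_secret is hypothetical, and the namespace, secret name, and token are values you supply. The only detail taken from the requirement above is that the key must be named token.

```shell
# Hypothetical helper: creates the secret referenced by agentTokenSecretName.
# $1: namespace, $2: secret name, $3: Agent Token value from Step 1
create_agent_token_secret() {
  kubectl create secret generic "$2" \
    --namespace "$1" \
    --from-literal=token="$3"
}
```

For example, create_agent_token_secret astro-agents my-agent-token "$AGENT_TOKEN" creates a secret whose name you would then set as agentTokenSecretName. Reading the token from an environment variable keeps the literal value out of the command text you type.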

imagePullSecretName or imagePullSecretData

You must specify either imagePullSecretName or imagePullSecretData to allow the Remote Execution Agent to pull container images from the registry. You can use the Deployment API token listed in the prerequisites for this.

  • For imagePullSecretName, provide the name of an existing Kubernetes secret in the namespace. This secret must contain a key named .dockerconfigjson with your image pull credentials. Use this option if the secret has already been created. Example Kubernetes command to create the secret:
kubectl create secret docker-registry -n <namespace> <secretName> \
--docker-server=images.astronomer.cloud \
--docker-username=cli \
--docker-password=<astroToken>
  • For imagePullSecretData, provide the Docker config JSON as a string. The Helm chart will use this value to create a secret named image-pull-secret in the namespace. The value must follow the standard Docker config format:
{
"auths": {
"<registry.example.com>": {
"auth": "<auth-token>",
"email": "<email-address>"
}
}
}
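In the standard Docker config format, the auth field is the Base64 encoding of username:password. The sketch below builds a value for imagePullSecretData on that basis. The function name docker_config_json is hypothetical, and the output leaves out the email field shown above on the assumption that your registry does not require it.

```shell
# Hypothetical helper: prints a Docker config JSON string.
# $1: registry host, $2: username, $3: password or API token
docker_config_json() {
  # auth is base64("username:password"); tr removes line wrapping from base64.
  printf '{"auths":{"%s":{"auth":"%s"}}}\n' \
    "$1" \
    "$(printf '%s:%s' "$2" "$3" | base64 | tr -d '\n')"
}
```

For example, docker_config_json images.astronomer.cloud cli "$ASTRO_TOKEN" prints a single-line JSON document that can be pasted into values.yaml as the imagePullSecretData string.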

namespace

Specifies the Kubernetes namespace where the Astro Remote Agent will be deployed.

If createNamespace is set to true, the Helm chart will create the namespace with the name provided in this field.

If createNamespace is set to false, the namespace must be created manually before deploying the chart, and this field should reference the existing namespace.

See Install Remote Execution Agents in a restricted Kubernetes namespace for steps to configure an Agent in a restricted Kubernetes namespace.

If you are passing agentTokenSecretName and imagePullSecretName, createNamespace must be set to false and the namespace must be created manually with those secrets already present.

resourceNamePrefix

Specifies a name prefix for all Kubernetes resources that this chart deploys for Remote Execution Agent components, such as Deployments, ConfigMaps, and Secrets.

secretBackend

The Airflow secret backend class to use for the Agent. See Secrets backend integration for Remote Execution for external secrets providers configuration instructions.

It is not recommended for production use cases, but you can also use Airflow’s local filesystem backend as a simpler alternative to an external secrets provider by setting:

secretBackend: "airflow.secrets.local_filesystem.LocalFilesystemBackend"
The rest of the configuration should be passed as environment variables to the Agent components.

xcomBackend

The Airflow XCom backend class to use for the Agent.

See Configure XCOM backend for a Remote Execution Agent for how to set up the XCom Backend in the Remote Execution Agent components.

Failures

When the heartbeat between the API server and a Remote Execution Agent is disrupted, the Astro executor prevents task duplication by marking queued tasks from that Agent as failed. This makes tasks eligible for reassignment to healthy Agents. To ensure safe task execution, an Agent must receive explicit confirmation from the API server before starting any task. If an Agent loses connectivity with the API server, the Agent continues executing any tasks that the API server already confirmed and marked as running, but the Agent will not start new tasks until heartbeat communication is restored.

Agent failure

The API Server marks an Agent as failed if the API Server misses three consecutive heartbeat intervals. When that happens, the API Server checks whether the Agent has any “queued” tasks, or tasks the Agent already picked up and started running, but has not yet reported as complete. If a worker Agent fails, the API Server marks those tasks as failed and makes them available for reassignment. If a triggerer Agent fails, the API Server immediately reassigns the tasks, since triggerer tasks are short-lived and idempotent.
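The rules in this section can be summarized in a small sketch. It is an illustration of the described behavior, not the API Server's implementation: the function names are hypothetical, the heartbeat interval is a parameter because its value is not stated here, and "three consecutive missed intervals" is approximated as elapsed time of at least three intervals since the last heartbeat.

```shell
# Hypothetical helper: classifies an Agent by time since its last heartbeat.
# $1: seconds since the last heartbeat, $2: heartbeat interval in seconds
agent_status() {
  if [ "$1" -ge $(( 3 * $2 )) ]; then
    echo "failed"
  else
    echo "healthy"
  fi
}

# Hypothetical helper: describes what happens to the tasks of a failed Agent.
# $1: Agent type (worker or triggerer)
on_agent_failure() {
  case "$1" in
    worker)    echo "mark tasks failed, then make them available for reassignment" ;;
    triggerer) echo "reassign tasks immediately" ;;
    *)         echo "unknown agent type" >&2; return 1 ;;
  esac
}
```

With an assumed 10-second interval, agent_status 29 10 prints healthy and agent_status 30 10 prints failed.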

API Server failure

If an Agent’s heartbeats can’t reach the API Server, the Agent assumes that the API Server and other Agents remain healthy. In this case:

  • A worker continues running any tasks that the API Server already marked as running, but the worker doesn’t start new tasks until it reconnects with the API Server. This prevents two Agents from running the same task.
  • A triggerer stops processing tasks entirely until it restores connectivity. Since triggerer workloads are designed to be reassigned immediately when disconnected, trigger execution stops during the partition.

This behavior preserves task safety and prevents duplication for both workers and triggerers, even during partial failures or network partitions.

Manage Remote Execution Agents

On the Remote Agents tab of the Deployment, click More options for a Remote Execution Agent on the list to access an action menu for that Agent.

You can take the following actions on your registered Remote Execution Agents:

  • Cordon: Cordoning a Remote Execution Agent marks it as unavailable for scheduling new tasks, while allowing it to continue running and complete any tasks already in progress.

This allows you to gracefully remove the Agent from service without interrupting current workloads. For example, you can cordon an Agent before deleting it or before performing maintenance, such as an upgrade, on the Agent or its underlying infrastructure.

A cordoned Agent will not receive new work, but it remains active until all running tasks have finished. When you are ready to return the Agent to the task pool, uncordon it to resume normal operation.

  • Uncordon: Uncordoning a Remote Execution Agent re-enables it to receive new tasks and resume normal scheduling.

  • Delete: Deletes the Remote Execution Agent from the Deployment.

Remote Execution Agent maintenance policy

Each Remote Execution Agent minor version is maintained for 6 months from the release month.

See Maintenance policy for more details about versioning, support, and upgrade recommendations.