Show Remote Execution Agent task logs in Airflow UI
You can display task logs in the Airflow UI by exporting logs to object storage and configuring the Astro API Server to retrieve them. Start by enabling log display after task completion, then optionally extend the setup to stream logs in real time as tasks run.
This guide explains how to configure post-task log display and how to extend that configuration to support real-time log streaming.
Displaying task logs after task completion
Set up log uploading so logs are visible in the Airflow UI after task completion. This requires:
- Remote Execution Agent configuration (values.yaml)
- Astro UI Deployment configuration
- Workload identities: write access for the Remote Execution Agent, read access for the Astro API Server
AWS
The Astro Orchestration Plane provides secure private connectivity with a pre-configured S3 Gateway Endpoint.
- Configure the following environment variables in the Helm chart's values.yaml, and replace the path for the AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER value with your information:
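A minimal sketch of these entries, assuming they go in the chart's commonEnv list (the same list used later in this guide) and that the standard Airflow remote logging variables apply; your chart version may require additional settings, such as a remote log connection:

```yaml
commonEnv:
  # Turn on remote task logging and point it at your bucket and Deployment path
  - name: AIRFLOW__LOGGING__REMOTE_LOGGING
    value: "True"
  - name: AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER
    value: "s3://<bucket>/<deployment-id>"  # replace with your bucket and path
```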
Mounting credentials manually
If you do not use workload identity and instead want to manually mount a credential, you must also add the following environment variable, which defines the location of a token file, to your Remote Agent's values.yaml file. You can change the example file path, /tmp/logging-token, to the path of your own logging token file.
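A hedged example, assuming the credential is consumed through the standard AWS SDK web identity token variable; the exact variable your setup expects may differ:

```yaml
commonEnv:
  # Assumption: the AWS SDK reads the manually mounted token from this path
  - name: AWS_WEB_IDENTITY_TOKEN_FILE
    value: "/tmp/logging-token"  # customize to the path of your logging token file
```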
- Run helm upgrade to apply the change to your Agents.
- In the Astro UI, navigate to your Deployment and click the Details tab. Click Edit in the Advanced section to access your logging configurations.
- Select Bucket Storage in the Task Logs field and fill in the Bucket URL as s3://<bucket>/<deployment-id>, or use the path that you configured for AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER in your Remote Agent's Helm chart's values.yaml.
- In the Workload Identity for Bucket Storage section, select Customer Managed Identity and follow the instructions to set up your Customer Managed Identity so that the identity you create has read access to the specified bucket and path.
- (Optional) If your log bucket is in a different region from your Astro Deployment, you need to define the AWS region in the AIRFLOW__ASTRONOMER_PROVIDERS_LOGGING__AWS_REGION environment variable for Astronomer-managed components. In the Astro UI, navigate to your Deployment and click the Environment tab. Click Environment Variables, then click (+) Environment Variable to add the following environment variable to your Deployment:

  Key: AIRFLOW__ASTRONOMER_PROVIDERS_LOGGING__AWS_REGION
  Value: <The region in which the S3 bucket is configured>
Displaying task logs during task execution
Once you have post-completion log visibility, you can enable real-time log display. Remote Execution prevents the Airflow API server from reading logs directly from workers until they reach object storage. Use Vector, which is included in the Remote Execution Agent Helm chart, to upload partial logs while tasks are running.
Prerequisites
Before you configure Vector, ensure that your Remote Execution Deployment is already set up to upload task logs to object storage after task completion.
Enable Vector sidecar
Use Vector to watch for log file changes and upload updates to object storage during task execution.
In your Helm values.yaml:
- Set loggingSidecar.enabled to true:
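For example:

```yaml
loggingSidecar:
  enabled: true
```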
- Configure loggingSidecar.volumeMounts:
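A sketch, assuming a shared volume mounted at the path Vector watches; the volume name task-logs is a hypothetical name and must match the workers and triggerer volumes configured below:

```yaml
loggingSidecar:
  volumeMounts:
    - name: task-logs                        # hypothetical shared volume name
      mountPath: /var/log/airflow/task_logs  # path Vector watches for task logs
```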
Configure AWS S3 log upload
- Configure loggingSidecar.config:
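A sketch of a minimal Vector configuration, assuming the chart accepts it as nested YAML under loggingSidecar.config; the source and sink options are standard Vector file source and aws_s3 sink settings, and the bucket, prefix, and region placeholders are yours to replace:

```yaml
loggingSidecar:
  config:
    sources:
      task_logs:
        type: file
        include:
          - /var/log/airflow/task_logs/**/*.log
    sinks:
      s3:
        type: aws_s3
        inputs:
          - task_logs
        bucket: <bucket>              # same bucket used for post-completion log upload
        key_prefix: <deployment-id>/
        region: <aws-region>
        encoding:
          codec: text
        batch:
          timeout_secs: 30            # upload frequency; see the small file caveat below
```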
AWS authentication with Vector
The Vector config above assumes a managed identity is set up for authentication, as described in Displaying task logs after task completion.
If you require a different way to authenticate with AWS, such as static keys, see https://vector.dev/docs/reference/configuration/sinks/aws_s3/#auth for all available options.
Developing Vector Remap Language (VRL)
Vector expressions are written in Vector Remap Language (VRL). If you want to edit an expression in the Vector config, the online VRL playground is a useful debugging tool.
Debugging Vector
If you're having issues uploading logs, you can enable debug logging for the Vector sidecar by adding a second sink to the sink configuration, so that you have two sinks: for example, an s3 sink and a debug sink:
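For example, a console sink added alongside the S3 sink (the sink and input names here match the sketch above and are assumptions):

```yaml
sinks:
  debug:
    type: console
    inputs:
      - task_logs
    encoding:
      codec: json
```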
With this second sink, Vector will display debug logs on the console, accessible with kubectl logs [worker pod] -c vector-logging-sidecar.
- Configure workers[*].volumes:
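A sketch, assuming each entry under workers accepts standard Kubernetes volume definitions; the worker entry name and the emptyDir volume are assumptions:

```yaml
workers:
  - name: default           # hypothetical worker group name
    volumes:
      - name: task-logs     # shared with the Vector sidecar
        emptyDir: {}
```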
- Configure workers[*].volumeMounts:
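The matching mount on the worker container, under the same assumptions:

```yaml
workers:
  - name: default
    volumeMounts:
      - name: task-logs
        mountPath: /var/log/airflow/task_logs  # where Airflow writes task logs
```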
- Configure triggerer.volumes:
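The same shared volume for the triggerer, again assuming standard Kubernetes volume definitions:

```yaml
triggerer:
  volumes:
    - name: task-logs
      emptyDir: {}
```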
- Configure triggerer.volumeMounts:
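And the matching mount:

```yaml
triggerer:
  volumeMounts:
    - name: task-logs
      mountPath: /var/log/airflow/task_logs
```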
- Set AIRFLOW__LOGGING__DELETE_LOCAL_LOGS in commonEnv:
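A sketch; the value shown is an assumption based on the post-completion cleanup described under Duplicate log storage, so confirm the appropriate setting for your Deployment:

```yaml
commonEnv:
  - name: AIRFLOW__LOGGING__DELETE_LOCAL_LOGS
    value: "True"  # assumption: delete local files once the complete log is uploaded
```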
Log upload process
Partial logs are uploaded and displayed as follows:
- Airflow worker or triggerer writes local task log files, as set by AIRFLOW__LOGGING__LOG_FILENAME_TEMPLATE.
- Vector watches /var/log/airflow/task_logs/**/*.log and uploads log changes in chunks while the task runs.
- Vector appends a timestamp to the file name before uploading each chunk.
- Airflow scans object storage for log chunks when displaying the UI log view.
- The UI displays all log content to the user.
Version compatibility
Using Vector to upload logs assumes that Airflow's log file locations and format remain compatible with your Vector configuration. Significant changes to Airflow logging may require you to update that configuration.
Caveats
Duplicate log storage
After task completion, Airflow uploads the complete log to object storage and deletes the local copy. This causes duplication:
- Partial logs from Vector
- Complete log from Airflow
The Airflow API server deduplicates log lines by timestamp and message. Only storage usage is affected; logs are displayed once.
Small file problem
High-frequency, small log file uploads can create many small objects. This may increase storage costs, load on object storage, or trigger rate limits.
- AWS bills object retrieval at a 128KB minimum on certain storage classes (source).
- A large number of small objects means more object requests (PUTs, GETs, LISTs) and more load on metadata/indexing; this can result in rate limits or latency issues.
Tune the file size, batch timeout, and upload frequency in your Vector config to maintain a proper balance between performance and cost.