Configure LoggingSidecar in a Remote Execution Agent

Airflow 3
This feature is only available for Airflow 3.x Deployments.

Airflow task logs are generated when tasks execute in the Worker and Triggerer components. The logging sidecar is a container that runs alongside these components to collect and ship task logs to external systems like Splunk, Elasticsearch, AWS CloudWatch, or other log aggregation services.
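Conceptually, the sidecar watches the log files that the worker and triggerer write to a shared volume and forwards their contents to an external sink. The following Python sketch is illustrative only: the real sidecar runs a log shipper such as Vector, and `collect_task_logs` is a hypothetical helper that only shows the shape of the job.

```python
import pathlib
import tempfile

def collect_task_logs(log_dir: str) -> dict[str, list[str]]:
    """Conceptual sketch of the sidecar's job: scan the shared log
    directory and gather every line of every *.log file, keyed by
    relative path, ready to ship to an external log system."""
    collected = {}
    for log_file in pathlib.Path(log_dir).rglob("*.log"):
        collected[str(log_file.relative_to(log_dir))] = (
            log_file.read_text().splitlines()
        )
    return collected

# Simulate a task writing a log into the shared volume.
with tempfile.TemporaryDirectory() as logs:
    task_dir = pathlib.Path(logs) / "dag_id=example_dag" / "run_id=r1"
    task_dir.mkdir(parents=True)
    (task_dir / "attempt=1.log").write_text("task started\ntask done\n")
    print(collect_task_logs(logs))
```

In the real deployment, the shared volume is the `emptyDir` volume you configure below, and Vector (not this sketch) does the tailing and shipping.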

The following procedure describes how to configure your Remote Execution Agent to use the logging sidecar. This process configures the loggingSidecar section in your values.yaml file, which controls the deployment of a sidecar container that collects and forwards task logs.

Prerequisites

  • You must have `deployment create` or `pod create` permissions in the Kubernetes namespace where your Remote Execution Agent is installed.

Enable the Logging Sidecar

  1. Configure volumes for the Agent worker and Agent triggerer components in your values.yaml file so that both components write task logs to a shared volume:

```yaml
workers:
  - name: default-worker
    volumes:
      - name: task-logs
        emptyDir: {}
    volumeMounts:
      - name: task-logs
        mountPath: /usr/local/airflow/logs

triggerer:
  volumes:
    - name: task-logs
      emptyDir: {}
  volumeMounts:
    - name: task-logs
      mountPath: /usr/local/airflow/logs
```
  2. To enable the logging sidecar, set `enabled` to `true` in your Remote Execution Agent’s values.yaml file, and define the name of your logging sidecar and the image you want to use. Astronomer recommends Vector for exporting task logs; the following example uses the timberio/vector Docker image.

```yaml
loggingSidecar:
  enabled: true
  name: vector-logging-sidecar
  image: timberio/vector:0.45.0-debian
```
  3. Allocate resources for your sidecar container in the `loggingSidecar` section of the values.yaml file:

```yaml
loggingSidecar:
  resources:
    limits:
      cpu: "0.5"
      memory: "1Gi"
    requests:
      cpu: "0.5"
      memory: "1Gi"
```

Example logging sidecar configuration

The following YAML file shows a full configuration example for a logging sidecar that uses Vector to export task log data to the Splunk Cloud Platform.

```yaml
loggingSidecar:
  enabled: true
  name: vector-logging-sidecar
  image: timberio/vector:0.45.0-debian

  # Mount the task logs directory to access log files
  volumeMounts:
    - name: task-logs
      mountPath: /etc/vector/task_logs

  # Resource allocation for the sidecar container
  resources:
    limits:
      cpu: "0.5"
      memory: "1Gi"
    requests:
      cpu: "0.5"
      memory: "1Gi"

  # Vector configuration
  config: |
    data_dir: /etc/vector/task_logs

    # Define log sources
    sources:
      task_logs:
        type: file
        include:
          - /etc/vector/task_logs/**/*.log

    transforms:
      parse_task_log_file:
        type: remap
        inputs:
          - task_logs
        source: |
          parsed = parse_regex!(.file, r'/dag_id=(?P<dagID>[0-9a-z-_]+)/run_id=(?P<runID>[^/]+)/task_id=(?P<taskID>[0-9a-z-_]+)/(?:map_index=(?P<mapIndex>-?[0-9]+)/)?attempt=(?P<attempt>[0-9]+)/(?P<tiID>[0-9a-z-]+)(?:\.log\.trigger\.[0-9]+)?\.log$')
          .tiID = parsed.tiID
          .attempt = parsed.attempt
          .taskID = parsed.taskID
          .runID = parsed.runID
          .dagID = parsed.dagID
          .mapIndex = parsed.mapIndex ?? -1

    sinks:
      splunk:
        type: splunk_hec_logs
        inputs:
          - parse_task_log_file
        endpoint: https://<your-domain>.splunkcloud.com
        default_token: <token>
        index: <your-index>
        indexed_fields:
          - tiID
          - attempt
          - taskID
          - runID
          - dagID
        encoding:
          codec: "text"

workers:
  - name: default-worker
    volumes:
      - name: task-logs
        emptyDir: {}
    volumeMounts:
      - name: task-logs
        mountPath: /usr/local/airflow/logs

triggerer:
  volumes:
    - name: task-logs
      emptyDir: {}
  volumeMounts:
    - name: task-logs
      mountPath: /usr/local/airflow/logs
```
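The `parse_task_log_file` transform above extracts task metadata (`dagID`, `runID`, `taskID`, `mapIndex`, `attempt`, `tiID`) from each log file path. Because Python's `re` module uses the same `(?P<name>…)` named-group syntax as VRL's `parse_regex!`, you can sanity-check the pattern locally before deploying. The helper name and sample path below are illustrative, not part of the sidecar itself:

```python
import re

# The same pattern used by the parse_task_log_file transform above,
# ported verbatim to Python's re module.
LOG_PATH_RE = re.compile(
    r'/dag_id=(?P<dagID>[0-9a-z-_]+)'
    r'/run_id=(?P<runID>[^/]+)'
    r'/task_id=(?P<taskID>[0-9a-z-_]+)'
    r'/(?:map_index=(?P<mapIndex>-?[0-9]+)/)?'
    r'attempt=(?P<attempt>[0-9]+)'
    r'/(?P<tiID>[0-9a-z-]+)(?:\.log\.trigger\.[0-9]+)?\.log$'
)

def parse_task_log_path(path: str) -> dict:
    """Extract task metadata fields from an Airflow task log path."""
    match = LOG_PATH_RE.search(path)
    if match is None:
        raise ValueError(f"unrecognized log path: {path}")
    fields = match.groupdict()
    # Mirror the VRL fallback: mapIndex defaults to -1 for unmapped tasks.
    fields["mapIndex"] = int(fields["mapIndex"] or -1)
    return fields

sample = (
    "/etc/vector/task_logs/dag_id=example_dag/run_id=manual__2025-01-01"
    "/task_id=extract/attempt=2/0195a1b2-c3d4-e5f6-a7b8-c9d0e1f2a3b4.log"
)
print(parse_task_log_path(sample))
```

If a real log path from your Deployment fails to match here, it will also fail in the sidecar, and the event will reach Splunk without the indexed fields.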