Logs configuration

Astro Private Cloud (APC) provides centralized logging through Vector and Elasticsearch. Task logs, platform logs, and audit logs are collected by Vector and indexed in Elasticsearch for searchability and troubleshooting.

Architecture

Vector runs as a DaemonSet collecting logs from all pods. Logs are shipped to Elasticsearch for storage and indexing. For log visualization, you can connect your own tools (Kibana, Grafana, OpenSearch Dashboards, etc.) to query Elasticsearch.

Accessing logs

Airflow UI

Task logs are accessible directly in the Airflow webserver UI:

  1. Navigate to the Dag.
  2. Click a task instance.
  3. Click Log.

Elasticsearch API

Query logs directly via Elasticsearch:

# Search for errors in the last hour
curl -X GET "https://elasticsearch.<base-domain>/_search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": {
      "bool": {
        "must": [
          { "match": { "log_level": "ERROR" } },
          { "range": { "@timestamp": { "gte": "now-1h" } } }
        ]
      }
    }
  }'

BYO visualization

APC does not include a log visualization UI. Connect your preferred tool to Elasticsearch:

  • Kibana: Deploy separately and point to the Elasticsearch endpoint.
  • Grafana: Use the Elasticsearch data source.
  • OpenSearch Dashboards: Use Elasticsearch API compatibility.

Vector configuration

Vector is the log collection agent in APC 1.0.

Enable Vector

tags:
  logging: true

vectorEnabled: true
vector:
  resources:
    requests:
      cpu: "250m"
      memory: "512Mi"
    limits:
      cpu: "1000m"
      memory: "1Gi"

Custom log parsing

Add custom transforms to parse Airflow log formats:

vector:
  customConfig: |
    [transforms.parse_airflow]
    type = "remap"
    inputs = ["kubernetes_logs"]
    source = '''
    .parsed = parse_regex!(.message, r'^\[(?P<timestamp>.+)\] \{(?P<logger>.+)\} (?P<level>\w+) - (?P<message>.+)$')
    '''
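The same pattern can be checked locally before deploying the transform. A minimal Python sketch, using the regex from the transform above (the sample log line is illustrative):

```python
import re

# Same pattern as the VRL transform above, expressed in Python so it can be
# verified against sample lines before shipping it to Vector.
AIRFLOW_LOG_RE = re.compile(
    r"^\[(?P<timestamp>.+)\] \{(?P<logger>.+)\} (?P<level>\w+) - (?P<message>.+)$"
)

def parse_airflow_line(line):
    """Return the named capture groups for a matching line, or None."""
    m = AIRFLOW_LOG_RE.match(line)
    return m.groupdict() if m else None

# Illustrative sample in the standard Airflow task-log format.
sample = "[2026-02-01 12:00:00,123] {taskinstance.py:1234} INFO - Task exited with return code 0"
parsed = parse_airflow_line(sample)
```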

Logging sidecar

APC supports either DaemonSet or sidecar logging on a data plane cluster, but not both simultaneously. To use sidecar logging, you must first disable the Vector DaemonSet, then enable the sidecar:

global:
  vectorEnabled: false
  loggingSidecar:
    enabled: true
    name: sidecar-log-consumer
    repository: quay.io/astronomer/ap-vector
    tag: 0.52.0
    resources:
      requests:
        cpu: "100m"
        memory: "386Mi"

Elasticsearch configuration

Enable Elasticsearch

tags:
  logging: true

elasticsearch:
  common:
    persistence:
      enabled: true

  client:
    replicas: 2
    heapMemory: "2g"
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        cpu: "2"
        memory: "4Gi"

  data:
    replicas: 3
    heapMemory: "2g"
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        cpu: "2"
        memory: "4Gi"
    persistence:
      size: "100Gi"

  master:
    replicas: 3
    heapMemory: "2g"
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"

Index lifecycle management

Configure log retention:

elasticsearch:
  indexLifecycleManagement:
    enabled: true
    policies:
      - name: airflow-logs
        phases:
          hot:
            actions:
              rollover:
                max_size: 50gb
                max_age: 7d
          delete:
            min_age: 30d
            actions:
              delete: {}
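For reference, the retention values above correspond to a standard Elasticsearch ILM policy body, as you would send it yourself with `PUT _ilm/policy/airflow-logs`. A sketch of the equivalent API payload, built as a Python dict for illustration (not the chart's internal rendering):

```python
import json

# Equivalent Elasticsearch ILM policy body for PUT _ilm/policy/airflow-logs.
# Field names follow the standard ILM schema: rollover in the hot phase,
# deletion after 30 days, matching the Helm values above.
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_size": "50gb", "max_age": "7d"}
                }
            },
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

body = json.dumps(policy, indent=2)
```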

External logging

Forward to external Elasticsearch

Send logs to your own Elasticsearch cluster:

global:
  customLogging:
    enabled: true
    scheme: https
    host: "elasticsearch.example.com"
    port: "9200"
    secret: "es-credentials"

Forward to S3

Archive logs to object storage:

vector:
  sinks:
    s3:
      type: "aws_s3"
      inputs: ["kubernetes_logs"]
      bucket: "my-logs-bucket"
      region: "us-west-2"
      compression: "gzip"
      encoding:
        codec: "json"

Forward to external systems

Configure Vector to send to any destination:

vector:
  sinks:
    # Splunk
    splunk:
      type: "splunk_hec"
      inputs: ["kubernetes_logs"]
      endpoint: "https://splunk.example.com:8088"
      token: "${SPLUNK_TOKEN}"

    # Datadog
    datadog:
      type: "datadog_logs"
      inputs: ["kubernetes_logs"]
      default_api_key: "${DATADOG_API_KEY}"

    # Generic HTTP
    http:
      type: "http"
      inputs: ["kubernetes_logs"]
      uri: "https://logs.example.com/v1/logs"
      encoding:
        codec: "json"

Deployment log settings

Task log retention

Configure log groomer to manage disk usage:

# In deployment values
scheduler:
  logGroomerSidecar:
    enabled: true
    retentionDays: 15
    frequencyMinutes: 15

workers:
  logGroomerSidecar:
    enabled: true
    retentionDays: 15

Log level configuration

env:
  - name: AIRFLOW__LOGGING__LOGGING_LEVEL
    value: "INFO"
  - name: AIRFLOW__LOGGING__FAB_LOGGING_LEVEL
    value: "WARNING"

Querying logs

Elasticsearch query examples

Find task failures:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "log_level": "ERROR" } },
        { "match": { "kubernetes.labels.component": "worker" } }
      ]
    }
  }
}

Search specific Dag:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "dag_id": "my_dag" } },
        { "match": { "task_id": "my_task" } }
      ]
    }
  }
}

Filter by time range:

{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2026-02-01T00:00:00",
        "lt": "2026-02-02T00:00:00"
      }
    }
  }
}
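These query bodies can also be composed programmatically. A minimal Python sketch that reproduces the examples above (`es_bool_query` is an illustrative helper, not an Elasticsearch API; send the resulting JSON to the `/_search` endpoint with any HTTP client):

```python
import json

def es_bool_query(*clauses):
    """Wrap match/range clauses in a bool query with a must list."""
    return {"query": {"bool": {"must": list(clauses)}}}

# Reproduce the "find task failures" example above.
failures = es_bool_query(
    {"match": {"log_level": "ERROR"}},
    {"match": {"kubernetes.labels.component": "worker"}},
)

# Time-bounded variant, combining a match clause with a range clause.
recent_errors = es_bool_query(
    {"match": {"log_level": "ERROR"}},
    {"range": {"@timestamp": {"gte": "now-1h"}}},
)

# Serialize for the request body of POST /_search.
payload = json.dumps(failures)
```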

Common log fields

Field                          Description
kubernetes.namespace_name      Deployment namespace
kubernetes.labels.component    Component (scheduler, worker, etc.)
kubernetes.pod_name            Pod name
dag_id                         DAG identifier
task_id                        Task identifier
log_level                      DEBUG, INFO, WARNING, ERROR
@timestamp                     Log timestamp

Troubleshooting

Logs aren’t appearing

  1. Check Vector is running:

    kubectl get pods -n astronomer -l app=vector
  2. Check Elasticsearch health:

    kubectl exec -n astronomer elasticsearch-0 -- \
      curl -s localhost:9200/_cluster/health
  3. Verify Vector logs:

    kubectl logs -n astronomer -l app=vector --tail=100

High disk usage

  1. Enable index lifecycle management.
  2. Reduce retention period.
  3. Increase Elasticsearch storage.
  4. Forward logs to external storage (S3).

Slow queries

  1. Add index patterns for common searches.
  2. Increase Elasticsearch resources.
  3. Reduce log verbosity.

Security

Access control

Elasticsearch access is restricted to platform components. For external access, configure authentication:

elasticsearch:
  auth:
    enabled: true
    secretName: "es-credentials"

Log redaction

Redact sensitive data before indexing:

vector:
  transforms:
    redact:
      type: "remap"
      source: |
        .message = replace(.message, r'password=\S+', "password=***")
        .message = replace(.message, r'api_key=\S+', "api_key=***")
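Redaction patterns are worth unit-testing before deploying them to Vector. A Python sketch that mirrors the two rules above (the sample message is illustrative):

```python
import re

# Same redaction rules as the Vector transform above, expressed in Python so
# the patterns can be tested against sample messages first.
REDACTIONS = [
    (re.compile(r"password=\S+"), "password=***"),
    (re.compile(r"api_key=\S+"), "api_key=***"),
]

def redact(message):
    """Apply each redaction pattern in turn."""
    for pattern, replacement in REDACTIONS:
        message = pattern.sub(replacement, message)
    return message

clean = redact("login attempt password=hunter2 api_key=abc123")
```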

Best practices

  • Set appropriate retention based on compliance requirements.
  • Use log levels wisely; avoid DEBUG in production.
  • Enable log groomer to prevent disk exhaustion on Airflow pods.
  • Forward logs externally for long-term retention and compliance.
  • Monitor Elasticsearch health and disk usage.
  • Use your preferred visualization tool: deploy Kibana, Grafana, or another tool separately.