Ephemeral storage configuration

Ephemeral storage in Astro Private Cloud (APC) uses Kubernetes emptyDir volumes for temporary data that doesn’t need to persist across pod restarts. This guide covers configuring ephemeral storage for Airflow components including Dags, logs, and Redis.

Ephemeral storage overview

Ephemeral storage (emptyDir volumes) provides:

  • Temporary storage that exists for the lifetime of a pod.
  • Faster I/O if using memory-backed storage.
  • No persistence once the pod is deleted or rescheduled (data survives container restarts within the same pod).
  • Shared access between containers in the same pod.

APC 1.x sets readOnlyRootFilesystem, so the entire /usr/local/airflow directory is mounted on an emptyDir volume in every Airflow pod. Depending on the size of your Dags, this can consume significant ephemeral storage and exceed namespace limits. Factor this into your ephemeral storage sizing.
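To make the namespace limits mentioned above concrete, the following is a minimal sketch of a standard Kubernetes ResourceQuota that caps ephemeral storage in a namespace. The quota name, namespace, and sizes are hypothetical, and APC may create or manage its own quotas, so treat this as an illustration of what to look for rather than something to apply as-is.

```yaml
# Hypothetical quota: the name, namespace, and sizes are illustrative only.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ephemeral-storage-quota
  namespace: example-airflow-namespace
spec:
  hard:
    # Sum of ephemeral-storage requests across all pods in the namespace
    requests.ephemeral-storage: 20Gi
    # Sum of ephemeral-storage limits across all pods in the namespace
    limits.ephemeral-storage: 40Gi
```

If a quota like this exists, kubectl describe resourcequota -n <namespace> shows current usage against each limit.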

Configurable volumes

Dags volume (gitSync)

When using git-sync for Dag deployment, Dags are stored in an emptyDir volume.

dags:
  gitSync:
    enabled: true
    emptyDirConfig:
      sizeLimit: 2Gi
      medium: Memory  # Optional: use RAM instead of disk

Parameters:

  • sizeLimit: Maximum storage size (e.g., 1Gi, 2Gi).
  • medium: Storage medium.
    • "" (empty): Use node’s default storage (disk).
    • "Memory": Use RAM (tmpfs) for faster access.
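For orientation, these two parameters correspond to the sizeLimit and medium fields of a Kubernetes emptyDir volume. The fragment below is a sketch of the volume definition a pod might end up with, assuming the values are passed through unchanged. The volume name is hypothetical.

```yaml
# Illustrative pod spec fragment; only the emptyDir fields are standard Kubernetes.
volumes:
  - name: dags            # hypothetical volume name
    emptyDir:
      medium: Memory      # omit, or set to "", for disk-backed storage
      sizeLimit: 2Gi      # exceeding this can cause the pod to be evicted
```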

Logs volume

Task logs can be stored in ephemeral storage when persistence is disabled.

logs:
  persistence:
    enabled: false
  emptyDirConfig:
    sizeLimit: 10Gi
    medium: ""  # Use disk for logs (recommended)

Recommendations:

  • Use a larger sizeLimit for high-volume task execution.
  • Avoid medium: Memory for logs unless logs are cleaned up aggressively, because logs held in RAM count against pod memory.
  • Enable the log groomer sidecar to prevent storage exhaustion.
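As a rough illustration of the last recommendation, the sketch below uses the key names from the upstream Apache Airflow Helm chart (workers.logGroomerSidecar). Whether APC exposes the same keys at the same path is an assumption, so check your chart's values before relying on it. The retention value is only an example.

```yaml
# Assumption: key names follow the upstream Apache Airflow Helm chart;
# the path in APC values may differ.
workers:
  logGroomerSidecar:
    enabled: true
    retentionDays: 3   # example value; shorter retention frees space sooner
```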

Redis volume

When persistence is disabled, Redis uses ephemeral storage.

redis:
  persistence:
    enabled: false
  emptyDirConfig:
    sizeLimit: 1Gi
    medium: Memory  # RAM-backed for performance

Configuration examples

Development environment

dags:
  gitSync:
    enabled: true
    emptyDirConfig:
      sizeLimit: 1Gi

logs:
  emptyDirConfig:
    sizeLimit: 5Gi

redis:
  emptyDirConfig:
    sizeLimit: 512Mi
    medium: Memory

Production environment

dags:
  gitSync:
    enabled: true
    emptyDirConfig:
      sizeLimit: 5Gi
      medium: ""  # Use disk storage for durability

logs:
  persistence:
    enabled: true  # Use persistent storage in production
    size: 100Gi

redis:
  persistence:
    enabled: true  # Use persistent storage in production
    size: 8Gi

High-performance configuration

For latency-sensitive workloads:

dags:
  gitSync:
    enabled: true
    emptyDirConfig:
      sizeLimit: 2Gi
      medium: Memory  # Faster Dag parsing

redis:
  emptyDirConfig:
    sizeLimit: 2Gi
    medium: Memory  # Faster task queue operations

Storage medium comparison

Medium      Speed      Persistence    Memory impact   Use case
"" (disk)   Moderate   Pod lifetime   None            Logs, large Dags
"Memory"    Fast       Pod lifetime   Consumes RAM    Redis, small Dags

Memory-backed volumes (medium: Memory) count against container memory limits. If the volume grows too large, pods may be OOMKilled. Size memory limits accordingly or use disk-backed storage.
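The sizing rule above can be sketched as a pod spec fragment: the container memory limit has to cover both the process's own working set and whatever is written to the tmpfs volume. The container name, volume name, mount path, and the 1Gi working-set figure are hypothetical and chosen only to show the arithmetic.

```yaml
# Illustrative pod spec fragment; names and sizes are hypothetical.
containers:
  - name: redis
    resources:
      limits:
        # Assumed ~1Gi for the process itself + 1Gi for the tmpfs volume below
        memory: 2Gi
    volumeMounts:
      - name: redis-data
        mountPath: /data
volumes:
  - name: redis-data
    emptyDir:
      medium: Memory
      sizeLimit: 1Gi   # data written here counts toward the 2Gi memory limit
```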

Sizing guidelines

Dags volume

Dag count   Recommended size
< 50        1Gi
50-200      2Gi
200-500     5Gi
500+        10Gi

Logs volume

Task volume            Recommended size
< 100 tasks/day        5Gi
100-1000 tasks/day     10Gi
1000-10000 tasks/day   50Gi
10000+ tasks/day       Use persistent storage

Redis volume

Worker count    Recommended size
1-5 workers     512Mi
5-20 workers    1Gi
20-50 workers   2Gi
50+ workers     4Gi

Monitoring storage usage

Check volume usage

$ kubectl exec -n <namespace> <pod-name> -- df -h

Check memory-backed volume

$ kubectl exec -n <namespace> <pod-name> -- mount | grep tmpfs

Troubleshooting

Pod eviction due to storage

Symptom: Pods evicted with DiskPressure or ephemeral storage exceeded.

Cause: An emptyDir volume exceeded its sizeLimit, or the pod exceeded the ephemeral storage available to it on the node.

Solution:

  1. Increase sizeLimit in emptyDirConfig.
  2. Enable the log groomer with a shorter retention period.
  3. Switch to persistent storage.
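When diagnosing these evictions, it helps to know which limit was hit. Besides the emptyDir sizeLimit, Kubernetes can also enforce per-container ephemeral storage limits. The fragment below sketches what those look like in a pod spec. The container name and sizes are hypothetical, and how (or whether) APC lets you set these values is not covered here.

```yaml
# Illustrative pod spec fragment; the container name and sizes are hypothetical.
containers:
  - name: worker
    resources:
      requests:
        ephemeral-storage: 2Gi   # used by the scheduler when placing the pod
      limits:
        ephemeral-storage: 4Gi   # exceeding this can cause the pod to be evicted
```

Running kubectl describe pod -n <namespace> <pod-name> on an evicted pod typically reports which limit was exceeded in its status message.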

Out of memory with memory-backed volumes

Symptom: Pods OOMKilled when using medium: Memory.

Cause: Memory-backed emptyDir counts against container memory limits.

Solution:

  1. Increase container memory limits.
  2. Reduce sizeLimit on memory-backed volumes.
  3. Switch to disk-backed storage.

Slow Dag parsing

Symptom: Dag processing takes too long.

Cause: Disk I/O latency on Dag volume.

Solution:

  1. Use medium: Memory for Dag volume.
  2. Ensure sufficient sizeLimit.
  3. Consider SSD-backed nodes.

Best practices

  • Set explicit sizeLimit to prevent unbounded storage growth.
  • Use memory-backed storage sparingly, and only for small, performance-critical volumes.
  • Monitor usage and set alerts for storage utilization.
  • Use persistent storage for production logs since ephemeral storage loses logs on restart.
  • Size for peak usage to account for burst workloads.
  • Enable the log groomer to prevent log accumulation.