Configure cleanup jobs


Configure automated cleanup jobs to maintain database health by removing old data. Astro Private Cloud (APC) includes several cleanup jobs that run as CronJobs on configurable schedules to manage storage growth and query performance.

Cleanup jobs summary

| Job | Default schedule | Default retention | Purpose |
| --- | --- | --- | --- |
| cleanupDeployments | Daily @ 00:00 | 14 days | Removes soft-deleted deployments |
| cleanupDeployRevisions | Daily @ 23:11 | 90 days | Removes old deploy revision records |
| cleanupTaskUsageData | Daily @ 23:40 | 90 days | Purges task usage metrics |
| cleanupClusterAudits | Daily @ 23:49 | 90 days | Removes cluster audit logs |
| cleanupAirflowDb | Daily @ 05:23 | 365 days | Cleans Airflow metadata (disabled by default) |

cleanupDeployments

Permanently removes soft-deleted deployments once the retention period has passed.

What gets cleaned

  • Deployment database records marked with deletedAt
  • Associated Docker registry images
  • Deployment metadata database

Configuration

houston:
  cleanupDeployments:
    enabled: true
    schedule: "0 0 * * *"   # Midnight daily
    olderThan: 14           # Days since deletion
    dryRun: false           # Set true to preview
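
To make the olderThan setting concrete, the following is a minimal Python sketch of how a retention cutoff could be evaluated. It is not the APC implementation: the function names `retention_cutoff` and `is_eligible` are hypothetical, and the sketch assumes olderThan counts whole days back from the moment the job runs.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now: datetime, older_than_days: int) -> datetime:
    """Return the timestamp before which records become eligible for cleanup."""
    return now - timedelta(days=older_than_days)


def is_eligible(deleted_at, now: datetime, older_than_days: int = 14) -> bool:
    """A record qualifies only if it was soft-deleted before the cutoff."""
    if deleted_at is None:  # never soft-deleted, so never cleaned up
        return False
    return deleted_at < retention_cutoff(now, older_than_days)


# Hypothetical run time and deletion timestamps, for illustration only.
now = datetime(2026, 1, 15, tzinfo=timezone.utc)
print(is_eligible(datetime(2025, 12, 20, tzinfo=timezone.utc), now))  # deleted 26 days earlier
print(is_eligible(datetime(2026, 1, 10, tzinfo=timezone.utc), now))   # deleted 5 days earlier
```

With the default of 14 days, only the first deployment in this example falls before the cutoff.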

Manual trigger

Run this command from a machine with access to the underlying Kubernetes cluster:

$ kubectl -n <namespace> exec -it deploy/<release-name>-houston -- yarn cleanup-deployments --older-than=14 --dry-run=false

cleanupDeployRevisions

Removes old deployment revision records to reduce database size.

What gets cleaned

  • deployRevision records older than retention period
  • Historical deployment configuration snapshots

Configuration

houston:
  cleanupDeployRevisions:
    enabled: true
    schedule: "11 23 * * *"   # 23:11 daily
    olderThan: 90             # Days to retain

Manual trigger

Run this command from a machine with access to the underlying Kubernetes cluster:

$ kubectl -n <namespace> exec -it deploy/<release-name>-houston -- yarn cleanup-deploy-revisions --older-than=90

Per-deployment cleanup

Run this command from a machine with access to the underlying Kubernetes cluster to clean revisions for a specific deployment:

$ kubectl -n <namespace> exec -it deploy/<release-name>-houston -- yarn cleanup-deploy-revisions --older-than=90 --deploymentUuid=<uuid>

cleanupTaskUsageData

Purges task usage metrics and audit logs.

What gets cleaned

  • TaskUsage records (daily aggregated metrics)
  • TaskUsageAuditLog records (raw task data)

Configuration

houston:
  cleanupTaskUsageData:
    enabled: true
    schedule: "40 23 * * *"   # 23:40 daily
    olderThan: 90             # Minimum 90 days
    dryRun: false

Minimum retention is 90 days and can’t be reduced.
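
Because the 90-day floor can't be lowered, it can help to catch a smaller value before a config change is applied. The following is a minimal sketch of such a pre-apply check. The function name `check_task_usage_retention` is hypothetical and not part of APC, and this guide doesn't state how the platform itself reacts to a value below 90.

```python
MIN_TASK_USAGE_RETENTION_DAYS = 90  # floor stated in this guide


def check_task_usage_retention(older_than: int) -> int:
    """Return the value unchanged, or raise if it is below the documented minimum."""
    if older_than < MIN_TASK_USAGE_RETENTION_DAYS:
        raise ValueError(
            f"cleanupTaskUsageData.olderThan is {older_than}; "
            f"the minimum is {MIN_TASK_USAGE_RETENTION_DAYS} days"
        )
    return older_than


print(check_task_usage_retention(120))  # accepted as-is
```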

Manual trigger

Run this command from a machine with access to the underlying Kubernetes cluster:

$ kubectl -n <namespace> exec -it deploy/<release-name>-houston -- yarn cleanup-task-usage-data --older-than=90 --dry-run=false

GraphQL trigger

query {
  cleanupTaskUsageDataJob(olderThan: 90)
}
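
The query above can be sent by any GraphQL client. As a rough sketch, the following Python builds the HTTP request with the standard library only. The endpoint URL, the `Authorization` header format, and the function name `build_cleanup_request` are assumptions for illustration; check your installation for the actual API URL and authentication scheme.

```python
import json
import urllib.request


def build_cleanup_request(endpoint: str, token: str, older_than: int = 90) -> urllib.request.Request:
    """Wrap the cleanupTaskUsageDataJob query in a standard GraphQL POST request."""
    query = "query { cleanupTaskUsageDataJob(olderThan: %d) }" % older_than
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json", "Authorization": token},
        method="POST",
    )


# Placeholder endpoint and token: replace both with values from your installation.
req = build_cleanup_request("https://houston.example.com/v1", "<api-token>")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To send the request: urllib.request.urlopen(req)
```

The request is only constructed here, not sent, so the sketch can be run safely without access to a cluster.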

cleanupClusterAudits

Removes cluster audit log entries.

What gets cleaned

  • ClusterAudit records tracking cluster configuration changes
  • Historical cluster state snapshots

Configuration

houston:
  cleanupClusterAudits:
    enabled: true
    schedule: "49 23 * * *"   # 23:49 daily
    olderThan: 90             # Days to retain

Manual trigger

Run this command from a machine with access to the underlying Kubernetes cluster:

$ kubectl -n <namespace> exec -it deploy/<release-name>-houston -- yarn cleanup-cluster-audit --older-than=90

Filter by cluster

Run this command from a machine with access to the underlying Kubernetes cluster to clean audits for specific clusters:

$ kubectl -n <namespace> exec -it deploy/<release-name>-houston -- yarn cleanup-cluster-audit --older-than=90 --cluster-ids=<id1>,<id2>

cleanupAirflowDb

Cleans Airflow metadata from individual Deployment databases.

This job is disabled by default due to potential impact on running Deployments.

What gets cleaned

Default tables:

  • callback_request - Task callback requests
  • celery_taskmeta, celery_tasksetmeta - Celery metadata
  • dag - Dag definitions
  • dag_run - Dag execution history
  • dataset_event - Dataset events
  • import_error - Import errors
  • job - Job records
  • log - Task execution logs
  • session - Session data
  • sla_miss - SLA violations
  • task_fail - Task failures
  • task_instance - Task execution records
  • task_reschedule - Reschedule events
  • trigger - Trigger records
  • xcom - Cross-communication data

Configuration

houston:
  cleanupAirflowDb:
    enabled: false            # Must explicitly enable
    schedule: "23 5 * * *"    # 05:23 daily
    olderThan: 365            # Days to retain
    outputPath: "/tmp"        # Archive location
    dropArchives: true        # Delete after archiving
    dryRun: false
    provider: local           # Storage: local/aws/azure/gcp
    bucketName: "/tmp"        # Cloud bucket or local path
    tables: ""                # Specific tables (empty = all)

Cloud storage export

Export archived data to cloud storage:

houston:
  cleanupAirflowDb:
    enabled: true
    provider: aws             # aws, azure, or gcp
    bucketName: "my-archive-bucket"
    providerEnvSecretName: "aws-credentials-secret"

Specific tables only

Clean only specific tables:

houston:
  cleanupAirflowDb:
    enabled: true
    tables: "log,task_instance,xcom"
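
A typo in the comma-separated tables value is easy to miss. The following sketch checks a value against the default table list from this page before you apply it. The function name `resolve_tables` is hypothetical, and whether the job accepts tables outside the default list is not covered here, so treat the rejection of unknown names as an assumption.

```python
# Default tables as listed in "What gets cleaned" above.
DEFAULT_TABLES = {
    "callback_request", "celery_taskmeta", "celery_tasksetmeta", "dag",
    "dag_run", "dataset_event", "import_error", "job", "log", "session",
    "sla_miss", "task_fail", "task_instance", "task_reschedule", "trigger",
    "xcom",
}


def resolve_tables(tables: str) -> list:
    """Return the tables a given `tables` value selects; an empty string means all."""
    if not tables.strip():
        return sorted(DEFAULT_TABLES)
    requested = [name.strip() for name in tables.split(",") if name.strip()]
    unknown = [name for name in requested if name not in DEFAULT_TABLES]
    if unknown:
        raise ValueError(f"not in the default table list: {unknown}")
    return requested


print(resolve_tables("log,task_instance,xcom"))
print(len(resolve_tables("")))  # number of default tables
```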

Manual trigger

Run this command from a machine with access to the underlying Kubernetes cluster:

$ kubectl -n <namespace> exec -it deploy/<release-name>-houston -- yarn cleanup-airflow-db-data \
    --older-than=365 \
    --provider=local \
    --bucket-name=/tmp \
    --tables="log,task_instance"

Schedule reference

Default schedules are staggered to avoid simultaneous execution:

| Time | Job |
| --- | --- |
| 00:00 | cleanupDeployments |
| 05:23 | cleanupAirflowDb |
| 23:11 | cleanupDeployRevisions |
| 23:40 | cleanupTaskUsageData |
| 23:49 | cleanupClusterAudits |
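
If you change any schedule, you may want to confirm that the start times are still spread out. The following sketch finds the two jobs whose start times are closest together. It handles only fixed daily expressions of the form "M H * * *", which is what the defaults use, and the helper names `minute_of_day` and `closest_pair` are hypothetical.

```python
from itertools import combinations

# Default schedules from this page.
DEFAULT_SCHEDULES = {
    "cleanupDeployments": "0 0 * * *",
    "cleanupAirflowDb": "23 5 * * *",
    "cleanupDeployRevisions": "11 23 * * *",
    "cleanupTaskUsageData": "40 23 * * *",
    "cleanupClusterAudits": "49 23 * * *",
}


def minute_of_day(cron: str) -> int:
    """Convert a fixed daily cron expression ("M H * * *") to minutes after midnight."""
    minute, hour, *_ = cron.split()
    return int(hour) * 60 + int(minute)


def closest_pair(schedules: dict):
    """Return (gap_in_minutes, job_a, job_b) for the two closest start times."""
    best = None
    for (a, cron_a), (b, cron_b) in combinations(schedules.items(), 2):
        diff = abs(minute_of_day(cron_a) - minute_of_day(cron_b))
        gap = min(diff, 1440 - diff)  # the day wraps around at midnight
        if best is None or gap < best[0]:
            best = (gap, a, b)
    return best


print(closest_pair(DEFAULT_SCHEDULES))
# (9, 'cleanupTaskUsageData', 'cleanupClusterAudits')
```

Staggered start times reduce contention but don't guarantee that runs never overlap: a job that runs longer than the gap to the next start time still overlaps with it.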

Common configuration options

All cleanup jobs share these options:

houston:
  cleanup<JobName>:
    enabled: true                 # Set to false to disable the job
    schedule: "cron-expression"   # When to run
    olderThan: <days>             # Retention period
    dryRun: false                 # Set to true to preview without deleting
    readinessProbe: {}            # Optional health probes
    livenessProbe: {}

Kubernetes CronJob behavior

All cleanup CronJobs use:

  • Concurrency policy: Forbid (prevents overlapping runs)
  • Backoff limit: 1 retry on failure
  • Restart policy: Never

Monitor cleanup jobs

Check job status

# List all cleanup CronJobs
$ kubectl get cronjobs -n astronomer | grep cleanup

# View recent job runs
$ kubectl get jobs -n astronomer | grep cleanup

# Check job logs
$ kubectl logs job/<job-name> -n astronomer

# Trigger an individual job manually from its CronJob
$ kubectl create job --from=cronjob/<cronjob-name> <job-name> -n astronomer
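
For recurring checks, the same information can be pulled from the JSON output of kubectl instead of grep. The following is a minimal sketch, assuming kubectl is on your PATH and the platform runs in the astronomer namespace. The function names are hypothetical, and the sample payload contains made-up values shaped like a Kubernetes CronJob list.

```python
import json
import subprocess


def cleanup_cronjobs(cronjob_list: dict) -> list:
    """Pick the cleanup CronJobs out of `kubectl get cronjobs -o json` output."""
    rows = []
    for item in cronjob_list.get("items", []):
        name = item["metadata"]["name"]
        if "cleanup" not in name:
            continue
        rows.append({
            "name": name,
            "schedule": item["spec"]["schedule"],
            "suspended": item["spec"].get("suspend", False),
            "last_run": item.get("status", {}).get("lastScheduleTime"),
        })
    return rows


def fetch_cronjobs(namespace: str = "astronomer") -> dict:
    """Call kubectl and parse its JSON output; requires cluster access."""
    result = subprocess.run(
        ["kubectl", "get", "cronjobs", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Made-up sample data, so the sketch runs without a cluster.
sample = {"items": [
    {"metadata": {"name": "houston-cleanup-deployments"},
     "spec": {"schedule": "0 0 * * *", "suspend": False},
     "status": {"lastScheduleTime": "2026-01-15T00:00:00Z"}},
    {"metadata": {"name": "some-other-cronjob"},
     "spec": {"schedule": "*/5 * * * *"}},
]}
for row in cleanup_cronjobs(sample):
    print(row)
```

Against a real cluster, replace `sample` with `fetch_cronjobs()`. A CronJob that has never run has no `lastScheduleTime`, which is why `last_run` can be `None`.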

Verify data cleanup

-- Check remaining records by date
SELECT DATE(created_at), COUNT(*)
FROM deploy_revision
GROUP BY DATE(created_at)
ORDER BY DATE(created_at) DESC;
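
To show what a successful result looks like, the following sketch runs the same query against an in-memory SQLite table. SQLite is only a stand-in for the platform database, the rows and the DELETE statement are made up to imitate a cleanup run with a 90-day retention, and the cutoff assumes the job ran on 2026-01-15.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deploy_revision (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO deploy_revision (created_at) VALUES (?)",
    [
        ("2025-01-05 10:00:00",),  # older than the cutoff
        ("2025-01-05 18:30:00",),  # older than the cutoff
        ("2025-11-20 09:15:00",),
        ("2026-01-02 12:00:00",),
    ],
)

# Stand-in for the cleanup job: 90 days before 2026-01-15 is 2025-10-17.
cutoff = "2025-10-17 00:00:00"
conn.execute("DELETE FROM deploy_revision WHERE created_at < ?", (cutoff,))

rows = conn.execute(
    "SELECT DATE(created_at), COUNT(*) FROM deploy_revision "
    "GROUP BY DATE(created_at) ORDER BY DATE(created_at) DESC"
).fetchall()
print(rows)
# [('2026-01-02', 1), ('2025-11-20', 1)]
```

After a successful cleanup, the oldest date in the result should be no earlier than the cutoff. Dates older than that indicate the job didn't run or didn't match those records.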

Troubleshooting

Job not running

  1. Check that the CronJob exists:

    $ kubectl get cronjob houston-cleanup-deployments -n astronomer

  2. Check that the job is enabled in your Helm values.

  3. Verify that the schedule is a valid cron expression.
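
For step 3, a quick local check can catch obvious mistakes such as a missing field or an out-of-range value. The following is a simplified sketch, not the parser Kubernetes uses: it accepts only the five-field numeric format with `*`, single values, ranges, lists, and steps, and it rejects month or weekday names and macros such as `@daily`. The function name `is_valid_cron` is hypothetical.

```python
import re

# (low, high) bounds for minute, hour, day of month, month, day of week
CRON_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]

# One list element: "*", "N", or "N-M", optionally followed by "/step"
_PART = re.compile(r"(\*|\d+(?:-\d+)?)(?:/(\d+))?", re.ASCII)


def _valid_part(part: str, low: int, high: int) -> bool:
    """Validate a single comma-separated element of one cron field."""
    match = _PART.fullmatch(part)
    if match is None:
        return False
    base, step = match.groups()
    if step is not None and int(step) == 0:
        return False
    if base == "*":
        return True
    first, _, last = base.partition("-")
    return low <= int(first) <= int(last or first) <= high


def is_valid_cron(expr: str) -> bool:
    """Return True if expr has five fields and every element is within bounds."""
    fields = expr.split()
    if len(fields) != len(CRON_BOUNDS):
        return False
    return all(
        _valid_part(part, low, high)
        for field, (low, high) in zip(fields, CRON_BOUNDS)
        for part in field.split(",")
    )


print(is_valid_cron("11 23 * * *"))  # default cleanupDeployRevisions schedule
print(is_valid_cron("60 0 * * *"))   # minute out of range
```

Passing this check doesn't guarantee that the cluster accepts the expression, so treat the CronJob object that Kubernetes creates as the final confirmation.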

Job failing

  1. Check the job logs:

    $ kubectl logs job/houston-cleanup-deployments-<timestamp> -n astronomer

  2. Database connectivity: Ensure that the APC API can reach the database.

  3. Permissions: Verify that the service account has the required database permissions.

Data not being cleaned

  1. Check retention period: Records newer than the olderThan value aren't deleted.
  2. Verify timestamps: Check the createdAt and deletedAt values in the database.
  3. Run with dry-run: Preview what would be deleted.

Best practices

  1. Monitor database size before and after cleanup jobs
  2. Start with dry-run when adjusting retention periods
  3. Stagger schedules if adding custom cleanup jobs
  4. Archive before delete for cleanupAirflowDb in production
  5. Set alerts for failed cleanup jobs

Related documentation

  • Apply platform configuration