
Debugging DAGs


Getting started with Airflow is easy if you know a bit of Python; you create your DAG file, import your operators, define your tasks, and you’re off and running. But what happens when you run your DAG and something goes wrong? Maybe your tasks are failing unexpectedly, or are stuck in a scheduled state, or your DAGs aren’t showing up in the Airflow UI at all.

For these common situations (and a few more), we’ve got you covered! In this guide, we’ll cover some frequently encountered issues with Airflow DAGs, and how to debug them. If you’re brand new to Airflow, we recommend also checking out one of our Introduction to Airflow Webinars to get started.

Note: This guide focuses on Airflow 2.0+. For older Airflow versions, some debugging steps may be slightly different.

DAGs Aren’t Showing Up in the Airflow UI

One of the first issues you can encounter when developing DAGs is that your DAGs do not show up in the Airflow UI. You might define a DAG in a Python file and add it to your dags_folder, but when you check the Airflow UI, nothing shows up.

If a DAG isn’t appearing in the Airflow UI, it’s typically because Airflow is unable to parse the DAG. In this case, you’ll see an Import Error in the Airflow UI.

Import Error

This error message should tell you what you need to fix. Most frequently, the cause of the problem will be a syntax or package import error.
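A quick way to reproduce an import error outside the UI is to run the DAG file directly with Python; any syntax or import error the scheduler would report shows up immediately in your terminal. The sketch below simulates this with a throwaway file containing a broken provider import (the module name is made up for illustration):

```python
# Running a DAG file directly reproduces the same parse/import errors the
# scheduler reports in the UI. Here we write a file with a broken provider
# import (hypothetical module name) and run it the way you would run
# `python dags/my_dag.py` in your own project.
import pathlib
import subprocess
import sys
import tempfile

dag_file = pathlib.Path(tempfile.mkdtemp()) / "my_dag.py"
dag_file.write_text("from airflow.providers.nonexistent import MissingOperator\n")

result = subprocess.run(
    [sys.executable, str(dag_file)], capture_output=True, text=True
)
print(result.returncode != 0)  # True: the file fails to import
```

If the file runs cleanly and prints nothing, the problem is usually not a parse error, and the steps below are the next thing to check.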

If you don’t see an import error message, here are some debugging steps to try:

  • Airflow scans the dags_folder for new DAGs every dag_dir_list_interval, which defaults to 5 minutes but can be modified. You might have to wait until this interval has passed before a new DAG appears in the UI.

  • Ensure that your user has permission to see the DAGs, and that the permissions on the DAG file are correct.

  • Run airflow dags list with the Airflow CLI to make sure that Airflow has registered the DAG in the metastore. If the DAG appears in the list, try restarting the webserver.

  • Try restarting the scheduler (if you are using the Astro CLI, run astro dev stop && astro dev start).

  • If you see an error that the scheduler is not running like in the following screenshot, check the scheduler logs to see if something in the DAG file is causing the scheduler to crash (if you are using the Astro CLI, run astro dev logs --scheduler). Then try restarting.

    No Scheduler
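The scan interval mentioned in the first step lives in the `[scheduler]` section of airflow.cfg (the equivalent environment variable is AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL). For example, to shorten it to 30 seconds during development:

```ini
[scheduler]
# How often (in seconds) to scan the dags_folder for new files.
# The default is 300 (5 minutes).
dag_dir_list_interval = 30
```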

If DAGs don’t appear in the Airflow UI when working from an Astronomer Airflow Deployment, there are a few additional things you can check:

  • Ensure that the Runtime or Astronomer Certified version in your Dockerfile matches the Airflow version of your Deployment. A mismatch here can cause DAGs not to show up after they’ve been deployed.
  • For Astronomer Certified images, ensure that you are using an onbuild image (i.e. a FROM image whose tag ends in -onbuild). Images without onbuild will not bundle files in the dags/ folder when deployed.
  • Ensure that the permissions on your local files aren’t too locked down.

Dependency Conflicts

As noted above, one frequent cause of DAG import errors is not having supporting packages installed in your Airflow environment. For example, any provider packages that your DAGs use for hooks and operators must be installed separately.

How you install supporting Python or OS packages will depend on your Airflow setup. If you are working with an Astronomer project, you can add Python and OS packages to your requirements.txt and packages.txt files respectively, and they will be automatically installed when your Docker image builds when you deploy or start the project locally.

One thing to watch out for, especially with Python packages, is dependency conflicts. If you are running Airflow using Docker, these conflicts can cause errors when you build your image. With the Astro CLI, errors and warnings will be printed in your terminal when you run astro dev start, and you might see import errors for your DAGs in the Airflow UI if packages failed to install. In general, all packages you install should be available in your scheduler pod. You can double check they were installed successfully by exec’ing into your scheduler pod as described in Astronomer documentation.
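When you suspect a conflict, pip can report broken dependency trees directly. From inside your Airflow environment (e.g. after exec’ing into the scheduler pod or container):

```shell
# Ask pip to verify that every installed package has compatible dependencies.
python3 -m pip check
# Prints "No broken requirements found." when everything is consistent;
# otherwise it lists each conflicting package and the constraint it violates.
```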

If you do have package conflicts that can’t be resolved, consider breaking up your DAGs into multiple projects that are run on separate Airflow deployments so that DAGs requiring conflicting packages are not in the same environment. For example, you might have a set of DAGs that require package X that run on Airflow Deployment A, and another set of DAGs that require package Y (which conflicts with package X) that run on Airflow Deployment B. Alternatively, you can use the Kubernetes Pod Operator to isolate package dependencies to a specific task and avoid conflicts in your broader Airflow environment.

Tasks Aren’t Running

In this scenario, your DAG is visible in the Airflow UI, but your tasks don’t run when you trigger the DAG. This is a commonly encountered issue in Airflow, and the causes can be very simple or complex. Below are some debugging steps that can resolve the most common scenarios:

  • Make sure your DAG is toggled to Unpaused. If your DAG is paused when you trigger it, the tasks will not run.

    Paused DAG

    DAGs are deployed paused by default, but you can change this behavior by setting dags_are_paused_at_creation=False in your Airflow config (if you do this, be aware of the catchup parameter in your DAGs).

    Note: As of Airflow 2.2, paused DAGs will be unpaused automatically if you manually trigger them.

  • Ensure your DAG has a start date that is in the past. If your start date is in the future, triggering the DAG results in a “successful” DAG run even though no tasks ran.

  • Note that a DAG run will only be automatically triggered when you unpause the DAG if the start date and end of the data interval are both in the past, and catchup=True. For more details on data intervals and DAG scheduling, check out our Scheduling and Timetables guide.

  • If you are using a custom timetable, ensure that the data interval for your DAG run does not precede the DAG’s start date.

  • If your tasks are getting stuck in a scheduled or queued state, ensure your scheduler is running properly. If needed, restart the scheduler or increase scheduler resources in your Airflow infrastructure.

  • If you added tasks to an existing DAG that has depends_on_past=True, those newly added tasks won’t run until their state is set for prior task runs.
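The start-date and data-interval rules above can be sketched with plain Python: a run is created only once the DAG’s start date is in the past and at least one full data interval has elapsed. This is a simplified stand-in for Airflow’s scheduler logic, using only the standard library:

```python
# Simplified sketch of when a scheduled run should exist: the start date
# must be in the past AND the first data interval must have fully closed.
# (This mirrors the rules above; it is not Airflow's actual scheduler code.)
from datetime import datetime, timedelta

def should_have_run(start_date, interval, now):
    """Return True if at least one complete data interval precedes `now`."""
    if start_date > now:
        return False  # future start date: nothing to schedule yet
    return start_date + interval <= now  # first interval must have ended

now = datetime(2022, 6, 15)
print(should_have_run(datetime(2022, 6, 1), timedelta(days=1), now))      # True
print(should_have_run(datetime(2022, 7, 1), timedelta(days=1), now))      # False: future start
print(should_have_run(now - timedelta(hours=1), timedelta(days=1), now))  # False: interval still open
```

The third case is the one that surprises people most often: a start date one hour ago with a daily schedule produces no run, because the first daily interval has not yet closed.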

Tasks Have a Failure Status

Your tasks can occasionally fail after they start running. You can check on task run failures by going to the Tree View or the Graph View in the Airflow UI. Failed task runs appear as red squares.

Tree View Task Failure

To figure out what’s going on, task logs are your best resource. To access logs, click on the failed task in either the Tree View or Graph View and click the Log button.

Get Logs

This will take you to the logs for that task, which will have information about the error that caused the failure.

Error Log

To make catching and debugging task failures easier, you can set up error notifications. Check out this guide for details on setting up email, Slack, and custom notifications in Airflow.
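For custom notifications, Airflow invokes an `on_failure_callback` with the task’s context dict when a task fails. The callback is plain Python, so you can exercise it without a running Airflow; the context object below is a made-up stand-in for what Airflow actually passes:

```python
# A minimal failure-notification callback. Airflow calls this with the task
# context whenever the task fails; here we call it by hand with a fake
# context (FakeTI is a stand-in for Airflow's TaskInstance).
def notify_on_failure(context):
    ti = context["task_instance"]
    message = f"Task {ti.task_id} in DAG {ti.dag_id} failed."
    # In a real DAG you would post `message` to Slack, email, etc. here.
    return message

class FakeTI:
    task_id = "extract"
    dag_id = "example_etl"

print(notify_on_failure({"task_instance": FakeTI()}))
# Task extract in DAG example_etl failed.
```

In a DAG you would wire the callback up via default_args, e.g. `default_args={"on_failure_callback": notify_on_failure}`, so it applies to every task.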

One thing to watch out for with task failures in newly developed DAGs is an error message like Task exited with return code Negsignal.SIGKILL or that contains a -9 error code. These usually indicate that the task ran out of memory. Try increasing the resources for your scheduler, workers, or task pods, depending on whether you’re running the Local, Celery, or Kubernetes executor, respectively.

Logs Aren’t Showing Up

Less commonly, when you check your task logs to debug a failure, you may not see any logs at all. On the log page in the UI, you may see a spinning wheel that lasts forever, or you may just see a blank file.

Generally, logs fail to show up when a process dies in your scheduler or worker and the communication is lost. Here are a couple of things you can try to get your logs showing up again:

  • In case it was a one-off issue, try rerunning the task by clearing the task instance to see if the logs appear during the rerun.
  • Increase your log_fetch_timeout_sec parameter to greater than the 5 second default. This parameter controls how long the webserver will wait for the initial handshake when fetching logs from the worker machines, and having extra time here can sometimes resolve issues.
  • Increase the resources available to your workers (if using the Celery executor) or scheduler (if using the local executor).
  • If you’re using the Kubernetes executor and a task fails very quickly (e.g. in less than 15 or so seconds), the pod running the task spins down before the webserver has a chance to collect the logs from the pod. If possible, you can try building in some wait time to your task depending on which operator you’re using. If that isn’t possible, try to diagnose what could be causing a near-immediate failure in your task. This is often related to either lack of resources (try increasing CPU/memory for the task) or an error in the task configuration.
  • If you’re looking at historical task failures, ensure that your logs are retained until you need to access them. For example, the default log retention period on Astronomer is 15 days, so any logs prior to that will not be stored.
  • If none of the above works, try checking your scheduler and webserver logs for any errors that might indicate why your task logs aren’t showing up.
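The log-fetch timeout from the second step lives in the `[webserver]` section of airflow.cfg (equivalently, the AIRFLOW__WEBSERVER__LOG_FETCH_TIMEOUT_SEC environment variable). For example, to triple the default:

```ini
[webserver]
# Seconds the webserver waits for the initial handshake when fetching
# task logs from workers. The default is 5.
log_fetch_timeout_sec = 15
```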

Connection Troubleshooting

Typically, Airflow connections are needed for Airflow to talk to any external system. Most hooks and operators expect a defined connection parameter. Because of this, improperly defined connections are one of the most common issues Airflow users have to debug when first working with their DAGs.

While the specific error associated with a poorly defined connection can vary widely, you will typically see a message with “connection” in your task logs. If you haven’t defined a connection, you’ll see a message like 'connection_abc' is not defined.

Below are some general tips and tricks for getting your connections to work:

  • Check out the Airflow managing connections documentation to get familiar with how connections work.

  • Most hooks and operators will use the default connection of the correct type. You can change the default connection to use your connection details or define a new connection with a different name and pass that to the hook/operator.

  • Consider upgrading to Airflow 2.2 so you can use the test connections feature in the UI or API. This will save you from having to run your full DAG just to confirm that the connection works.

    Test Connections

  • Every hook/operator will have its own way of using a connection, and it can sometimes be tricky to figure out what parameters are needed. The Astronomer Registry can be a great resource for this: many hooks and operators have documentation there on what is required for a connection.

  • You can define connections using Airflow environment variables instead of adding them in the UI. Take care to not end up with the same connection defined in multiple places. If you do, the environment variable will take precedence.
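Connections defined as environment variables use Airflow’s URI convention: a variable named AIRFLOW_CONN_&lt;CONN_ID&gt; whose value looks like scheme://login:password@host:port/schema. Because it is an ordinary URI, you can sanity-check it with the standard library before handing it to Airflow (the connection details below are made up):

```python
# An Airflow connection expressed as an environment variable. The variable
# name encodes the connection id; the value is a standard URI. All of the
# credentials/host values here are fabricated examples.
import os
from urllib.parse import urlparse

os.environ["AIRFLOW_CONN_MY_POSTGRES"] = (
    "postgresql://airflow_user:s3cret@db.example.com:5432/analytics"
)

parsed = urlparse(os.environ["AIRFLOW_CONN_MY_POSTGRES"])
print(parsed.scheme)    # postgresql -> maps to the connection type
print(parsed.hostname)  # db.example.com
print(parsed.port)      # 5432
```

A malformed URI here (a missing scheme, an unescaped special character in the password) is a frequent cause of connection errors that only surface at task run time.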

Recovering from Failures

Once you have identified the cause of any failures in your tasks, you can begin to address them. If you’ve made any changes to your code, make sure to redeploy (if applicable) and check the Code View in the Airflow UI to make sure that your changes have been picked up by Airflow.

If you want to rerun your whole DAG or specific tasks after making changes, you can easily do so with Airflow. Check out this guide for details on how to rerun and apply backfills or catchups.

How to address specific failures will depend heavily on the hook/operator/sensor used, as well as the use case. The sections above should help you through the most commonly encountered pitfalls that beginners face. For help with more complex issues, consider joining the Apache Airflow Slack or reach out to Astronomer.
