Schedule DAGs in Apache Airflow®
One of the fundamental features of Apache Airflow® is the ability to schedule jobs. Historically, Airflow users scheduled their DAGs by specifying a schedule with a cron expression, a timedelta object, or a preset Airflow schedule. Recent versions of Airflow have added new ways to schedule DAGs, including data-aware scheduling with datasets and the option to define complex custom schedules with timetables.
In this guide, you’ll learn Airflow scheduling concepts and the different ways you can schedule a DAG.
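To get the most out of this guide, you should have an existing knowledge of Airflow DAGs and of date and time handling with the Python datetime package.

To gain a better understanding of DAG scheduling, it’s important that you become familiar with the following terms and parameters:
Data interval: The period of data that a DAG run operates on, bounded by data_interval_start and data_interval_end.
Logical date: The value that identifies a DAG run. For cron and timedelta schedules, it is the start of the data interval.
Run after: The earliest time at which a DAG run can start. For cron and timedelta schedules, it is the end of the data interval.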
The execution_date concept was deprecated in Airflow 2.2. If you’re using an older version of Airflow and need more information about execution_date, see What does execution_date mean?
The following parameters ensure your DAGs run at the correct time:
data_interval_start: Defines the start date and time of the data interval. A DAG’s timetable will return this parameter for each DAG run. This parameter is created automatically by Airflow, or is specified by the user when implementing a custom timetable.
data_interval_end: Defines the end date and time of the data interval. A DAG’s timetable will return this parameter for each DAG run. This parameter is created automatically by Airflow, or is specified by the user when implementing a custom timetable.
schedule: Defines when a DAG will be run. This value is set at the DAG configuration level. It accepts cron expressions, timedelta objects, timetables, and lists of datasets. The default schedule is timedelta(days=1), which runs the DAG once per day if no schedule is defined. If you trigger your DAG externally, set the schedule to None.
start_date: The timestamp after which the first data interval for this DAG can start. Make sure this is set to be a date in the past, at least one full data interval before the first intended DAG run. For example, if your DAG runs daily, the start_date should be at least one full day before the first intended run. See the example below for more information.
end_date: The last date your DAG will be executed. This parameter is optional.

In Airflow 2.3 and earlier, schedule_interval is used instead of the schedule parameter, and it only accepts cron expressions or timedelta objects.
To demonstrate how these concepts work together, consider a DAG that is scheduled to run every 5 minutes. Looking at the most recent DAG run, the logical date is 2022-08-28 22:37:33, which is displayed in the Data interval start field in the following image. The logical date is also included in the Run ID field and identifies the DAG run in the Airflow metadata database. The value in the Data interval end field is 5 minutes later.

If you look at the next DAG run in the UI, the logical date is 2022-08-28 22:42:33, which is shown as the Next Run value in the Airflow UI. This is 5 minutes after the previous logical date, and the same value shown in the Data interval end field of the previous DAG run. If you hover over Next Run, you can see that the Run After value, which is the date and time that the next DAG run will actually start, matches the value in the Data interval end field:

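For a timedelta or cron schedule, you can sketch the relationship between logical date, data interval and run after using plain datetime arithmetic. This is a simplified model and not Airflow's own timetable code. It uses the 5-minute example above, ignores the sub-second part shown in the run IDs, and `describe_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Values from the 5-minute example above (sub-second parts of the run IDs ignored).
INTERVAL = timedelta(minutes=5)
LOGICAL_DATE = datetime(2022, 8, 28, 22, 37, 33, tzinfo=timezone.utc)


def describe_run(logical_date: datetime, interval: timedelta) -> dict:
    """Return the data interval and earliest start time of one DAG run.

    Simplified model of a cron or timedelta schedule: the logical date is the
    start of the data interval, and the run may start ("run after") once the
    data interval has ended.
    """
    data_interval_end = logical_date + interval
    return {
        "logical_date": logical_date,
        "data_interval_start": logical_date,
        "data_interval_end": data_interval_end,
        "run_after": data_interval_end,
    }


current = describe_run(LOGICAL_DATE, INTERVAL)
# The next run's logical date is the previous run's data interval end.
following = describe_run(current["data_interval_end"], INTERVAL)

for run in (current, following):
    print(
        f"logical date {run['logical_date']:%Y-%m-%d %H:%M:%S}, "
        f"starts after {run['run_after']:%Y-%m-%d %H:%M:%S}"
    )
# logical date 2022-08-28 22:37:33, starts after 2022-08-28 22:42:33
# logical date 2022-08-28 22:42:33, starts after 2022-08-28 22:47:33
```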
The following is a comparison of the two successive DAG runs:
The first DAG run (Run ID scheduled__2022-08-28T22:37:33.620191+00:00) has a logical date of 2022-08-28 22:37:33, a data interval start of 2022-08-28 22:37:33, and a data interval end of 2022-08-28 22:42:33. This DAG run will actually start at 2022-08-28 22:42:33.
The second DAG run (Run ID scheduled__2022-08-28T22:42:33.617231+00:00) has a logical date of 2022-08-28 22:42:33, a data interval start of 2022-08-28 22:42:33, and a data interval end of 2022-08-28 22:47:33. This DAG run will actually start at 2022-08-28 22:47:33.

For pipelines with straightforward scheduling needs, you can define a schedule in your DAG using a cron expression, a cron preset, or a timedelta object.
You can pass any cron expression as a string to the schedule parameter in your DAG. For example, if you want to schedule your DAG at 4:05 AM every day, you would use schedule='5 4 * * *'.
If you need help creating the correct cron expression, see crontab guru.
Airflow can utilize cron presets for common, basic schedules. For example, schedule='@hourly' will schedule the DAG to run at the beginning of every hour. For the full list of presets, see Cron Presets. If your DAG does not need to run on a schedule and will only be triggered manually or by another process, you can set schedule=None.
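Each common preset is shorthand for a cron expression. The lookup table below is a hypothetical helper for illustration only, because Airflow resolves presets itself. It lists the cron equivalents of @hourly, @daily, @weekly, @monthly, and @yearly, and passes any other string through unchanged.

```python
# Cron equivalents of common Airflow cron presets (illustrative lookup table).
CRON_PRESETS = {
    "@hourly": "0 * * * *",   # minute 0 of every hour
    "@daily": "0 0 * * *",    # midnight every day
    "@weekly": "0 0 * * 0",   # midnight every Sunday
    "@monthly": "0 0 1 * *",  # midnight on the first day of each month
    "@yearly": "0 0 1 1 *",   # midnight on January 1
}


def to_cron(schedule: str) -> str:
    """Hypothetical helper: return the cron expression behind a preset.

    Strings that are not presets (for example, '5 4 * * *') are returned unchanged.
    """
    return CRON_PRESETS.get(schedule, schedule)


print(to_cron("@hourly"))    # 0 * * * *
print(to_cron("5 4 * * *"))  # 5 4 * * *
```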
If you want to schedule your DAG on a particular cadence (hourly, every 5 minutes, etc.) rather than at a specific time, you can pass a timedelta object imported from the datetime package to the schedule parameter. For example, schedule=timedelta(minutes=30) will run the DAG every thirty minutes, and schedule=timedelta(days=1) will run the DAG every day.
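A cron expression fires at fixed wall-clock times. A timedelta fires at a fixed cadence. Two small helpers illustrate the difference. Both `next_daily_cron_time` and `next_cadence_time` are hypothetical and deliberately simplified. The first only understands the daily 'minute hour * * *' form. The second counts whole steps from start_date, roughly the way a timedelta schedule advances when catchup is enabled. Airflow's own timetables handle many more cases.

```python
from datetime import datetime, timedelta


def next_daily_cron_time(after: datetime, hour: int, minute: int) -> datetime:
    """Hypothetical helper: next firing of a daily 'minute hour * * *' cron expression.

    For schedule='5 4 * * *', call it with hour=4, minute=5.
    """
    candidate = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)  # Today's slot has passed; use tomorrow's.
    return candidate


def next_cadence_time(start_date: datetime, delta: timedelta, after: datetime) -> datetime:
    """Hypothetical helper: next boundary of a fixed cadence counted from start_date."""
    if after < start_date:
        return start_date
    steps = (after - start_date) // delta + 1  # whole intervals elapsed, plus one
    return start_date + steps * delta


now = datetime(2024, 1, 1, 9, 0)
start_date = datetime(2024, 1, 1, 0, 10)
print(next_daily_cron_time(now, hour=4, minute=5))                # 2024-01-02 04:05:00
print(next_cadence_time(start_date, timedelta(minutes=30), now))  # 2024-01-01 09:10:00
```

The cadence is anchored at start_date (0:10 in this sketch), so the next 30-minute run falls at 9:10 rather than on the hour. The cron expression always fires at 4:05, regardless of start_date.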
Note: Do not make your DAG’s schedule dynamic (for example, datetime.now())! This will cause an error in the Scheduler.
Airflow was originally developed for extract, transform, and load (ETL) with the expectation that data is constantly flowing in from some source and then will be summarized at a regular interval. However, if you want to summarize data from Monday, you need to wait until Tuesday at 12:01 AM. This shortcoming led to the introduction of timetables in Airflow 2.2.
Each DAG run has a logical_date that is separate from the time that the DAG run is expected to begin. A DAG run is not actually allowed to run until the logical_date for the following DAG run has passed. So, if you’re running a daily DAG, the Monday DAG run will not execute until Tuesday. In this example, the logical_date is Monday 12:01 AM, even though the DAG run will not actually begin until Tuesday 12:01 AM.
If you want to pass a timestamp to the DAG run that represents the earliest time at which this DAG run can start, use {{ next_ds }} from the Jinja templating macros.
Astronomer recommends that you make each DAG run idempotent (able to be re-run without changing the result) which precludes using datetime.now().
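One way to keep a task idempotent is to derive everything from the DAG run's data interval instead of the wall clock. In Airflow, tasks can read the interval boundaries as data_interval_start and data_interval_end. The sketch below passes them in as plain arguments instead. The event records and the `select_interval` function are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical event records; in a real pipeline these would come from a source system.
EVENTS = [
    {"id": 1, "ts": datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc)},
    {"id": 2, "ts": datetime(2024, 1, 2, 8, 0, tzinfo=timezone.utc)},
    {"id": 3, "ts": datetime(2024, 1, 2, 23, 0, tzinfo=timezone.utc)},
    {"id": 4, "ts": datetime(2024, 1, 3, 0, 0, tzinfo=timezone.utc)},
]


def select_interval(events, data_interval_start, data_interval_end):
    """Return ids of events in the half-open interval [start, end).

    The result depends only on the data interval passed in, not on
    datetime.now(), so re-running the same DAG run selects the same rows.
    """
    return [e["id"] for e in events if data_interval_start <= e["ts"] < data_interval_end]


start = datetime(2024, 1, 2, tzinfo=timezone.utc)
end = datetime(2024, 1, 3, tzinfo=timezone.utc)
print(select_interval(EVENTS, start, end))  # [2, 3]
```

The half-open interval keeps an event that lands exactly on a boundary (id 4 here) out of one run and inside the next, so consecutive runs never double-count it.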
The relationship between a DAG’s schedule and its logical_date leads to particularly unintuitive results when the spacing between DAG runs is irregular. The most common example of irregular spacing is when DAGs run only during business days from Monday to Friday. In this case, the DAG run with a Friday logical_date will not run until Monday, even though the data from Friday is available on Saturday. A DAG that summarizes results at the end of each business day can’t be set using only schedule. Before Airflow 2.2, you must schedule the DAG to run every day (including Saturday and Sunday) and include logic in the DAG to skip all tasks on the days the DAG doesn’t need to run.
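The Friday-to-Monday gap follows directly from the rule that a run starts when the next logical date arrives. The sketch below models a weekday-only schedule, for example a cron expression such as "0 0 * * 1-5" (an assumed schedule), using only the standard library. It also includes the daily-schedule-plus-skip workaround as a function. All three functions are hypothetical helpers, not Airflow APIs.

```python
from datetime import date, timedelta


def next_business_day(day: date) -> date:
    """Return the first Monday-to-Friday date after `day`."""
    nxt = day + timedelta(days=1)
    while nxt.weekday() >= 5:  # weekday(): 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt


def run_after_for(logical_day: date) -> date:
    """Weekday-only schedule: a run starts once the next logical date arrives."""
    return next_business_day(logical_day)


def should_skip(logical_day: date) -> bool:
    """Pre-timetable workaround: schedule daily, then skip weekend runs in the DAG."""
    return logical_day.weekday() >= 5


friday = date(2024, 1, 5)  # 2024-01-05 was a Friday
print(friday.strftime("%A"), "run starts on", run_after_for(friday).strftime("%A"))
# Friday run starts on Monday
```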
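The following are the limitations of a traditional schedule:
A DAG run can’t start until its data interval has ended, so the run that processes a given period always starts one full interval after that period’s logical date.
Irregular spacing between runs, such as a business-day schedule, produces unintuitive results: the Friday run doesn’t start until Monday.
Schedules with run times at differing hours and minutes, such as 6:00 and 16:30, can’t be represented by a single cron expression.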
You can avoid these limitations by using timetables.
With datasets, you can make Airflow aware of updates to data objects. Using that awareness, Airflow can schedule other DAGs when there are updates to these datasets. To create a dataset-based schedule, pass the dataset(s) to the schedule parameter. Airflow 2.9 added the ability to define conditional logic for your dataset schedules, as well as the option to combine timetables with dataset-based schedules.
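With schedule=(dataset1 & dataset2), the DAG runs when both dataset1 and dataset2 have been updated at least once.
With schedule=(dataset1 | dataset2), the DAG runs when either dataset1 or dataset2 is updated.
With a DatasetOrTimeSchedule that combines a timetable running every day at midnight UTC with (dataset1 | dataset2), the DAG runs every day at midnight UTC and, additionally, whenever either dataset1 or dataset2 is updated.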
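To make the AND/OR semantics concrete, here is a hypothetical, self-contained model of conditional dataset schedules. It is not Airflow's implementation. `Condition`, `DatasetRef`, `AllOf`, `AnyOf`, and `time_or_datasets` are illustrative names, and the dataset URIs are made up. In the model, & requires every dataset to have been updated since the last run, | requires any one of them, and the combined schedule fires on the time tick or when the dataset condition holds.

```python
from dataclasses import dataclass
from typing import Set


class Condition:
    """Hypothetical stand-in for a dataset schedule expression (not Airflow's classes)."""

    def __and__(self, other: "Condition") -> "Condition":
        return AllOf(self, other)

    def __or__(self, other: "Condition") -> "Condition":
        return AnyOf(self, other)

    def is_met(self, updated: Set[str]) -> bool:
        raise NotImplementedError


@dataclass(frozen=True)
class DatasetRef(Condition):
    uri: str

    def is_met(self, updated: Set[str]) -> bool:
        # A single dataset is satisfied once it has been updated since the last run.
        return self.uri in updated


class AllOf(Condition):
    def __init__(self, *parts: Condition) -> None:
        self.parts = parts

    def is_met(self, updated: Set[str]) -> bool:
        return all(part.is_met(updated) for part in self.parts)


class AnyOf(Condition):
    def __init__(self, *parts: Condition) -> None:
        self.parts = parts

    def is_met(self, updated: Set[str]) -> bool:
        return any(part.is_met(updated) for part in self.parts)


def time_or_datasets(cron_is_due: bool, condition: Condition, updated: Set[str]) -> bool:
    """Model of a combined schedule: run on the time tick OR when the datasets condition holds."""
    return cron_is_due or condition.is_met(updated)


dataset1 = DatasetRef("s3://example-bucket/dataset1")  # hypothetical URIs
dataset2 = DatasetRef("s3://example-bucket/dataset2")

both = dataset1 & dataset2
either = dataset1 | dataset2
updated = {"s3://example-bucket/dataset1"}
print(both.is_met(updated), either.is_met(updated))  # False True
```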
Datasets can be updated by any tasks in any DAG of the same Airflow environment, by calls to the dataset endpoint of the Airflow REST API, or manually in the Airflow UI.
In the Airflow UI, the DAG now has a schedule of Dataset. The Next Run column shows the datasets the DAG depends on and how many of them have been updated.

To learn more about datasets and data-driven scheduling, see the Datasets and Data-Aware Scheduling in Airflow guide.
Timetables, introduced in Airflow 2.2, address the limitations of cron expressions and timedelta objects by allowing users to define their own schedules in Python code. All DAG schedules are ultimately determined by their internal timetable, and if a cron expression or timedelta object is not suitable, you can define your own.
Custom timetables can be registered as part of an Airflow plugin. They must be a subclass of Timetable, and they should contain the following methods, which define the data interval (a DataInterval with a start and an end) for each DAG run:
next_dagrun_info: Returns the data interval for the DAG’s regular schedule, wrapped in a DagRunInfo, or None if no further run should be scheduled.
infer_manual_data_interval: Returns the data interval when the DAG is manually triggered.

You can run a DAG continuously with a predefined timetable. To use the ContinuousTimetable, set the schedule of your DAG to "@continuous" and set max_active_runs to 1.
This schedule keeps one DAG run active at a time: a new run starts as soon as the previous run has completed, regardless of whether the previous run succeeded or failed. A ContinuousTimetable is especially useful when sensors or deferrable operators are used to wait for highly irregular events in external data tools.
Airflow is designed to handle orchestration of data pipelines in batches, and this feature is not intended for streaming or low-latency processes. If you need to run pipelines more frequently than every minute, consider using Airflow in combination with tools designed specifically for that purpose like Apache Kafka.
As an example of a custom timetable, suppose you want to run your DAG at 6:00 and 16:30. Because this schedule has run times with differing hours and minutes, it can’t be represented by a single cron expression. So, you’ll implement this schedule with a custom timetable.
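To start, you need to define the next_dagrun_info and infer_manual_data_interval methods. Because the intervals don’t have any gaps, the time the DAG runs (run_after) should be the end of the data interval. To run a DAG at 6:00 and 16:30, you have the following alternating intervals:
Run at 6:00: The data interval is from 16:30 the previous day to 6:00 the current day.
Run at 16:30: The data interval is from 6:00 to 16:30 the current day.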
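You can capture the two alternating intervals in a small helper that maps a run time to the data interval ending at it. `interval_ending_at` is a hypothetical standard-library sketch that assumes UTC datetimes. It is not part of the timetable itself.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Tuple

MORNING = time(6, 0)
AFTERNOON = time(16, 30)


def interval_ending_at(run_after: datetime) -> Tuple[datetime, datetime]:
    """Return the (start, end) data interval that ends at run_after.

    Hypothetical helper: run_after must fall exactly on 6:00 or 16:30.
    """
    if run_after.time() == MORNING:
        # 6:00 run: from 16:30 the previous day to 6:00 the current day.
        start = datetime.combine(run_after.date() - timedelta(days=1), AFTERNOON, tzinfo=run_after.tzinfo)
    elif run_after.time() == AFTERNOON:
        # 16:30 run: from 6:00 to 16:30 the current day.
        start = datetime.combine(run_after.date(), MORNING, tzinfo=run_after.tzinfo)
    else:
        raise ValueError("run_after must be 6:00 or 16:30")
    return start, run_after


UTC = timezone.utc
start, end = interval_ending_at(datetime(2021, 10, 13, 6, 0, tzinfo=UTC))
print(f"{start:%Y-%m-%d %H:%M} -> {end:%Y-%m-%d %H:%M}")  # 2021-10-12 16:30 -> 2021-10-13 06:00
```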
You define the next_dagrun_info method to provide Airflow with the logic to calculate the data interval for scheduled runs. The method also contains logic to handle the DAG’s start_date, end_date, and catchup parameters. To implement the logic in this method, you use the Pendulum package. The method is shown in the following example:
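The method described here uses Pendulum and Airflow's timetable classes. In Airflow, it receives the previous automated data interval plus a TimeRestriction carrying earliest, latest and catchup. As a stand-in, the following plain-Python sketch shows the same decision logic using only the standard library. `next_interval` and its arguments are hypothetical names. `earliest` and `latest` stand in for the DAG's start_date and end_date, and all times are assumed to be timezone-aware UTC datetimes. The sketch also makes sure the first interval never starts before start_date.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Optional, Tuple

UTC = timezone.utc
Interval = Tuple[datetime, datetime]


def next_interval(
    last: Optional[Interval],
    earliest: Optional[datetime],
    latest: Optional[datetime],
    catchup: bool,
    now: datetime,
) -> Optional[Interval]:
    """Hypothetical sketch of next_dagrun_info for a 6:00 / 16:30 schedule.

    last: the previous scheduled (start, end) interval, or None for the first run.
    earliest / latest: stand-ins for the DAG's start_date and end_date.
    """
    if last is not None:
        last_start = last[0]
        if last_start.time() == time(6, 0):
            # Previous interval was 6:00-16:30: next is 16:30 to 6:00 the following day.
            next_start = last_start.replace(hour=16, minute=30)
            next_end = (last_start + timedelta(days=1)).replace(hour=6, minute=0)
        else:
            # Previous interval was 16:30-6:00: next is 6:00-16:30 on the following day.
            following_day = last_start + timedelta(days=1)
            next_start = following_day.replace(hour=6, minute=0)
            next_end = following_day.replace(hour=16, minute=30)
    else:
        if earliest is None:
            return None  # No start_date, so nothing is scheduled.
        first_day = earliest
        if not catchup:
            # With catchup=False, the earliest date to consider is the current date.
            first_day = max(first_day, datetime.combine(now.date(), time.min, tzinfo=UTC))
        # The first interval runs 6:00-16:30; never let it start before start_date.
        next_start = first_day.replace(hour=6, minute=0, second=0, microsecond=0)
        if next_start < first_day:
            next_start += timedelta(days=1)
        next_end = next_start.replace(hour=16, minute=30)
    if latest is not None and next_start > latest:
        return None  # Past the DAG's end_date.
    return next_start, next_end
```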
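The code example completes the following process:
If there was a previous scheduled run, the next data interval is calculated from the start of the previous one, alternating between the 6:00 to 16:30 interval and the 16:30 to 6:00 interval.
If this is the first scheduled run, the method checks whether the DAG has a start_date and whether it has catchup=False. If catchup=False, the earliest date to consider should be the current date. Otherwise, it is the DAG’s start date.
If the next data interval would start after the DAG’s end_date, no run is scheduled.

Now, you define the data interval for manually triggered DAG runs by defining the infer_manual_data_interval method. The code appears similar to the following example: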
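A matching plain-Python sketch of the infer_manual_data_interval logic follows. In Airflow, the method receives run_after and returns a DataInterval. Here, `last_complete_interval` is a hypothetical function that returns a (start, end) tuple and assumes UTC times. It picks the most recent interval that has fully ended at the trigger time.

```python
from datetime import datetime, time, timedelta, timezone
from typing import Tuple

UTC = timezone.utc


def last_complete_interval(run_after: datetime) -> Tuple[datetime, datetime]:
    """Hypothetical sketch of infer_manual_data_interval for the 6:00 / 16:30 schedule."""
    day = run_after.date()
    six = datetime.combine(day, time(6, 0), tzinfo=run_after.tzinfo)
    half_past_four = datetime.combine(day, time(16, 30), tzinfo=run_after.tzinfo)
    one_day = timedelta(days=1)
    if six <= run_after < half_past_four:
        # Between 6:00 and 16:30: 16:30 the previous day to 6:00 today.
        return half_past_four - one_day, six
    if run_after >= half_past_four:
        # After 16:30 but before midnight: 6:00 to 16:30 today.
        return six, half_past_four
    # After midnight but before 6:00: 6:00 to 16:30 the previous day.
    return six - one_day, half_past_four - one_day


start, end = last_complete_interval(datetime(2021, 10, 12, 20, 0, tzinfo=UTC))
print(f"{start:%Y-%m-%d %H:%M} -> {end:%Y-%m-%d %H:%M}")  # 2021-10-12 06:00 -> 2021-10-12 16:30
```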
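This method determines the most recent complete data interval based on the current time. The following are the possible outcomes:
The current time is between 6:00 and 16:30: The last complete data interval is from 16:30 the previous day to 6:00 the current day.
The current time is after 16:30 but before midnight: The last complete data interval is from 6:00 to 16:30 the current day.
The current time is after midnight but before 6:00: The last complete data interval is from 6:00 to 16:30 the previous day.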
Three sets of logic are required because, depending on the current time, the last complete data interval can fall on the day the DAG is triggered or on the previous day. When you define custom timetables, keep in mind what the last complete data interval should be based on when the DAG should run.
Now, you combine the two methods in a Timetable class which will make up your Airflow plugin. The following example is a full custom timetable plugin:
Because timetables are plugins, you’ll need to restart the Airflow Scheduler and Webserver after adding or updating them.
In the DAG, you can import the custom timetable plugin and use it to schedule the DAG by setting the schedule parameter (in pre-2.4 Airflow you will need to use the timetable parameter):
Looking at the Tree View in the UI, you can see that this DAG has run twice per day at 6:00 and 16:30 since the start date of 2021-10-09.

The next scheduled run is for the interval starting on 2021-10-12 at 16:30 and ending the following day at 6:00. This run will be triggered at the end of the data interval, so after 2021-10-13 6:00.

If you run the DAG manually after 16:30 but before midnight, you can see the data interval for the triggered run was between 6:00 and 16:30 that day as expected.

This timetable can be adjusted to suit other use cases. Timetables are customizable as long as the methods above are implemented.
When you implement your timetable logic, make sure that your next_dagrun_info method does not return a data_interval_start that is earlier than your DAG’s start_date. This will result in tasks not being executed for that DAG run.
There are some limitations to keep in mind when implementing custom timetables: