If you follow Airflow development, you know that Airflow 2.3 brought with it support for dynamic task mapping, one of the most eagerly anticipated features in the eight-year history of Airflow. In virtually any other scenario, however, Airflow 2.3’s new grid view would have been the top-line feature. The old Airflow tree view, which it replaces, was a serviceable performer, but the grid view can do more things at once, can handle different, more complex arrangements, and looks better, too.
The grid view was developed by Airflow committers at Astronomer as a compact, intuitive way to visualize complex structures in Airflow’s UI (dynamic tasks, tasks with multiple dependencies, and task groups) and to quickly surface useful metrics and information about what is happening inside Airflow. It’s the centerpiece of our team’s ongoing effort to revamp the Airflow UI so that users can understand what’s going on with their DAGs and tasks in a single place, without switching contexts. We’re already delivering on this goal, and we expect the UI to keep improving with each new version of Airflow.
How Airflow 2.3’s new grid view can help the business
Let’s start with the business benefits of the new grid view:
Because the grid view makes it much easier to understand what is going on “inside” Airflow, support teams will find that they can now pinpoint problems and devise remediations more quickly and accurately.
The grid view radically simplifies troubleshooting when, for example, a customer-facing app or service fed by one or more complex data pipelines with branching dependencies fails, or when a failure occurs in the dynamic tasks that transform data for an ecommerce system’s next-best-option, customer-upsell, or customer-cross-sell dataflows. Rather than navigating the busy confusion of the tree view’s UI elements, ops teams can go right to faulty tasks and drill down to identify root causes. And because task groups are better supported, ops personnel can organize their DAGs more effectively.
First-class support for task groups and dynamically mapped tasks
Task groups and dynamically mapped tasks posed problems for Airflow’s tree view. Both involve tasks that branch out in different ways and ultimately converge on a single downstream task. The branches of a tree can only diverge, never rejoin, so a tree-like metaphor cannot display tasks that branch and then converge. As a result, both types of tasks “broke” the Airflow UI in different ways.
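To make the branch-and-converge problem concrete, here is a minimal sketch in plain Python. It is an illustration of the general limitation, not Airflow’s rendering code. The task names and the `unroll_to_tree` helper are hypothetical. The sketch draws a small fan-out/fan-in DAG as a tree and counts how often the converging task has to be repeated.

```python
from collections import Counter

# Hypothetical fan-out/fan-in DAG: each task maps to its downstream tasks.
DAG = {
    "extract": ["transform_a", "transform_b", "transform_c"],
    "transform_a": ["load"],
    "transform_b": ["load"],
    "transform_c": ["load"],
    "load": [],
}


def unroll_to_tree(dag, root):
    """Return one row per node needed to draw the DAG as a tree from `root`.

    A tree node has exactly one parent, so a task with several upstream
    tasks must be repeated once for every path that reaches it.
    """
    rows = [root]
    for child in dag[root]:
        rows.extend(unroll_to_tree(dag, child))
    return rows


tree_rows = unroll_to_tree(DAG, "extract")
row_counts = Counter(tree_rows)

print(f"{len(DAG)} tasks in the DAG, {len(tree_rows)} rows in the tree")
print(f"'load' is drawn {row_counts['load']} times")
```

In this toy case, five tasks need seven tree rows because the single `load` task sits at the end of three separate branches. The duplication grows with every additional converging path.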
Airflow task groups are supposed to give you a visual, conceptual means of organizing similar tasks, like parallel instances of the same ETL task. However, the only way to represent a task group in Airflow’s old tree view was by chaining task IDs together: e.g., task_group_1.task_1, task_group_1.task_2, etc. If a task group contained dozens or even hundreds of tasks (not uncommon in parallel ETL processing), it could “break” Airflow’s tree view — for example, by producing a long string of concatenated task IDs that runs off the page.
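The dotted naming scheme is easy to reproduce in plain Python. The sketch below is only an illustration of the ID pattern described above. The helpers `qualified_task_id` and `group_by_prefix` are hypothetical and not part of Airflow. It builds 100 flat, prefixed task IDs and then shows how the same IDs could be folded back into a single collapsible group.

```python
def qualified_task_id(group_path, task_id):
    """Join nested group IDs and a task ID with dots, e.g. 'group.task'."""
    return ".".join([*group_path, task_id])


def group_by_prefix(task_ids):
    """Fold dotted IDs into {group: [tasks]} so a group can be shown as one row."""
    groups = {}
    for full_id in task_ids:
        group, _, task = full_id.rpartition(".")
        groups.setdefault(group, []).append(task)
    return groups


# A flat view needs one label per task: 100 parallel tasks, 100 dotted IDs.
flat_ids = [qualified_task_id(["task_group_1"], f"task_{i}") for i in range(1, 101)]

# A grouped view needs one row for the group, expandable on demand.
collapsed = group_by_prefix(flat_ids)

print(flat_ids[:2])
print({group: len(tasks) for group, tasks in collapsed.items()})
```

The flat list is what a viewer has to scan when groups are expressed only through ID prefixes. The grouped dictionary is closer to what a collapsible row conveys.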
The tree view had trouble with dynamic tasks for the same reason: they involve visualizing multiple, parallel instances of the same task that converge into a single output. The only way to represent this in Airflow’s tree view was by creating a separate visual representation for each dynamically mapped task. So, if a dynamic task created 100 parallel instances of the same task, Airflow’s old tree view could only display… 100 separate tasks. It’s difficult to make sense of this, let alone use it as a basis for troubleshooting.
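One way to think about the alternative is to roll the parallel instances up into a single summary. The following is a rough sketch of that idea in plain Python, not a description of how the grid view is implemented. The `summarize_mapped` helper, the example states, and the priority order are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical states for 100 mapped instances of one task.
states = ["success"] * 97 + ["failed"] * 2 + ["running"]


def summarize_mapped(task_id, instance_states):
    """Collapse many instance states into one summary row for a task."""
    counts = Counter(instance_states)
    # Assumed priority: surface the state that most needs attention first.
    priority = ["failed", "running", "success"]
    overall = next((state for state in priority if counts[state]), "no_status")
    return {
        "task_id": task_id,
        "instances": len(instance_states),
        "overall": overall,
        "counts": dict(counts),
    }


summary = summarize_mapped("transform", states)
print(summary)
```

A single row with an overall state and per-state counts tells an operator where to look first, which is much harder to see across 100 separate entries.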
Airflow’s grid view solves these problems, giving users intuitive, at-a-glance information about the state of their DAGs and tasks.
A completely revamped Airflow UI, not just a UI refresh
The grid view is also important because it better positions Airflow to handle what’s coming — including features like dynamic task groups, which are slated for an upcoming release of Airflow beyond 2.4.
But it’s important for another reason, too: It’s the centerpiece of a strategy to redesign the Airflow UI so that it gives users all the information they need to understand what’s going on with their Airflow DAGs, tasks, task groups, etc., in one place. The new grid view is the starting point for an immersive experience in which — by clicking on individual tasks (i.e., grid squares) — users will be able to drill into and navigate through their DAGs and tasks without having to switch contexts.
The goal is for users to have one place to go to see a graphical representation of their DAG run, along with all of its dependencies and task-run histories. We’re fleshing out this capability steadily and iteratively. For Airflow 2.4 — due out later this month — we plan to give users the ability to click on a task to see the log file associated with it, instead of being redirected to a separate page. Also on deck is a visualization that lets users zoom in and out on their DAGs.
The grid view and most of the other UI enhancements are made possible by committers at Astronomer working behind the scenes to improve how the Airflow UI is designed, built, and maintained. We’re using new tools and methods to rebuild the UI so that it’s easier to iterate on, enabling us to introduce improvements much more quickly. For example, ops personnel will be able to quickly see and go to the root causes of problems once logs are integrated into the grid view in the upcoming Airflow 2.4 release. A task that used to require navigating through a series of page redirects to access Airflow’s log files will now become an integrated, organic workflow.
We understand that “improving” is subjective, so we want to hear from Airflow users about how Airflow’s UI can better accommodate their preferred workflows.
Airflow has undergone massive changes in just the last 18 months. The Airflow 2.3 of today is radically different from the Airflow v1.10.14 you might recall from December 2020. Significant innovation has gone into improving its ability to scale out and to make better, more efficient use of available resources. Changes to its scheduler, for example, improved Airflow’s ability to process workloads in parallel and enabled it to support a larger number of concurrent users and workloads.
It’s these changes that have made the new UI enhancements necessary. As Airflow hosts more concurrent users and their tasks, and as its ability to spin off multiple instances of the same task in parallel is fine-tuned, ops engineers and support teams need better tools to understand and manage those tasks. To make things as easy as possible for them, there must be a way to intelligently and intuitively visualize what’s happening. The new grid view achieves that goal.