Be Our Guest
Interested in being a guest on The Data Flowcast? Fill out the form and we will be in touch.
APR 23 2026
In this episode, we explore the newly released Apache Airflow common AI provider — what problem it solves, how it was built and what's coming next.
Kaxil Naik, Senior Director of Engineering at Astronomer and Apache Airflow PMC member, and Pavan Kumar Gopidesu, Lead Data Engineer at Experian and Apache Airflow PMC member, join us to walk through the provider's first release and the technical decisions behind it.
Key Takeaways:
(00:00) Introduction.
(04:05) The common AI provider was born from a real production problem.
(07:10) Airflow already had the primitives needed for durable agent execution, making it the natural foundation for AI orchestration.
(09:15) The LLM schema compare operator uses Apache DataFusion to fetch source schemas.
(11:07) Apache DataFusion was chosen for its speed.
(13:09) Hook tool sets expose Airflow's provider hooks to agents with an allowed methods list that blocks destructive operations.
(15:20) Passing durable=True to an LLM operator caches tool calls and LLM outputs mid-task.
(18:13) The provider offers three abstraction levels.
(21:20) The provider currently requires Airflow 3 — the team is open to adding Airflow 2.11 support if demand is high enough.
(24:10) MCP server configs can be stored as Airflow connections.
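To make the allowed methods list (13:09) and the mid-task caching idea (15:20) more concrete, here is a minimal plain-Python sketch. It does not use the common AI provider's actual API. Every name in it (FakeStorageHook, AllowedMethodsToolset, CachedToolset) is hypothetical, and the in-memory dict stands in for whatever persistent storage a real durable run would use.

```python
import json
from typing import Any


class FakeStorageHook:
    """Hypothetical stand-in for an Airflow provider hook (not a real hook)."""

    def __init__(self) -> None:
        self.files = {"report.csv": "a,b\n1,2\n"}

    def list_keys(self) -> list[str]:
        return sorted(self.files)

    def read_key(self, key: str) -> str:
        return self.files[key]

    def delete_key(self, key: str) -> None:
        # Destructive operation that an agent should not be able to reach.
        del self.files[key]


class AllowedMethodsToolset:
    """Expose only allow-listed hook methods to an agent (illustrative only)."""

    def __init__(self, hook: Any, allowed_methods: list[str]) -> None:
        self._hook = hook
        self._allowed = frozenset(allowed_methods)

    def call(self, method: str, **kwargs: Any) -> Any:
        # Reject anything that is not explicitly on the allow list.
        if method not in self._allowed:
            raise PermissionError(f"method {method!r} is not allowed")
        return getattr(self._hook, method)(**kwargs)


class CachedToolset:
    """Reuse earlier tool results so a retried task does not repeat calls."""

    def __init__(self, toolset: AllowedMethodsToolset, cache: dict[str, Any]) -> None:
        self._toolset = toolset
        self._cache = cache  # assumption: a real system would persist this

    def call(self, method: str, **kwargs: Any) -> Any:
        # Key each call by its method name and (JSON-serializable) arguments.
        cache_key = json.dumps([method, kwargs], sort_keys=True)
        if cache_key not in self._cache:
            self._cache[cache_key] = self._toolset.call(method, **kwargs)
        return self._cache[cache_key]
```

In this sketch, a toolset built with allowed_methods=["list_keys", "read_key"] raises PermissionError when asked to call delete_key, and a CachedToolset wrapped around it returns the stored result when the same call is repeated after a retry.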
Resources Mentioned:
Apache Airflow common AI provider docs
Introducing the Common AI Provider: LLM and AI Agent Support for Apache Airflow
Thanks for listening to "The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI." If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#Automation #Airflow #MachineLearning